Guess bond order in Rdkit by number of bound atoms

Wednesday, 12 February 2020

Guess bond order in Rdkit by number of bound atoms

Some compChem/Biochem programs do not care about bond order and strip them, which is rather frustrating. Ligands in random PDB files without any name, smiles are a classic example.
There is no single magic mol.CorrectBondOrder() command in Rdkit, but luckily there are some tricks that can be done. Here I will discuss finding out using the number of bound atoms.

TL;DR: This post deals with the case you do **not** have or cannot get a trustworthy SMILES string for use with the RDKit's AssignBondOrdersFromTemplate function.
"Cheating" is an easy solution, namely consulting a reference SMILES and using the command AllChem.AssignBondOrdersFromTemplate(..) (see this helpful post by Pat Walters). However, sometimes that is not an option and doing it from scratch is the way one is forced down. Okay, I will admit that there are not many such cases, but one in a while you may have PDB files or similar where you do not know the SMILES. An example for me was in a check I did to see the pre-made params file in Rosetta
There are two options. One is to rely on the number of bonded atoms (assuming hydrogens are present) and the other based on angles. Both have their issues, namely the former requires all hydrogens to be explicitly declared and charges may be problematic, while the latter has issues with angles that deviate from typical values. On importing a PDB to a Chem.Mol object, the argument removeHs=False stops hydrogens getting stripped (e.g. mol = Chem.MolFromPDBFile('V02.pdb', removeHs=False)). Here is a function I wrote that does the former (fix_bond_order(mol)):

from rdkit import Chem
from rdkit.Chem import AllChem, rdFMCS, rdDepictor, rdMolTransforms
from typing import List

def fix_bond_order(mol: Chem.Mol) -> Chem.Mol:
    """On a Mol where hydrogens are present it guesses bond order."""
    def is_sp2(atom: Chem.Atom) -> bool:
        N_neigh = len(atom.GetBonds())
        symbol = atom.GetSymbol()
        if symbol == 'H':
            return False
        elif symbol == 'N' and N_neigh < 3:
            return True
        elif symbol == 'C' and N_neigh < 4:
            return True
        elif symbol == 'O' and N_neigh < 2:
            return True
            return False

    def get_other(bond: Chem.Bond, atom: Chem.Atom) -> Chem.Atom:
        """Given an bond and an atom return the other."""
        if bond.GetEndAtomIdx() == atom.GetIdx(): # atom == itself gives false.
            return bond.GetBeginAtom()
            return bond.GetEndAtom()
    def find_sp2_bonders(atom: Chem.Atom) -> List[Chem.Atom]:
        return [neigh for neigh in find_bonders(atom) if is_sp2(neigh)]

    def find_bonders(atom: Chem.Atom) -> List[Chem.Atom]:
        return atom.GetNeighbours()

    def descr(atom: Chem.Atom) -> str:
        return f'{atom.GetSymbol()}{atom.GetIdx()}'

    ## main body of function
    for atom in mol.GetAtoms():
        #print(atom.GetSymbol(), is_sp2(atom), find_sp2_bonders(atom))
        if is_sp2(atom):
            doubles = find_sp2_bonders(atom)
            if len(doubles) == 1:
                #tobedoubled.append([atom.GetIdx(), doubles[0].GetIdx()])
                b = mol.GetBondBetweenAtoms( atom.GetIdx(), doubles[0].GetIdx())
                if b:
                    raise ValueError('Issue with:', descr(atom), descr(doubles[0]))
            elif len(doubles) > 1:
                for d in doubles:
                    b = mol.GetBondBetweenAtoms( atom.GetIdx(), d.GetIdx())
                if b:
                    raise ValueError('Issue with:', descr(atom), descr(d))
            elif len(doubles) == 0:
                print(descr(atom),' is underbonded!')
            #print(descr(atom),' is single', find_bonders(atom))
    return mol

There are a few things to note here.
  • It is actually working in place. To copy a molecule in Rdkit one has to simply do <code>Chem.Mol(mol)</code>.
  • One nested function within fix_bond_order is find_bonders. It may be unnecessary now, but might be useful in future changes. This is totally redundant with atom.GetNeighbours(), but there are several case where one would like to tweak this —atom.GetNeighbours() is the same as [get_other(bond, atom) for bond in atom.GetBonds()], but in the latter I could add some filtering etc.
  • A curious case surface, namely bond.GetBeginAtom() == bond.GetBeginAtom() is False, hence why in Rdkit all comparisons should be index based.
  • The key nested function is is_sp2, which does the element based check to see if it should be double bonded. The name is terrible, I know.
  • I resisted the urge to write it as a class or more specifically turn all the nested functions into classmethods, making it just an object holding functions in a very JavaScript Zen manner. But none of the nested functions are going to be called so there was no point.
The other way requires a slightly different version of the is_sp2 function is my checking the angles. Double bonds or aromatic bonds results in a planar trigonal geometry with an angle of 120° as opposed to 109° and tetrahedral.

def is_sp2(atom):
    neigh = atom.GetNeighbors() # or the function describe a second ago!
    if len(neigh) < 2:
        return True # or is it!
        conf = conf = mol.GetConformer(0)
        d = rdMolTransforms.GetAngleDeg(conf,neigh[0].GetIdx(),atom.GetIdx(), neigh[1].GetIdx())
        rd = round(d/10)
        if rd == 11:
            return False
        elif rd == 12:
            return True
        else: ## what is this???
            return False
However, this approach has lots of pitfalls, such as angles that deviate from nearly 110 or 120. And the really awful line where the atom with only one neighbour is assumed to be double bonded — if the other atom is not it won't be (using the previous code, it would just trigger print(descr(atom),' is underbonded!') line).

No comments:

Post a comment