There is no single magic
mol.CorrectBondOrder()
command in Rdkit, but luckily there are some tricks that can be done. Here I will discuss finding out using the number of bound atoms.TL;DR: This post deals with the case you do **not** have or cannot get a trustworthy SMILES string for use with the RDKit's
AssignBondOrdersFromTemplate
function.Since writing this I learnt that this can be done in PyBabel with
OBMol.PerceiveBondOrders()
and recently there is a function for guessing bond order in RDKit, so this post is not longer relevant"Cheating" is an easy solution, namely consulting a reference SMILES and using the command
AllChem.AssignBondOrdersFromTemplate(..)
(see this helpful post by Pat Walters). However, sometimes that is not an option and doing it from scratch is the way one is forced down. Okay, I will admit that there are not many such cases, but one in a while you may have PDB files or similar where you do not know the SMILES. An example for me was in a check I did to see the pre-made params file in Rosetta https://github.com/matteoferla/Display-of-preset-Rosetta-NCAAs.There are two options. One is to rely on the number of bonded atoms (assuming hydrogens are present) and the other based on angles. Both have their issues, namely the former requires all hydrogens to be explicitly declared and charges may be problematic, while the latter has issues with angles that deviate from typical values. On importing a PDB to a
Chem.Mol
object, the argument removeHs=False
stops hydrogens getting stripped (e.g. mol = Chem.MolFromPDBFile('V02.pdb', removeHs=False)
). Here is a function I wrote that does the former (fix_bond_order(mol))
:
from rdkit import Chem
from rdkit.Chem import AllChem, rdFMCS, rdDepictor, rdMolTransforms
from typing import List
def fix_bond_order(mol: Chem.Mol) -> Chem.Mol:
"""On a Mol where hydrogens are present it guesses bond order."""
def is_sp2(atom: Chem.Atom) -> bool:
N_neigh = len(atom.GetBonds())
symbol = atom.GetSymbol()
if symbol == 'H':
return False
elif symbol == 'N' and N_neigh < 3:
return True
elif symbol == 'C' and N_neigh < 4:
return True
elif symbol == 'O' and N_neigh < 2:
return True
else:
return False
def get_other(bond: Chem.Bond, atom: Chem.Atom) -> Chem.Atom:
"""Given an bond and an atom return the other."""
if bond.GetEndAtomIdx() == atom.GetIdx(): # atom == itself gives false.
return bond.GetBeginAtom()
else:
return bond.GetEndAtom()
def find_sp2_bonders(atom: Chem.Atom) -> List[Chem.Atom]:
return [neigh for neigh in find_bonders(atom) if is_sp2(neigh)]
def find_bonders(atom: Chem.Atom) -> List[Chem.Atom]:
return atom.GetNeighbours()
def descr(atom: Chem.Atom) -> str:
return f'{atom.GetSymbol()}{atom.GetIdx()}'
## main body of function
for atom in mol.GetAtoms():
#print(atom.GetSymbol(), is_sp2(atom), find_sp2_bonders(atom))
if is_sp2(atom):
doubles = find_sp2_bonders(atom)
if len(doubles) == 1:
#tobedoubled.append([atom.GetIdx(), doubles[0].GetIdx()])
b = mol.GetBondBetweenAtoms( atom.GetIdx(), doubles[0].GetIdx())
if b:
b.SetBondType(Chem.rdchem.BondType.DOUBLE)
else:
raise ValueError('Issue with:', descr(atom), descr(doubles[0]))
elif len(doubles) > 1:
for d in doubles:
b = mol.GetBondBetweenAtoms( atom.GetIdx(), d.GetIdx())
if b:
b.SetBondType(Chem.rdchem.BondType.AROMATIC)
b.SetIsAromatic(True)
else:
raise ValueError('Issue with:', descr(atom), descr(d))
elif len(doubles) == 0:
print(descr(atom),' is underbonded!')
else:
pass
#print(descr(atom),' is single', find_bonders(atom))
return mol
There are a few things to note here.- It is actually working in place. To copy a molecule in Rdkit one has to simply do <code>Chem.Mol(mol)</code>.
- One nested function within
fix_bond_order
isfind_bonders
. It may be unnecessary now, but might be useful in future changes. This is totally redundant withatom.GetNeighbours()
, but there are several case where one would like to tweak this —atom.GetNeighbours()
is the same as[get_other(bond, atom) for bond in atom.GetBonds()]
, but in the latter I could add some filtering etc. - A curious case surface, namely
bond.GetBeginAtom() == bond.GetBeginAtom()
is False, hence why in Rdkit all comparisons should be index based. - The key nested function is
is_sp2
, which does the element based check to see if it should be double bonded. The name is terrible, I know. - I resisted the urge to write it as a class or more specifically turn all the nested functions into classmethods, making it just an object holding functions in a very JavaScript Zen manner. But none of the nested functions are going to be called so there was no point.
is_sp2
function is my checking the angles. Double bonds or aromatic bonds results in a planar trigonal geometry with an angle of 120° as opposed to 109° and tetrahedral.
def is_sp2(atom):
neigh = atom.GetNeighbors() # or the function describe a second ago!
if len(neigh) < 2:
return True # or is it!
else:
conf = conf = mol.GetConformer(0)
d = rdMolTransforms.GetAngleDeg(conf,neigh[0].GetIdx(),atom.GetIdx(), neigh[1].GetIdx())
rd = round(d/10)
if rd == 11:
return False
elif rd == 12:
return True
else: ## what is this???
return False
However, this approach has lots of pitfalls, such as angles that deviate from nearly 110 or 120. And the really awful line where the atom with only one neighbour is assumed to be double bonded — if the other atom is not it won't be (using the previous code, it would just trigger print(descr(atom),' is underbonded!')
line).
Great blog post. I've been looking at ways to determine bond orders+formal charges from coordinates+elements, and how reliable the guesses are. I haven't benchmarked them, but some other implementations I've found are:
ReplyDeletexyz2mol: https://github.com/jensengroup/xyz2mol/
A recent MDAnalysis addition: https://cedric.bouysset.net/blog/2020/07/22/rdkit-converter-part2
AmberTools' bondtype program: https://github.com/choderalab/openmoltools/blob/96292f7b603f9f0fcb8ccae06a0adf5a8d7483fd/openmoltools/forcefield_generators.py#L84-L146
OpenEye's OEPercieveBondOrders (requires a license): https://docs.eyesopen.com/toolkits/python/oechemtk/OEChemFunctions/OEPerceiveBondOrders.html