PLIP is a handy tool to enumerate the interactions of a given ligand. However, a few of tripping point I keep having is related to the fact the interactions are namedtuples. Here are some notes to circumvent the traps.
No SMILES and no connectivity?
One thing to note is that PLIP does not accept a SMILES. It runs OBMol.PerceiveBondOrder
to determine bond order (OEDetermineConnectivity
and OEPerceiveBondOrders
are the OpenEye equivalent) prior to that for the connectivity if absent.
This is far from ideal for compounds that are far from ideal in the first place. As a result generating a protonated structure with correct bond order by repeated CONECT lines is preferable in my opinion (example script).
To tell PLIP you have protons already, use:
from plip.basic import config
config.NOHYDRO = True
EDIT. Something funky goes on behind the scenes, so it might be best to remove the hydrogens and accept any errors from incorrectly percieved bond orders.
from rdkit import Chem
from rdkit.Chem import AllChem
from openbabel import pybel
from plip.basic import config
from plip.structure.preparation import PDBComplex
config.NOHYDRO = True
mol = AllChem.AddHs(Chem.MolFromSmiles('c1ccccc1'))
AllChem.EmbedMolecule(mol)
block = Chem.MolToPDBBlock(mol) # rubbish CONECT is flavor=8, but it's still okay
pb_mol = pybel.readstring('pdb', block, {'s': None})
print([a.type for a in pb_mol.atoms])
pb_mol.OBMol.PerceiveBondOrders()
print([a.type for a in pb_mol.atoms])
p = PDBComplex()
p.load_pdb(block, as_string=True)
print('this is wrong!')
print([a.type for a in p.ligands[0].mol.atoms])
These are not the interactions you are looking for
It is easy to trip up with namedtuples. The collections.namedtuple
is a class factory, a function that returns a class. This returned class is a subclass of tuple
(a class), not of collections.namedtuple
, which is a function, so would be impossible. It is far from the labrythm of madness of Boost types, but I do trip up on this.
from collections import namedtuple
from types import FunctionType
Foo = namedtuple('Foo', ['a', 'b', 'c'])
foo = Foo(1,2,3)
assert isinstance(foo, Foo)
assert isinstance(namedtuple, FunctionType)
assert isinstance(Foo, type) # The type of a class is type
print(Foo.__mro__) # ('__main__.Foo','tuple','object')
assert isinstance(tuple, type)
The name given to namedtuple becomes __name__
, which is not used for isinstance checking.
Foo = namedtuple('Foo', ['a', 'b', 'c'])
FauxFoo = namedtuple('Foo', ['a', 'b', 'c'])
isinstance(Foo(1,2,3), FauxFoo), Foo.__name__, FauxFoo.__name__ # (False, 'Foo', 'Foo')
This is obvious, but is a problem as the interaction classes are not exposed, cf. code.
For the sake of sanity here are the relevant lines.
from collections import namedtuple
hydroph_interaction = namedtuple('hydroph_interaction', 'bsatom bsatom_orig_idx ligatom ligatom_orig_idx '
'distance restype resnr reschain restype_l, resnr_l, reschain_l')
hbond = namedtuple('hbond', 'a a_orig_idx d d_orig_idx h distance_ah distance_ad angle type protisdon resnr '
'restype reschain resnr_l restype_l reschain_l sidechain atype dtype')
pistack = namedtuple(
'pistack',
'proteinring ligandring distance angle offset type restype resnr reschain restype_l resnr_l reschain_l')
pication = namedtuple(
'pication', 'ring charge distance offset type restype resnr reschain restype_l resnr_l reschain_l protcharged')
saltbridge = namedtuple(
'saltbridge', 'positive negative distance protispos resnr restype reschain resnr_l restype_l reschain_l')
halogenbond = namedtuple('halogenbond', 'acc acc_orig_idx don don_orig_idx distance don_angle acc_angle restype '
'resnr reschain restype_l resnr_l reschain_l donortype acctype sidechain')
waterbridge = namedtuple('waterbridge', 'a a_orig_idx atype d d_orig_idx dtype h water water_orig_idx distance_aw '
'distance_dw d_angle w_angle type resnr restype reschain resnr_l restype_l reschain_l protisdon')
metal_complex = namedtuple('metal_complex', 'metal metal_orig_idx metal_type target target_orig_idx target_type '
'coordination_num distance resnr restype '
'reschain restype_l reschain_l resnr_l location rms, geometry num_partners complexnum')
itxn_names = ('hydroph_interaction', 'hbond', 'pistack', 'pication', 'saltbridge', 'halogenbond', 'waterbridge', 'metal_complex')
itxn_classes = (hydroph_interaction, hbond, pistack, pication, saltbridge, halogenbond, waterbridge, metal_complex)
However, I cannot check if something is an instance of any of these as the classes are redeclared on each call (i.e. the class of one Interaction of one call of plip.structure.preparation.PDBComplex.analyze
will be different from another, but will be identical within). There are three options as a result: (a) override the builtin isinstance
to check by name (a cosmic no-no) (b) rummage through the garbage collector (a big no-no), (c) subclass the namedtuple-generated class and give it a custom metaclass with a custom __instancecheck__
dunder method that compares the __name__
not the memory address (so faffy), or (d) don't touch isinstance
and have a function that does the check, without summoning Cthulhu in the process:
def isintxn(instance, cls_or_name: Union[type, str]):
"""
Is ``interaction`` an instance of a class with the same name as ``cls_or_name``?
"""
if isinstance(cls_or_name, str):
name: str = cls_or_name
else:
name: str = cls_or_name.__name__
return instance.__class__.__name__ == name
Is it dist or distance or distance_l?
Each interaction seems to have slightly different fields. Here are the fields:
import pandas as pd
key_tally = pd.DataFrame([{'name': cls.__name__, **{f: True for f in cls._fields}} for cls in itxn_names]).fillna(False).set_index('name')
prefered_order = key_tally.sum().sort_values(ascending=False).index.to_list()
key_tally = key_tally[prefered_order]
key_tally.transpose()
name | hydroph_interaction | hbond | pistack | pication | saltbridge | halogenbond | waterbridge | metal_complex |
---|---|---|---|---|---|---|---|---|
restype | True | True | True | True | True | True | True | True |
resnr | True | True | True | True | True | True | True | True |
reschain | True | True | True | True | True | True | True | True |
restype_l | True | True | True | True | True | True | True | True |
resnr_l | True | True | True | True | True | True | True | True |
reschain_l | True | True | True | True | True | True | True | True |
distance | True | False | True | True | True | True | False | True |
type | False | True | True | True | False | False | True | False |
h | False | True | False | False | False | False | True | False |
offset | False | False | True | True | False | False | False | False |
dtype | False | True | False | False | False | False | True | False |
atype | False | True | False | False | False | False | True | False |
sidechain | False | True | False | False | False | True | False | False |
protisdon | False | True | False | False | False | False | True | False |
angle | False | True | True | False | False | False | False | False |
d_orig_idx | False | True | False | False | False | False | True | False |
d | False | True | False | False | False | False | True | False |
a_orig_idx | False | True | False | False | False | False | True | False |
a | False | True | False | False | False | False | True | False |
distance_aw | False | False | False | False | False | False | True | False |
distance_dw | False | False | False | False | False | False | True | False |
metal_type | False | False | False | False | False | False | False | True |
d_angle | False | False | False | False | False | False | True | False |
water_orig_idx | False | False | False | False | False | False | True | False |
w_angle | False | False | False | False | False | False | True | False |
metal | False | False | False | False | False | False | False | True |
water | False | False | False | False | False | False | True | False |
metal_orig_idx | False | False | False | False | False | False | False | True |
bsatom | True | False | False | False | False | False | False | False |
target | False | False | False | False | False | False | False | True |
target_orig_idx | False | False | False | False | False | False | False | True |
target_type | False | False | False | False | False | False | False | True |
coordination_num | False | False | False | False | False | False | False | True |
donortype | False | False | False | False | False | True | False | False |
location | False | False | False | False | False | False | False | True |
rms | False | False | False | False | False | False | False | True |
geometry | False | False | False | False | False | False | False | True |
num_partners | False | False | False | False | False | False | False | True |
acctype | False | False | False | False | False | True | False | False |
protcharged | False | False | False | True | False | False | False | False |
acc_angle | False | False | False | False | False | True | False | False |
charge | False | False | False | True | False | False | False | False |
ligatom | True | False | False | False | False | False | False | False |
ligatom_orig_idx | True | False | False | False | False | False | False | False |
distance_ah | False | True | False | False | False | False | False | False |
distance_ad | False | True | False | False | False | False | False | False |
proteinring | False | False | True | False | False | False | False | False |
ligandring | False | False | True | False | False | False | False | False |
ring | False | False | False | True | False | False | False | False |
bsatom_orig_idx | True | False | False | False | False | False | False | False |
don_angle | False | False | False | False | False | True | False | False |
positive | False | False | False | False | True | False | False | False |
negative | False | False | False | False | True | False | False | False |
protispos | False | False | False | False | True | False | False | False |
acc | False | False | False | False | False | True | False | False |
acc_orig_idx | False | False | False | False | False | True | False | False |
don | False | False | False | False | False | True | False | False |
don_orig_idx | False | False | False | False | False | True | False | False |
complexnum | False | False | False | False | False | False | False | True |
Get atom name
Atom names are a problem throughout compchem (cf. past post) and bar for saving atom names in the molFileAlias
atom property in a mol file they are easily lost. PLIP does not care for them, but they can be rescued (code from my PLIPspots repo):
from typing import Union
from openbabel.pybel import Atom, Residue
from openbabel.pybel import ob
def get_atomname(atom: Union[Atom, ob.OBAtom]) -> str:
"""
Given an atom, return its name.
"""
if isinstance(atom, Atom):
res: ob.OBResidue = atom.residue.OBResidue
obatom = atom.OBAtom
elif isinstance(atom, ob.OBAtom):
obatom: ob.OBAtom = atom
res: ob.OBResidue = obatom.GetResidue()
else:
raise TypeError
return res.GetAtomID(obatom)
def get_atom_by_atomname(residue: Union[ob.OBResidue, Residue], atomname: str) -> ob.OBAtom:
"""
Get an atom by its name in a residue.
"""
if isinstance(residue, Residue):
residue = residue.OBResidue
obatom: ob.OBAtom
for obatom in ob.OBResidueAtomIter(residue):
if residue.GetAtomID(obatom).strip() == atomname:
return obatom
else:
raise ValueError(f'No atom with name {atomname} in residue {residue.GetName()}')
I am not going to into the rabbit hole of Atom
vs. ob.OBAtom]
as it's not a rabbit hole, but an eldritch gate directly to R'lyeh...
No comments:
Post a Comment