CA
is the standard name for the α-carbon. Example uses of atom names in Rosetta/pyrosetta include setting constraints, using a params file for a custom ligand and so forth. However, RDKit is a bit of a nuisance with atom names as it is not a central feature, but a feature added for PDB files that is not too well documented.File formats
This problem stems from the fact that few formats handle atom names. For small ligands, a few file formats exist as evident from the differentChem.MolFromXXXX
functions. A SMILES string contains no atom information (or its coordinates), bar the element and the index, which is the order in which the atoms appears. A mol files and a sdf file do not contain atom names and differ primarily in that the latter can contain multiple molecules (different or conformers) and extra data. A pdb file contains a lot more information, such as an atom index, element and name, residue name and index. A mol2 files contains similar information, plus partial charges. Technically, Autodock's pdbqt format is a pdb with partial charges, but is not a standard file format.
RDKit
In RDKit, an atom (i.e.rdkit.Chem.rdchem.Atom
, retrieved say with mol.GetAtomWithIdx(19)
) contains extra data for read/writing to PDB files. This extra data is not in a "Prop" property (e.g. found via atom.GetPropsAsDict
), but in an rdkit.Chem.rdchem.AtomPDBResidueInfo
object retrieved via atom.GetPDBResidueInfo
and setable via atom.SetMonomerInfo
.
One property of the info is atom.GetPDBResidueInfo().GetName()
, which is that in the PDB.If an atom is from a molecules not from a PDB file/block
atom.GetPDBResidueInfo
will return None
and when written to PDB will result in a HETATM
line with atom name element symbol + Nth (starting from 1, not includes atoms with AtomPDBResidueInfo
data, and present even when no other atoms of that element are present), residue name UNL
and residue index 1
.AtomPDBResidueInfo
The constructor for AtomPDBResidueInfo accepts a few arguments:__init__(_object*,
std::__1::basic_string<char,
std::__1::char_traits<char>,
std::__1::allocator<char> > atomName,
int serialNumber=1,
std::__1::basic_string<char,
std::__1::char_traits<char>,
std::__1::allocator<char> > altLoc='',
std::__1::basic_string<char,
std::__1::char_traits<char>,
std::__1::allocator<char> > residueName='',
int residueNumber=0,
std::__1::basic_string<char,
std::__1::char_traits<char>,
std::__1::allocator<char> > chainId='',
std::__1::basic_string<char,
std::__1::char_traits<char>,
std::__1::allocator<char> > insertionCode='',
double occupancy=1.0,
double tempFactor=0.0,
bool isHeteroAtom=False,
unsigned int secondaryStructure=0,
unsigned int segmentNumber=0)
This is very C++ish. So in Python this would be:__init__(self,
atomName:str,
serialNumber:int=1,
altLoc:str='',
residueName:str='',
residueNumber:int=0,
chainId:str='',
insertionCode:str='',
occupancy:float=1.0,
tempFactor:float=0.0,
isHeteroAtom:bool=False,
secondaryStructure:int=0,
segmentNumber:int=0)
Note that saving a molecule without a PDB residue info, has slightly different defaults for four attributes:Argument | Type | Default | To PDB |
---|---|---|---|
atomName |
string | Blank | element+Nth |
residueName |
string | Blank | UNK |
residueNumber |
integer | 0 |
1 |
isHeteroAtom |
boolean | False |
True |
Caveats
- When an operation on a molecule returns a new molecule, such as substitutions, the PDB data is not preserved, so the information needs to be added manually.
- The spacing of the output is not corrected and the spacing on a PDB file is important on paper —although some programs do not care and split by spaces and have consequently trouble with CONECT lines between 4 digit atom indices. Therefore it is likely important to pad the strings: the atom name to be 4 chars long, the residue name 3 for ATOM and 4 for HETATM.
As a result of this, it may be easier sometimes to tinker with the files directly as I have done in a previous post about using RDKit with Rosetta or in a GitHub repo to rename atom names of a molecule based on a similar one with names (i.e. to trick Rosetta, e.g. making a GNP to a GTP).
Edit: Footnote
To save atom names in a mol or sdf a hack is to save them in molFileAlias property:
for a in mol.GetAtoms():
name = a.GetPDBResidueInfo().GetName()
a.SetProp('molFileAlias', name)
When saved as a mol or sdf these will appear as a block below the bond definitions (e.g. for an atom index zero named CB: A 1\nCB
)
No comments:
Post a Comment