Pages

Tuesday, 21 July 2020

Switching ligand in a PDB with Fragmenstein

For the Covid Moonshot project, one question by Prof. Frank von Delft of Diamond XChem led to a series of events that culminated in Fragmenstein, a module to do fragment mergers when the followup is as faithful to the starting crystal hits as possible. Even if it's intended use is the hit-to-lead process, there is a nice use that make it rather handy for computational biochemistry in general: switching the ligand in a PDB to another in an energy minimised fashion that obeys the original ligand.
In one mode, Fragmenstein accepts multiple fragment hits and a SMILES string, which is placed mapping as many sensible atoms as possible, but works just fine with a single fragment.
My rdkit-to-params Python module can create a params file based on a PDB template which differs in atom count*, but it does not place it and minimise it, so Fragmenstein needs to be used to switch a ligand.
As it's not the intended use, it is a bit convoluted in its prep. Here is an example, switching AR6 with NAD in PDB:5Y2F, which has already been relaxed —potentially using the EM map for scoring as discussed in the previous post!
So let's start by importing everything we need. Note that both rdkit_to_params and fragmenstein modules use the logging module.

# imports and configs
import pyrosetta
pyrosetta.init(extra_options='-no_optH false -mute all -ignore_unrecognized_res true -load_PDB_components false')

import logging
from fragmenstein import Victor, Fragmenstein
Victor.enable_stdout(logging.DEBUG)

from rdkit.Chem import AllChem
from rdkit import Chem
import pymol2
The first item that needs prepping is the reference ligand that needs extracting.

ref = Victor.extract_mol(name= 'AR6',
                     filepath= '5Y2F.relaxed.pdb',
                     smiles= 'Nc1ncnc2n(cnc12)[CH]3O[CH](CO[P]([O-])(=O)O[P]([O-])(=O)OC[CH]4O[CH](O)[CH](O)[CH]4O)[CH](O)[CH]3O',
                     ligand_resn = 'AR6')
I will admit that in some cases there is something not quite right and this step will fail at the bond order assignment step. To fix this doing a roundtrip and stripping the hydrogens and sanitizing helps. Although if the file went through PyMOL, proximity bonding may have occurred for nasty bonds, so do check in PyMOL and if needed click on the atoms of the spurious bond and use the command unbound pk1, pk2 to unbond them. Yes,  starting with a ligand that has some rough treatment in PyMOL edit mode will still result in a pretty decent ligand coming out of Fragmenstein.
deproto = Chem.MolFromPDBFile('A.pdb', proximityBonding=False, removeHs=True)
# Chem.SanitizeMol(deproto) ?
Chem.MolToPDBFile(deproto, 'B.pdb')
The next step is making a blank template.

with pymol2.PyMOL() as pymol:
    pymol.cmd.load('5y2f.relax2.shifted.pdb', 'protein')
    pymol.cmd.remove('not polymer')
    pymol.cmd.save('template.pdb')
Now we can use the main fragmenstein pipeline, Victor.

Victor.fragmenstein_merging_mode = 'none_permissive'
v = Victor(smiles='[NH3+]C(=O)c1ccc[n+](c1)[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OC[C@H]2O[C@H]([C@H](O)[C@@H]2O)n2cnc3c([NH3+])ncnc23)[C@@H](O)[C@H]1O',
 hits= [ref],
 pdb_filename= 'template.pdb',
 long_name = 'NAD',
 ligand_resn = 'NAD',
 covalent_resi='52A', # a random residue is still required for the constaint ref atom.
 ligand_resi= '403B'
 )
And there you have it. Victor will create a folder in Victor.work_path, with the specified long-name, with a few files. To make a PyMOL session of the results use v.make_pse(), alternative if you are Jupyter notebook, nglview can read rosetta poses.
import nglview
view = nglview.show_rosetta(v.igor.pose)
view

Asterisk Footnote

If you just want a params file for a ligand in a PDB file, then this is fine:

from rdkit_to_params import Params
p = Params.from_smiles_w_pdbfile(pdb_file='5y2f.pdb',
                                 smiles='Nc1ncnc2n(cnc12)[CH]3O[CH](CO[P]([O-])(=O)O[P]([O-])(=O)OC[CH]4O[CH](O)[CH](O)[CH]4O)[CH](O)[CH]3O',
                                 name='AR6')
p.dump('denicotinamided.params')
It can have a few differing atoms between the named ligand in the protein structure and the SMILES as Rosetta tolerates a few missing atoms that aren't the first three in the topology file or connection atoms. So the above could be done with this exploit, but it's rather shoddy.

No comments:

Post a Comment