Love thy neighbours, but select them with caution

Monday, 8 June 2020

Love thy neighbours, but select them with caution

In Rosetta NeighborhoodResidueSelector behaves differently than PyMOL's expand selector and it is good to be aware of it. Namely, in PyMOL the distance is from any atom, while in Rosetta it is from the center of mass atom, unless specified differently. In reality CloseContactResidueSelector works like PyMOL's expand selector.

Rosetta selectors

First it ought to be said that Rosetta selectors are different from NGL or PyMOL's. The latter use a string with a selection algebra, while the former is an object that selects residues in a pose with the command selector.apply(pose) and returns a boolean vector of length len(pose.residues), so is a lot less user-friendly.

PyMOL

Given a ligand residue of 3-letter name LIG in your protein, the 3Å neighbours are selected in pymol with byres (resn LIG around 3), for example:
with pymol2.PyMOL() as pymol:
    pymol.cmd.fetch('xxx') #or .load / .read_pdbstr etc.
    for atom in pymol.cmd.get_model('byres (resn LIG around 3)').atom:
        print(f'{atom.resi} {atom.resn} {atom.name}')
Where around excludes the focus residue and expand includes it. In pyrosetta the equivalent code is lengthier.

Fetch

First, the equivalent of fetch is:
pose = pyrosetta.toolbox.pose_from_rcsb(code)
However, importing a PDB code with a non-canonical amino acid or ligand is not a single command, because the non-standard residue needs the correct topology file (params). In fact during import of pyrosetta it is handy to switch off the PDB component library with the argument load_PDB_components false as the atom names may differ between structures and everything spaghettifies. To load a pose with a params file:
import pyrosetta
pyrosetta.init(extra_options='-no_optH false -mute all -ignore_unrecognized_res true -load_PDB_components false')
    
pose = pyrosetta.rosetta.core.pose.Pose()
params_paths = pyrosetta.rosetta.utility.vector1_string()
params_paths.extend(['xxx.params'])
resiset = pyrosetta.generate_nonstandard_residue_set(pose, params_paths)
pyrosetta.rosetta.core.import_pose.pose_from_rcsb(pose, 'xxx') # pose_from_pdbstring or pose_from_file
Shameless advert If you want to alter a params file or convert an rdkit Chem.Mol to a params file, check out my python package for exactly that: pip rdkit-to-params and github rdkit_to_params.

resn/resi

To do the resn LIG we need a first selector:
resn = pyrosetta.rosetta.core.select.residue_selector.ResidueNameSelector()
resn_selector.set_residue_name3('LIG')
Alternatively the equivalent of resi is:
resi = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector()
resi_selector.set_index(23)
Where the index is in pose numbering and not PDB numering, so most often you'd do:
pdb2pose = pose.pdb_info().pdb2pose
r = pdb2pose(res=41, chain='A')
As a consequence, when we do a print to check what is selected thusly (which returns the residues that are True/1:

sele_vector = selector.apply(self.pose))
print(pyrosetta.rosetta.core.select.residue_selector.ResidueVector(sele_vector))
We would probably want to add a list comprehension with pose.pdb_info().pose2pdb.

around/expand

NeighborhoodResidueSelector = pyrosetta.rosetta.core.select.residue_selector.NeighborhoodResidueSelector
ns = NeighborhoodResidueSelector(lig_selector, distance=7, include_focus_in_subset=False)
Clearly, include_focus_in_subset is equivalent to the around/expand pair. However, some key detail is hidden in distance...

NeighborhoodResidueSelector

The distance is from the center of mass atom, not from every atom in the molecule. In the topology file, there are two lines NBR_ATOM and NBR_RADIUS. When making the topology file it is recommended that NBR_ATOM be at the middle (centroid). NeighborhoodResidueSelector can overide the identity of this atom. Consequently the distance if off by a certain amount (NBR_RADIUS). If we want to compensate for this we can simply sum the distance of 3 Å with this value:
lig_vector = lig_selector.apply(pose)
lig_i = pyrosetta.rosetta.core.select.residue_selector.ResidueVector(lig_vector).pop()
lig = self.pose.residues[lig_i]
lig_size = lig.nbr_radius()
I say simply because getting from the residue types feels more arcane:
resiset = pose.residue_type_set_for_pose()
v = pyrosetta.rosetta.core.chemical.ResidueTypeFinder(resiset).get_all_possible_residue_types()
lig_size = [vv for vv in v if vv.name3() == 'LIG'][0].nbr_radius()

NumNeighborsSelector

This selector seems awesome at first as it allows the selection of N amount of neighbours. It accepts two arguments, the integer threshold (number of neighbours) and the float distance_cutoff. But actually the selector input is not exposed. So is not meant to be used outside of C++.

CloseContactResidueSelector

However, you can achieve the same result as expand using CloseContactResidueSelector.
resi_sele = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector(1)
cc_sele = pyrosetta.rosetta.core.select.residue_selector.CloseContactResidueSelector()
cc_sele.central_residue_group_selector(resi_sele)
cc_sele.threshold(3)
The threshold has to be set as the default is 2.320352260215883e+77 Å. Obviously, there is nothing forbidding you from using the pymol selection algebra and passing it to pyrosetta. This would mean that you'd get more accurate selections for something long and narrow. In fact, the fact that pymol/rdkit/pyrosetta overlap in some areas make them stronger together. Hence here is a concluding useful snippet that converts a pose to a pdbblock string that can be read by pymol.cmd.read_pdbstr(xxx) or rdkit.Chem.MolFromPDBBlock(xxx).
buffer = pyrosetta.rosetta.std.stringbuf()
pose.dump_pdb(pyrosetta.rosetta.std.ostream(buffer))
pdbblock = buffer.str()

No comments:

Post a Comment