NeighborhoodResidueSelector
behaves differently than PyMOL's expand
selector and it is good to be aware of it. Namely, in PyMOL the distance is from any atom, while in Rosetta it is from the center of mass atom, unless specified differently. In reality CloseContactResidueSelector
works like PyMOL's expand
selector.Rosetta selectors
First it ought to be said that Rosetta selectors are different from NGL or PyMOL's. The latter use a string with a selection algebra, while the former is an object that selects residues in a pose with the commandselector.apply(pose)
and returns a boolean vector of length len(pose.residues)
, so is a lot less user-friendly.PyMOL
Given a ligand residue of 3-letter name LIG in your protein, the 3Å neighbours are selected in pymol with
byres (resn LIG around 3)
, for example:with pymol2.PyMOL() as pymol:
pymol.cmd.fetch('xxx') #or .load / .read_pdbstr etc.
for atom in pymol.cmd.get_model('byres (resn LIG around 3)').atom:
print(f'{atom.resi} {atom.resn} {atom.name}')
Where around
excludes the focus residue and expand
includes it. In pyrosetta the equivalent code is lengthier.Fetch
First, the equivalent offetch
is:pose = pyrosetta.toolbox.pose_from_rcsb(code)
However, importing a PDB code with a non-canonical amino acid or ligand is not a single command, because the non-standard residue needs the correct topology file (params).
In fact during import of pyrosetta it is handy to switch off the PDB component library with the argument load_PDB_components false
as the atom names may differ between structures and everything spaghettifies. To load a pose with a params file:import pyrosetta
pyrosetta.init(extra_options='-no_optH false -mute all -ignore_unrecognized_res true -load_PDB_components false')
pose = pyrosetta.rosetta.core.pose.Pose()
params_paths = pyrosetta.rosetta.utility.vector1_string()
params_paths.extend(['xxx.params'])
resiset = pyrosetta.generate_nonstandard_residue_set(pose, params_paths)
pyrosetta.rosetta.core.import_pose.pose_from_rcsb(pose, 'xxx') # pose_from_pdbstring or pose_from_file
Shameless advert If you want to alter a params file or convert an rdkit Chem.Mol
to a params file, check out my python package for exactly that:
pip rdkit-to-params and github rdkit_to_params.
resn/resi
To do theresn LIG
we need a first selector:
resn = pyrosetta.rosetta.core.select.residue_selector.ResidueNameSelector()
resn_selector.set_residue_name3('LIG')
Alternatively the equivalent of resi
is:
resi = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector()
resi_selector.set_index(23)
Where the index is in pose numbering and not PDB numering, so most often you'd do:
pdb2pose = pose.pdb_info().pdb2pose
r = pdb2pose(res=41, chain='A')
As a consequence, when we do a print to check what is selected thusly (which returns the residues that are True/1:
sele_vector = selector.apply(self.pose))
print(pyrosetta.rosetta.core.select.residue_selector.ResidueVector(sele_vector))
We would probably want to add a list comprehension with pose.pdb_info().pose2pdb
.
around/expand
NeighborhoodResidueSelector = pyrosetta.rosetta.core.select.residue_selector.NeighborhoodResidueSelector
ns = NeighborhoodResidueSelector(lig_selector, distance=7, include_focus_in_subset=False)
Clearly, include_focus_in_subset
is equivalent to the around/expand
pair.
However, some key detail is hidden in distance
...
NeighborhoodResidueSelector
The distance is from the center of mass atom, not from every atom in the molecule. In the topology file, there are two linesNBR_ATOM
and NBR_RADIUS
. When making the topology file it is recommended that NBR_ATOM
be at the middle (centroid). NeighborhoodResidueSelector
can overide the identity of this atom. Consequently the distance if off by a certain amount (NBR_RADIUS
).
If we want to compensate for this we can simply sum the distance of 3 Å with this value:
lig_vector = lig_selector.apply(pose)
lig_i = pyrosetta.rosetta.core.select.residue_selector.ResidueVector(lig_vector).pop()
lig = self.pose.residues[lig_i]
lig_size = lig.nbr_radius()
I say simply because getting from the residue types feels more arcane:
resiset = pose.residue_type_set_for_pose()
v = pyrosetta.rosetta.core.chemical.ResidueTypeFinder(resiset).get_all_possible_residue_types()
lig_size = [vv for vv in v if vv.name3() == 'LIG'][0].nbr_radius()
NumNeighborsSelector
This selector seems awesome at first as it allows the selection of N amount of neighbours. It accepts two arguments, the integerthreshold
(number of neighbours) and the float distance_cutoff
. But actually the selector input is not exposed. So is not meant to be used outside of C++.
CloseContactResidueSelector
However, you can achieve the same result asexpand
using CloseContactResidueSelector
.
resi_sele = pyrosetta.rosetta.core.select.residue_selector.ResidueIndexSelector(1)
cc_sele = pyrosetta.rosetta.core.select.residue_selector.CloseContactResidueSelector()
cc_sele.central_residue_group_selector(resi_sele)
cc_sele.threshold(3)
The threshold has to be set as the default is 2.320352260215883e+77 Å.
Obviously, there is nothing forbidding you from using the pymol selection algebra and passing it to pyrosetta.
This would mean that you'd get more accurate selections for something long and narrow. In fact, the fact that pymol/rdkit/pyrosetta overlap in some areas make them stronger together. Hence here is a concluding useful snippet that converts a pose to a pdbblock string that can be read by pymol.cmd.read_pdbstr(xxx)
or rdkit.Chem.MolFromPDBBlock(xxx)
.
buffer = pyrosetta.rosetta.std.stringbuf()
pose.dump_pdb(pyrosetta.rosetta.std.ostream(buffer))
pdbblock = buffer.str()
No comments:
Post a Comment