Recently I calculated the local RMSD caused by each residue and I thought I'd share the methods I used using PyRosetta —it is nothing at all novel, but I could not find a suitable implementation. The task is simple given two poses, find out what residue's backbone is changing the most by scanning along comparing each a short peptide window from each.
Premise
Model
original = pyrosetta.toolbox.rcsb.pose_from_rcsb('1L2Y')
mini = pyrosetta.rosetta.protocols.grafting.return_region(original, 1,20)
nmr = pyrosetta.rosetta.utility.vector1_core_pose_Pose()
n = [pyrosetta.rosetta.protocols.grafting.return_region(original, 1+i,20+i) for i in range(0, original.total_residue(),20)]
nmr.extend(n)
RMSD
RMSD (root mean square deviation) is a metric in Ångström of how much on average do the atoms positions differ between two conformations (wiki). If the comparison is between a conformation with an imaginary centroid of an ensemble it is called RMSF (fluctuation). This is a very simple metric and gives the most weight to atoms that change the most and does not take into account several things such functional groups etc. for which there are many "shapes-and-colours" metrics. But RMSD is simple and clear.
In pyrosetta.rosetta.core.scoring
there are several functions that calculate different forms. They do a superposition beforehand, but it is always wise to check.
One can check manually or simply do a test. So lets copy the first pose and translate it:
import numpy as np
x, y, z = (10, 0, 0)
rototrans = np.array([[1, 0, 0, x], [0, 1, 0, y],[0, 0, 1, z],[0, 0, 0, 1]])
copy = nmr[1].clone()
copy.apply_transform(rototrans)
pyrosetta.rosetta.core.scoring.CA_rmsd(nmr[1], copy, 1, 3),\
pyrosetta.rosetta.core.scoring.CA_rmsd(nmr[1], copy),\
pyrosetta.rosetta.core.scoring.bb_rmsd_including_O(nmr[1], copy),\
pyrosetta.rosetta.core.scoring.all_atom_rmsd(nmr[1], copy),\
pyrosetta.rosetta.core.scoring.all_atom_rmsd_nosuper(nmr[1], copy)
As hoped all bar the last give near zero values —if the rototranslation part seems gibberish, you may want to read up on 4x4 rototranslational matrices: the learning curve is very steep, but worth it.
Simple
With CA_rmsd
one can specify a window and if the poses have the same residue, the code is very simple.
def calculate_simple(ref:pyrosetta.Pose, target: pyrosetta.Pose, window: int= 3):
assert (window - 1) % 2 == 0, 'Odd number windows only'
half_window = int((window - 1)/2)
residue_rsmds = [float('nan')] * half_window
for off_i in range(half_window, ref.total_residue() - half_window):
i = off_i + 1
residue_rsmds.append(pyrosetta.rosetta.core.scoring.CA_rmsd(ref, target, i-half_window, i+half_window))
residue_rsmds.extend([float('nan')] * half_window)
return residue_rsmds
Here I am comparing to the first model because whereas my control is an ensemble, I am interested in comparing conformational switches between two or more poses. If I did care about an ensemble, I could use an averaged pose —generated with ProDy say to get the RMSF for each atom or residue.
Generally, the window size used is 5. But one could use a larger one.
Real world
This all nice, howerver, my conformers have different residues due to missing density and I would like all the backbone. So the code becomes instantly more complex, because the PDB residues need to be compared and one has to make sure only the a given chain is compared. As a result slicing the peptide out of the pose based on PDB numbering has to be done.
from typing import *
def slice_for_comparison(pose:pyrosetta.Pose,
from_res: Tuple[int, str],
to_res: Tuple[int, str]) -> Union[None, pyrosetta.Pose]:
"""
Returns None is there is missing density in the range or different chains
"""
# convert to pose
from_resi, from_chain = from_res
to_resi, to_chain = to_res
pdb2pose = pose.pdb_info().pdb2pose
from_r = pdb2pose(res=from_resi, chain=from_chain)
to_r = pdb2pose(res=to_resi, chain=to_chain)
# validity checks.
if from_chain != to_chain: # different chain. user's doing...
return None
elif from_r==0 or to_r == 0: # missing density
return None
elif to_r - from_r != to_resi - from_resi: # gap
return None
else: # slice
return pyrosetta.Pose(pose, from_r, to_r)
def calculate_window(ref:pyrosetta.Pose,
target: pyrosetta.Pose,
from_res: Tuple[int, str],
to_res: Tuple[int, str],
rmsd_fx: Callable= pyrosetta.rosetta.core.scoring.CA_rmsd
):
sliced_ref = slice_for_comparison(ref, from_res, to_res)
sliced_target = slice_for_comparison(target, from_res, to_res)
# from_resi, to_resi are valid only from CA_rmsd, hence the slicing
if sliced_ref is None or sliced_target is None:
return float('nan')
else:
return rmsd_fx(sliced_ref, sliced_target)
def calculate(ref:pyrosetta.Pose,
target: pyrosetta.Pose,
chain: str='A',
window: int= 3,
rmsd_fx: Callable= pyrosetta.rosetta.core.scoring.CA_rmsd):
assert (window - 1) % 2 == 0, 'Odd number windows only'
half_window = int((window - 1)/2)
# get the boundaries of the pose residue indices making that chain.
ref_pdb = ref.pdb_info().number
chain_id = pyrosetta.rosetta.core.pose.get_chain_id_from_chain(chain, ref)
from_resi = ref_pdb(ref.chain_begin(chain_id))
to_resi = ref_pdb(ref.chain_end(chain_id))
# calculate
residue_rsmds = {i: calculate_window(ref,
target,
(i-half_window, chain),
(i+half_window, chain),
rmsd_fx) for i in range(from_resi+half_window, to_resi-half_window+1)}
return residue_rsmds
But despite the complexity caused by the slicing per PDB numbering, the method is nice and simple really...
NGL
As an added bonus, one can use NGL to show multiple poses, as seen in a previous blog post.
import nglview as nv
from io import StringIO
import time
class ModNGLWidget(nv.NGLWidget):
def add_rosetta(self, pose: pyrosetta.Pose):
buffer = pyrosetta.rosetta.std.stringbuf()
pose.dump_pdb(pyrosetta.rosetta.std.ostream(buffer))
fh = StringIO(buffer.str())
view.add_component(fh, ext='pdb')
view = ModNGLWidget()
view.add_rosetta(nmr[1])
view.add_rosetta(nmr[21])
view.component_0.update_cartoon(color='turquoise')
view.component_1.update_cartoon(color='salmon')
view.component_0.add_hyperball(selection='15-18', colorValue='turquoise')
view.component_1.add_hyperball(selection='15-18', colorValue='salmon')
view
No comments:
Post a Comment