Since posting this, I realised one can do it even faster by hijacking the threading algorithm, which albeit not it's intended purpose works fine for fixing a structure without supervision —which the following discussed methods do.
When to do it
Side note: Coot
How to do it: Remodel
Remodel
1 M L PIKAA R
2 A L PIKAA ED
[...]
1 M .
, whereas 3 S EMPTY NC HSR
will change residue 3 to the non-canonical amino acid specified —assuming its topology file has been specified for both full atom (-extra_res_fa
) and centroid (-extra_res_cen
). omitting a residue (and allowing the nearby ones to wiggle to compensate) will delete it, while 0 X L PIKAA N
will add a asparagine (again neighbours permitting). The neighbours note is because any mutation needs the nearby residues to be repacked (without redesigning) (10 W NATAA
) or else it would be impossible to close the cutpoint, especially for gap closing as will become clear in the pyrosetta section of this post.Remodelling loops
0 X L PIKAA N
style syntax and changes the preceding and succeeding residues to 10 S L NATAA
syntax or 10 S L PIKAA S
style syntax.generic_aa G
is a good option as this tells the default rough fitting (centroid mode) to use glycine and not valine as generic residue.Caveats
Pose numbering and single chain
alter chain B, resv+=100
, alter chain B, chain='A'
etc. including shifting the residues after the gap. Also, ligands may crash it. Non-canonical amino acids are fine as proven by my other post about Rosetta and isopeptide bonds.In-frame deletion
How to do it: Pyrosetta
Get your pose
Like for remodel, ligands are not okay and will cause a segmentation fault (if you don't want your main process to die, check out my post about processes and segmentation faults). So do not add any ligands.import pyrosetta
pyrosetta.init(extra_options='-no_optH false -mute all -ignore_unrecognized_res true -load_PDB_components false')
pose = pyrosetta.rosetta.core.pose.Pose()
#params_paths = pyrosetta.rosetta.utility.vector1_string()
#params_paths.extend(['E4S.params']) # No! This is forbidden!
#resiset = pyrosetta.generate_nonstandard_residue_set(pose, params_paths)
pyrosetta.rosetta.core.import_pose.pose_from_file(pose, 'pdb6bq1.ent')
# or use pose_from_pdbstring or pose_from_file etc. or the toolbox module...
Add missing residues
# create the insert with nice PDB numbering
insert = pyrosetta.pose_from_sequence(sequence)
for i in range(1, insert.total_residue() + 1):
insert.pdb_info().chain(i, your_chain)
insert.pdb_info().number(i, i + first_gap_pdb_resi - 1)
# make a fold tree
ft = pyrosetta.FoldTree()
ft.simple_tree(pose.total_residue())
# say i is the pose residue position after the gap
ft.new_jump(i -1,i ,i -1)
assert ft.check_fold_tree(), f'Error in foldtree {ft}'
pose.fold_tree(ft)
# add pose into pose
n = ft.num_jump()
pose.append_pose_by_jump(gap_pose, n)
Instead it is way easier to extend the residues one by one. In PyMOL the
fab
command is fabulous at making new residues and one can control chain, residue and it will be extended from there, whereas pyrosetta.pose_from_sequence
is a bit more limited. So the residues need to be added one at the time, via a slightly convoluted way of making new residues. # get residue
# first, get the relevant set of residue types.
# In the pose import there was residuetype set commented out: resiset = pyrosetta.generate_nonstandard_residue_set(pose, params_paths)
chm = pyrosetta.rosetta.core.chemical.ChemicalManager.get_instance()
resiset = chm.residue_type_set( 'fa_standard' )
res_type = resiset.get_representative_type_name1(single_letter_name) #e.g. A
residue = pyrosetta.rosetta.core.conformation.ResidueFactory.create_residue(res_type)
pose.append_polymer_residue_after_seqpos(residue, previous + 1, True)
resiset.name_map( three_letter_name )
or even more manually:
v = pyrosetta.rosetta.core.chemical.ResidueTypeFinder(resiset).get_all_possible_residue_types()
ligtype = [vv for vv in v if vv.name3() == name][0] # or whatever property!
This is not a terminal
OXT
atom. Consequently remove the terminus type. I personally get too confused trying to remember which is upper and which lower, so why not just clear both terminus patches (no segmentation fault) and not worry which side is which (for the record it's Cprevious ← N:UPPER_CONNECT — CA — C:LOWER_CONNECT → Nnext).
rm_upper = pyrosetta.rosetta.core.conformation.remove_upper_terminus_type_from_conformation_residue
rm_lower = pyrosetta.rosetta.core.conformation.remove_lower_terminus_type_from_conformation_residue
rm_upper(pose.conformation(), i)
rm_lower(pose.conformation(), i)
Define the loop
# close loop using loops
loops = pyrosetta.rosetta.protocols.loops.Loops()
# say previous is the residue taht was preset before the gap.
loop = pyrosetta.rosetta.protocols.loops.Loop(previous - 1,
cutpoint + 2,
cutpoint)
# cutpoint cannot be autochosen, it has to be the last residue added!
#loop.auto_choose_cutpoint(pose)
loops.add_loop(loop)
# print(loops) gives a nice output
Loop modelling
lm = pyrosetta.rosetta.protocols.loop_modeler.LoopModeler()
lm.set_loops(loops)
# these are enabled by default. here for quick changing.
lm.enable_centroid_stage()
lm.enable_fullatom_stage()
lm.enable_build_stage()
lm.apply(pose)
kic_mover = pyrosetta.rosetta.protocols.loops.loop_closure.kinematic_closure.KinematicMover()
kic_mover.set_pivots(18, 19, 20)
FoldTree().split_existing_edge_at_residue(19)
, which does not add a jump), but are explained in detail in the official documentation.Fragment based
A worry with ab initio loops is that they might have torsions that aren't seen in nature despite the forcefield's best intentions (depending on the paper read though). As a consequence, one may opt for fragment based approaches. To do this, A step further could be using an electron density weighted scorefunction —as discussed in this other post— but that is a bit unnecessary as there is no density if the loop is missing... (see Coot note)LoopModeler
mover does have the method setup_kic_with_fragments_config
, but requires a command line option specifying the fragment files (loops:frag_files
), either during initialisation or after with pyrosetta.rosetta.basic.options.set_file_vector_option('loops::frag_files', vector)
where "vector" is a pyrosetta.rosetta.utility.vector1_std_string
. These can be generated and written as described here or a general version found in the examples of Rosetta.
Full house
pose.pdb_info().pose2pdb
Example
chains
with value the chains (e.g. ('A','B') or 'AB')sequence
which requires the PDB to be not offset —but that is a discussion for another time.
I stumbled on this while struggling through how to perform loop remodeling, and I couldn't understand why Rosetta's remodel kept failing on my multi-chain .pdb. I'm very happy I found your explanation about renaming everything as a single chain - saved me a load of time!!
ReplyDelete