Remodel in Pyrosetta

Monday 26 April 2021

Remodel in Pyrosetta


The Rosetta binary Remodel is a great tool as it allows interesting designs to be made. However, it is rather incompatible with Rosetta Scripts and Pyrosetta as it is heavily dependent on command line options for customisation and repeats some of the processes internally. Despite this, it can be cohersed rather effectively to work in Pyrosetta with some convenience and this is how.

Remodel is a tool that modifies a model based on a blueprint file, allowing residues to be added, removed and whole swathes of secondary structure changed. Using it is a bit of an art, but it is very powerful. For smaller tasks, such as designing a better neighbourhood of sidechains for a ligand, FastRelax (with appropriate packers) will work file, but to design radical backbone changes Remodel is still the go to application.

The mover that is called by the binary is, unsurprisingly, pyrosetta.rosetta.protocols.forge.remodel.RemodelMover(). Like other movers, it has the bound method .apply(pose) that accepts a pose to work on. It has three issues that need to be circumvented:

  • command line options
  • writing a blueprint
  • PDB vs. pose numbering

Helping hand

To make life easier, I have written a python class to make blueprint files, Blueprinter. It and a few more things can be found here.

pip3 install pyrosetta-help

In Python3

from pyrosetta_help import Blueprinter

The major advantage of this class is that is helps write the blueprint file, which normally is done by editing a file. Namely, given a pose one can do the following:

blue = Blueprinter.from_pose(pose)
blue[20:22] = 'PIKAA G' # change residues 20, 21 and 22 (inclusive) to Glycine
blue.wobble_span(20,25) # remodel keeping original amino acid
del blue[15:20] # requires preceeding and suceeding residues to be 'wobbled' though.
blue.del_span(15, 20) # same as above, but wobbles the preceeding and suceeding 1 residues
blue[22] = 'PIKAA W'
blue.pick_native(21)
blue.pick_native(23)
blue.mutate(22, 'W') # same as above, but wobbling adjecent residues.
blue.expand_loop_wobble()
blue.set('mutant.blu')
rm = blue.get_remodelmover(dr_cycles=5, max_linear_chainbreak=0.1)
rm.apply(pose)
blue.show_aligned(pose)   # jupyter notebook (requires Biopython)
# if all goes wrong there is `blue.correct_and_relax(pose)`

I should say, that I call the operation of repacking the adjecent amino acids to an indel or mutation with the original ones "wobble", but this is my made up term.

Command line options

A regular remodel mover command is

rm = pyrosetta.rosetta.protocols.forge.remodel.RemodelMover()
# ...
rm.register_options()
rm.apply(pose)

At the start of the apply call, various command line options are read, including the blueprint file, resulting in an instance of a pyrosetta.rosetta.protocols.forge.remodel.RemodelData that guides the design. Unfortunately, whereas it is theoretically possible to make this object, there is no way to supply it to the mover and the apply method is a rather monolithic piece so rewriting with a few PyRosetta calls isn't an option. So one is stuck with command line options. Likewise, all the classes in the documentation are called by apply, so as far as I know none are useful. To set a command line option one calls:

pyrosetta.rosetta.basic.options.set_boolean_option('remodel:design:find_neighbors', value)
pyrosetta.rosetta.basic.options.set_file_option('remodel:blueprint', filename)
pyrosetta.rosetta.basic.options.set_string_option('remodel:generic_aa', value)

Where the function is the specific one from the options module/namespace (docs) for the argument type and the first parameter is the name as seen in the list of Rosetta commands. It's both a getter (no second parameter) and a setter (the second parameter is the actual argument wanting to be passed). Do remember that if a option is set after instantiation, one must call the .register_options() method of the instance.

rm = pyrosetta.rosetta.protocols.forge.remodel.RemodelMover()
pyrosetta.rosetta.basic.options.set_file_option('remodel:blueprint', filename)
rm.register_options()

This is a bit simpler with my Blueprinter class:

blue = Blueprinter.from_pose(pose)
blue.blueprint = 'mutant.blu'
blue.find_neighbors = True
blue.generic_aa = 'G'
blue.quick_and_dirty = True

PDB residue numbering

In Rosetta there are two residue numberings. In the pose, each residue gets a sequential index, starting from 1 and without skipped residues. This is a totally sensible system as it makes working with the various arrays straight forward. RDKit does the same, albeit starting from 0. Additionally, there is the PDB residue info. This reflects missing residues, different chains, segments and —shudder— insertion codes. RDKit and Pyrosetta store this as a separate object, albeit differently (atom level and pose level).

r = pose.pdb_info().pdb2pose(res=18, chain='A')
print(pose.residue(r).name3())

Generally, in PyRosetta, an integer is a pose index and a string is a PDB residue number+chain (\d+\w). In a few inputs, for example constraints, an isolated number is interpreted as a pose index and a number plus letter is a PDB index. This is without spaces, so different from pose.pdb_info().pose2pdb(r) output, which has white spaces for insertion codes and segment ids. Parenthetically, personally I like the NGL viewers selection algebra strings, where 12:A.NZ would be atom NZ of residue 12 of chain A and [LYS]12%C^B:A/0.NZ would be insertion code C in segment B of model 0, so I end up writing functions to interconvert from this glorius shorthand format.

Unfortunately, Remodel works slightly differently and really cannot handle PDB indices. For example, given a pose with the first residue as 10, in the blueprint file, to mutate Q15E, the following, in pose indicing, works:

3 Y .
4 I H PIKAA I
5 Q H PIKAA E
6 W H PIKAA W
7 L .

while in PDB indicing

13A Y .
14A I H PIKAA I
15A Q H PIKAA E
16A W H PIKAA W
17A L .

has the very curious effect of correctly applying Q15E, but also changes pose index 15 to the generic amino acid (valine below).

NLYIEWLKDGGPCSGRPPPSZ
....*........***.....
NLYIQWLKDGGPCVVVPPPSZ

Actually, if the mutated residue did not have a pose number, the ResfileReaderException would be raised when the mutated line is tokenised.

File: /Volumes/MacintoshHD3/benchmark/W.fujii.release/rosetta.Fujii.release/_commits_/main/source/src/core/pack/task/ResfileReader.cc:1577
On line 3, the pose does not have residue with chain=A, PDBnum=8.

This issue plays a part in N-terminus terminal deletions, where even if a pose numbering is provided that ResfileReaderException is raised. However, RemodelMover cannot do terminal deletions anyway (vide infra).

Consequently, blanking the PDB info is required. Confusingly, setting .obsolute(True) on the PDBInfo does not work (yet it is set after Remodel), so a new PDBInfo may need passing:

pdb_mover = pyrosetta.rosetta.protocols.simple_moves.AddPDBInfoMover()
pdb_mover.apply(pose)

As a result, the Blueprinter uses pose numbering and AddPDBInfoMover is assumed to be called on the pose. One can play with pdb numbering thusly (not recommended):

with open('mut.blu', 'w') as w:
    w.write(blue.to_pdb_str(pose))
blue.blueprint = 'mut.blu

Once remodel is done, it sets the obsolete flag of the PDBInfo to True. This means that if one does pose.dump_pdb(filename) the residues with be reset so that pose and PDB number match. However, if there was a third connection (a LINK record) it will have the obsoleted numbering —a bug. To undo pose.pdb_info().obsolete(True), works if the pose was not reset before the remodel. Blueprinter has a wee function to help restore the original numbering.

mutant = original_pose.clone()
pdb_mover = pyrosetta.rosetta.protocols.simple_moves.AddPDBInfoMover()
pdb_mover.apply(mutant)
remodel_mover = blue.get_remodelmover()
remodel_mover.apply(mutant)
blue.copy_pdb_info(original_pose, mutant)

Other observations

  • Ligands and other chains are tolerated (ommitted from blueprint) —there is a RemodelLigandHandler, which I have not used, but I assume it might play a role if the ligand were part of the remodel.
  • Unlike the binary terminal deletions are not applied —a glitch I assume. However, these residues can be deleted with pyrosetta.rosetta.protocols.grafting.delete_region(pose, start, stop) or pose.delete_residue_slow(1)
  • For an indel it makes sense to remodel to themselves the adjecent residues —else how would the gap be closed—, but with mutations this is a more confusing requirement. However, most often it is compulsory.
  • To mutate to a non-canonical amino acid, the blueprint file used to be able to contain the line EMPTY NC XXX where XXX is a three letter code. However, this has recently changed and is no longer usable (see this forum question).
  • NATAA is meant to chose the natural amino acid. However, this results in the generic amino acid to be chosen. PIKAA X with itself is the way forward. For the Blueprinter, the .pick_native(res) command does this.
  • The generic amino acid is valine. This is great for structured elements. However, glycine is a better choice for loops.
  • If remodel failed to converge you get a structure with the generic amino acids and potentially a gap that needs closing (see post about loop fixing). The blueprinter method .correct_and_relax(pose) circumvents it in a pinch, but a correct blueprint is a better solution.
  • Always run a positive control for a remodel (say modelling an indel with a wobbled span, run a few models with the same wobble span)
  • The angles and bond lengths may not be great, so it may be wise to minimise with the cartesian scorefunction.

Namely:

scorefxn = pyrosetta.create_score_function('ref2015_cart')
cycles = 15
relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, cycles)
relax.cartesian(True)
relax.minimize_bond_angles(True)
relax.minimize_bond_lengths(True)
relax.apply(pose)

No comments:

Post a Comment