Stay hydrated

Waters can be an integral part of a protein structure, in fact, it is common to find water crystallised tightly in an X-ray structure. These waters can change the calculated Gibbs free energy of a protein and give better experimental results. Explicit waters can be added in Rosetta/Pyrosetta thanks to the SPaDES algorithm described in Lai et al. 2017. Here is a guide to using it in Pyrosetta.

This protocol adds hydrogen bonded waters, both single edge waters ('sew') and double edge waters ('dew') as the protocol calls them. It has two steps first they are placed and then optimised.

First, the SPaDES protocol is called 'hydrate' (pyrosetta.rosetta.protocols.hydrate).

It works with mismatching PDB vs. pose numbering, multiple chains and non-canonical amino acids, which is always nice, and is rather fast, also a bonus.

The mover, Hydrate is supported by various functions and can be run in a few different ways.

Before starting, regardless of the operation running an empty call to the Hydrate mover is required. Not sure why —it must be some setting needing to be set in the pose.

pyrosetta.rosetta.protocols.hydrate.Hydrate().apply(pose)

The arguments of the class are either nothing, a ScoreFunction or (ScoreFunction and 'protein_flexibility', a string with value either 'all', 'none', 'near_water' or 'resfile') —more below.

Hyfile loading

In the binary-input–like usage, a normal resfile specifying which residues to repack and a hyfile specifying which residues to hydrate are supplied. A hyfile contains row with HYD and a residue number to hydrate or ENF and a WAT residue to keep.

The hyfile is read via the read_hyfile, wherein it fills two boolean vectors.

enforced_V = pyrosetta.rosetta.utility.vector1_bool(pose.total_residue())
hydrate_V = pyrosetta.rosetta.utility.vector1_bool(pose.total_residue())
pyrosetta.rosetta.protocols.hydrate.read_hyfile(hyfile='hydrate_test/hydrate.hyfile',
                                                enforced_V=enforced_V,
                                                hydrate_V=hydrate_V)

Now we can make pyrosetta add a water (anchored to the residue):

pyrosetta.rosetta.protocols.hydrate.hydrate_hyfile(pose=pose,
                                                   hydrate_V=hydrate_V,
                                                   resfile='hydrate_test/pack.resfile')

The boolean vectors are the same as residue selection vectors as one gets from selector.apply(pose), so a very obvious alternative would have been to provide an interface selection vector:

intersele = pyrosetta.rosetta.core.select.residue_selector.InterGroupInterfaceByVectorSelector(pyrosetta.rosetta.core.select.residue_selector.ChainSelector('A'),
                                                                                  pyrosetta.rosetta.core.select.residue_selector.ChainSelector('B'))

hydrate_V = intersele.apply(pose)

The hydration does not really require a resfile (does nothing) and setting it to "default" works fine.

In terms of water enforcement, the command pyrosetta.rosetta.protocols.hydrate.enforce_all_waters(pose) does what you'd expect and one cab make a vector of pre-existing waters the normal way:

water_selector = pyrosetta.rosetta.core.select.residue_selector.ResidueNameSelector()
water_selector.set_residue_name3('WAT')
wv = water_selector.apply(pose)

Do note that water residue is WAT and while HOH is converted to WAT during import, others are not, e.g. OH2, OD2, DOD, SOL, TIP, TIP2, TIP3 or TIP4.

Water!

If we inspect the pose with the pymolmover or with nglview we see that a water was indeed added:

import nglview
view = nglview.show_rosetta(pose)
view.add_representation('hyperball', selection='[WAT]')
view

Now when we run Hydrate again, the water will move into position.

pyrosetta.rosetta.protocols.hydrate.Hydrate().apply(pose)

The waters are now in good positions, but the residues nearby were not repacked as no resfile was provided and no scorefunction given to Hydrate. Oddly this latter step is when stuff gets problematic and segmentation faults happen.

Cavities

This all requires one to explicitly set the residues to hydrate. This can be done automatically however by filling all the cavities. Let's see how many get added:

pre = pose.total_residue()
pyrosetta.rosetta.protocols.hydrate.hydrate_cavities(pose)
post = pose.total_residue()
print(f'{post - pre} waters added.')

Stragglers

One oddity is that one ends up with the excess of waters in a line in the distance at cartesian coordinates equivalent to the index times 10,000. This does not harm pyrosetta but makes the PDB unrenderable in PyMOL, so let's get rid of them anyway.

pre = pose.total_residue()
for i in range(pose.total_residue(),0, -1):
    residue = pose.residue(i)
    if residue.name3() == 'WAT' and round(residue.xyz(1)[0]) == i * 10_000:
        # i=197, .xyz(1) =[1970000.000000001,1969999.999999999,1970000.000000004]
        pose.delete_residue_slow(i)
post = pose.total_residue()
print(f'{post - pre} waters removed.')

Scorefunction

The Hydrate class can optionally accept a scorefunction and protein flexibility. The latter controls how the protein is repacked and is a string that can be "not" (default), "all", "near_water" or "resfile".

A regular full atom score function is fine —in fact, when omitted the code seems to use the generic system scorefunction. The scorefunction can also be an electron density weighted one (for more see past post about ED scoring).

For best results, one probably should use the SPaDES weights. To use the weights from the test folder one can do:

weight_filename = 'hydrate_test/spades.wts'
scorefxn = pyrosetta.get_score_function()
scorefxn.add_weights_from_file(weight_filename)

Alternatively, there is a hydrate scorefunction scorefxn = pyrosetta.create_score_function('hydrate_score12').

Using a Resfile

In the tests/integration/hydrate folder of Rosetta there is a test example, with these files made and barstar-barnase (1BRS).

Before starting let's check the resfile is correct for the pose, because, for me at least, 99% of the time 123.resfile is not meant for temp_2-retry_finalB.pdb for some feat of genius logistics...

packtask = pyrosetta.standard_packer_task(pose)
pyrosetta.rosetta.core.pack.task.parse_resfile(pose, packtask, 'hydrate_test/pack.resfile')

As expected it works. However, if we wanted to make our own resfile manually —as opposed to defining a packer for some obscure reason— there is the command:

pyrosetta.toolbox.generate_resfile.generate_resfile_from_pose(pose, 'myfile.resfile', pack=False, design=False, input_sc=True, freeze=[], specific={})

Where the dictionary specific, you can specify the residues to be repacked (NATAA), kept untouched (NATRO) or designed (PIKAA X).

To repack we can calling it like:

pyrosetta.rosetta.protocols.hydrate.Hydrate(scorefxn, 'resfile').apply(pose)

Assing that the -packing:resfile command line option was passed either with pyrosetta.rosetta.basic.options.set_file_option('packing:resfile', 'xxxx') or pyrosetta.init(extra_options='-packing:resfile hydrate_test/pack.resfile').

Spying

The packer task and movemap are generated behind the scenes, but we can spy on what a given command does by calling the same function:

packer_task = pyrosetta.rosetta.core.pack.task.TaskFactory.create_packer_task(pose)
movemap = pyrosetta.MoveMap()
pyrosetta.rosetta.protocols.hydrate.set_task_and_movemap(pose=pose,
                                                         protein_flexibility='all',
                                                         task=packer_task,
                                                         mm=movemap,
                                                         minimize_bb_where_packing=True)

This function fills the packer_task and movemap objects according to some specified logic set in minimise backbone and protein flexibility. Which can be printed for inspection.

Unfortunately, there is no way of setting a movemap and packertask.

PDBInfo

A minor annoyance is that the PDBInfo gets reset/rebuilt oddly under some combinations of commands (uncommon) —generally it is okay in pyrosetta but the indices are pose indices in the dumped PDB. But this can be easily circumvented by keeping a copy of it before the operations and reapplying it —don't bother at first, do it only if you hit said corner case. Although you may want to rename the waters to a given chain.

pdb_info = pose.pdb_info()
water_selector = pyrosetta.rosetta.core.select.residue_selector.ResidueNameSelector()
water_selector.set_residue_name3('WAT')
wv = water_selector.apply(pose)
RV = pyrosetta.rosetta.core.select.residue_selector.ResidueVector
for i,j in enumerate(RV(wv)):
    pdb_info.set_resinfo(j, 'X', i)
pose.pdb_info(pdb_info)

Other observations

Non-peptide ligands (without CA atom) cause problems —not sure what happens with nucleic acid polymers— I think it is a packer dependent problem, but deleting the residues before hydration fixes it swiftly (pyrosetta.rosetta.core.pose.remove_nonprotein_residues(pose)
Deleting any residue after hydration will cause an error.
NNCA are treated normally
Water can dissociate —this, like the charged amino acids, is not accounted for. Neutron diffraction structures feature water (resn HOH), hydroxydes (resn OH) and hydroniums (H3O). These play a role in some coordinations. However, there exists no rule that defines that WAT can be any of the HOH, OH, H3O and this would be a can of worms so makes sense it is not addressed in the hydrate protocol or actually any hydration method.
Water is polarisable —Drude oscillators are not a feature in Rosetta —yes, OpenMM does it, but really few tools do.

The art of blowing up protein

Pages

Saturday, 8 August 2020