Everything you wanted to know about isopeptide bonds in Rosetta, but were too afraid to ask

Rosetta is great at predicting (with some accuracy) the energies of variant proteins, however, to make the most out of it with proteins with internal isopeptide bonds a few considerations are needed.

A note about PDB format

(skip if familiar)
PDB files can be opened in TextEdit/Notepad to reveal their workings (cf. PDB format): each line has a particular purpose, in particular each atom has a line (ATOM). The names of the atoms in residues are standard, basically the element plus a Greek letter translitterated into Latin alphabet —no Unicode in PDB files. In the case of two atoms vying for the same Greek letter a number is added afterwards. The exception are hydrogens, which are number + H + Greek letter of the parent atom. The backbone atoms are C for C', CA (Cα), N, O and H. The terminal oxygen of the C-terminus of a protein is OXT.

Atoms of ligands and the like are stored as HETATMs and are explicitly connected with extra lines at the end of the file, with the mispelt name CONECT.
Note that the spaces are important in a PDB file.

Linkages

Isopeptide bonds are stored in PDB files as LINK entries. While the crosslinking residues are regular lysine and ASP/ASN, but the latter residue lacks the OD2/ND2 atom. CONECT lines are not used for isopeptides and are not really read by Rosetta.

LINK         NZ  LYS A  31                 CG  ASP B 135     1555   1555  1.25

This states a bond between the atom named NZ (ζ) of chain A K31 and D135 of chain B. Annoyingly, the output of a Rosetta run strips the LINK —even with the header preservation tag on. So you have to add it back. First, Rosetta version ≥3.7 can handle this bond. But only between lysine and aspartate. As an isopeptide product is identical if the starting atom was an aspartate or an asparagine, so altering an asparagine to aspartate is required and makes no impact.
Parenthetically, cystine bonds are handled completely different. These have a SSBOND line and the cysteins are crosslinking (Cyd, not Cys or Cyx). Crosslinking Asx or Lyx are found in some PDBs, but isopeptides are marked with a LINK instead.

However, an amide bond is planar, whereas Rosetta distorts the Cγ_Asp–Nε_Lys dihedral. To avoid this a constraint is needed. Here is a sample isopeptide constraint and a wee python script to generate one.

AtomPair NZ 34A CG 135A HARMONIC 1.30 0.3
AtomPair CE 34A CG 135A HARMONIC 2.40 0.3
AtomPair 1HZ 34A OD1 135A HARMONIC 3.20 0.3
Angle CE 34A NZ 34A CG 135A CIRCULARHARMONIC 2.08 0.2 #119 deg
Angle NZ 34A CG 135A OD1 135A CIRCULARHARMONIC 2.08 0.2
Dihedral CE 34A NZ 34A CG 135A CB 135A CIRCULARHARMONIC 3.14159 0.2
Dihedral 1HZ 34A NZ 34A CG 135A OD1 135A CIRCULARHARMONIC 3.14159 0.2

Unfamiliar with constraints?

Rosetta has two constraint file formats, but enzdes one is used on for a few specific applications. To use

-constraints:cst_fa_file iso.fixer.cst
-constraints:cst_fa_weight 5

NB. Angles are in radians not in π radians so 3.14 is 180°, not 1.

If using Rosetta Remodel and are altering the isopeptide don’t forget to constrain it in the blueprint file and use a enzdes constraint.

Glutamic acid

The proton of glutamic acid hydrogen bonds with the isopeptide. Special settings need to be made for the catalytic glutamic acid. Rosetta deals with residue solely in the protonation state they have at neutral pH and variant deprotonated and protonated states have to be encoded especially. A partial exception is monoprotonated histine for which both tautomers (HID and HIE) are tested, both not the biprotonated (HIP). In the case of glutamic acid the file in $rosetta_path/main/database/chemical/residue_type_sets/fa_standard/residue_types/protonation_states does not have a double bonded oxygen. So a tweak to the file is needed changing the CD–OE1 bond to BOND_TYPE CD OE1 2. Alternatively, for my tweaked version and for other params files for isopeptides, see my GitHub page with params files.

For Relax and Remodel the params file need different flags, but to play it safe both can be given. If the glutamic acid is needed in the blueprint declare it as a NCAA.

-extra_res_fa GLH.new.params
-extra_res_cen GLH.new.params.

Also, crystallographic waters can be important. Rosetta normally uses implicit waters, but if a water is tightly bound it is often best to keep it. One way is to change the HOH to WAT and give the params file for HOH but altered to be WAT or alternatively (and more sensically) using the following flags.

-ignore_waters false
-relax:jump_move true

The latter flag is needed to allow the water to wobble too.

Transition state

A further trick can come from making transition state forms. These are valuable as a mutation that stabilises the transition state increases catalysis. Therefore transition state stabilisation is a big deal in in silico theozyme design.
In terms of isopeptide bonds there is actually a two step process. First the two reactive residues need to be in an "inverted protonation" state. Namely, the lysine is deprotonated while the aspartic acid is protonated.
The lysine therefore is a nucleophile, while the electrophile is the aspartic acid.
For each of these, the crosslinking residues need to be made especially. Alternatively, you can use already made ones from my GitHub page with params files. Simply edit the PDB file to change the residue names in question, e.g. ASP to ASH (aspartic acid) and LYS to LYD (deprotonated lysine).
Unlike the product, in the case of transition states it does matter whether the electrophilic residue was an asparagine or a protonated aspartic acid. The names of the non-canonical amino acids do not matter as long they are consistent within the params and within the pdb, for example I called the monoprotonated aspartate-hemiacetal ASL and the asparagine-hemianimal ASA. Note that the Γ carbon is chiral, to figure out which was better I looked at the literature, but I could have tried both.
Similarly to isopeptide planarity, adding constraints is quite necessary. For example:

AtomPair CG  135A NZ  34A HARMONIC 1.30 0.3
AtomPair NZ   34A OE1 80A HARMONIC 1.50 1.0
AtomPair OD2 135A HE2 80A HARMONIC 1.50 1.0
Angle    OD1 135A CG 135A NZ 34A HARMONIC 1.95 0.35

A nice application of this is using Rosetta Pmut_scan with the flag

-DDG_cutoff 999

which will give the mutational landscape and it can be compared to the unreacted state to find catalytic variants.

Isopeptide scanning?

A more wackier thing that one could try is to pick a protein interface and remodel in an isopeptide. Here is an example protocol:

Ignore the documentation about interface design in Remodel.
Make a fusion protein of the two (with a jump is fine).
Make a blueprint with the line 88 EMPTY NC ASA in order to insert a non-canonical amino acid and the nearby residues as ALLAA or a subset via PIKAA. Remember that Remodel does not like a change in residue if the preceding and succeeding backbone is fixed (so NATAA for them)
Make a constraint file
Run remodel with a decent nstruct value
Run mutate to change the NCAA to the unreactive forms
Score everything
Profit!

For some reason, using NC makes Remodel unhappy and most structures will fail. But some will work and some may score better than the unreacted forms. Cool, ae?
Okay, I must confess, I have done this in silico, but never tested the product, so do at your own risk!

2 comments:

Eileenby1 December 2021 at 20:53
Hi.Do you know how to init pyrosetta with multiple params? For example, I have two residues protonated (CYX\HIP) and made params file for them. How can I add them to pyrosetta.init(-extra_res_fa)? Meanwhile, do you know how to make CYX params? I noted that there are CYZ and CYV in rosetta/xx/residue_tpyes, can you tell the difference? Thank you.
Matteo Ferla6 December 2021 at 02:41
The option string for the `init` function in PyRosetta is the same as the command line in a Rosetta binary: you can repeat the command "-extra_res_fa foo.params -extra_res_fa bar.params" or add multiple entries to it "-extra_res_fa foo.params bar.params". I should mention that the arguments for `init` are options='-ex1 -ex2aro', extra_options='', set_logging_handler=None, notebook=None, silent=False. So it is best to call `pyrosetta.init(extra_options='-👾👾 👾👾 -👾👾 👾👾')`.

There is not a CYX params? Curious. "Traditionally" CYD is an cystine bonded cysteine, while CYX (and any other **X residues) is a crosslinked residue, however, Rosetta does not really care when the PDB input has a SSBOND or a LINK record. So if you have say an Alexa 488 maleimide crosslinked cysteine, you could use a regular CYS with a LINK record in your PDB —but the thiol proton (HG) may be there IIRC. But to answer your question, copying the cys.params, changing the name to `CYX`, changing `IO_STRING CYX C`, adding `VARIANT SIDECHAIN_CONJUGATION`, removing the `ATOM HG` line, adding a `CONNECT SG ` line and changing all instances of ` HG ` to `CONN3` in the dihedral definitions block. Should do it.

I had a gander at CYV. "CYS withVCB andVSG defined as virtual". I have no idea why the beta and gamma carbons are virtual... My guess it was for someone's experiment/proof-of-concept. So best ignored.

Sorry for the tardy reply —most comments are spam.

The art of blowing up protein

Pages

Tuesday, 18 September 2018