Phosphorylated PDB files

Tuesday 15 January 2019


Sometimes a residue in a human protein is phosphorylated, yet the model one gets from I-TASSER, Phyre etc., or even the actual PDB structure, lacks the phosphate. Here is how to add such modifications easily and quickly with Rosetta.
There are several ways to do this; another notable one is the CHARMM-GUI PDB Reader, which allows the addition of phosphates, a few cyanine dyes and other modifications. Here I will discuss Rosetta, as it is more flexible in the breadth of modifications it supports.

Rosetta command line way

Firstly, the residues one wants modified need to be renamed in the PDB file. In PyMOL, it is simply a question of typing alter resi xx+yy+zz, resn='XXX' followed by sort, where xx, yy and zz are the residue numbers and XXX is the three-letter code of the phosphorylated equivalent, namely:

  • SER (Serine) → SEP (Phosphoserine)
  • THR (Threonine) → TPO (Phosphothreonine)
  • TYR (Tyrosine) → PTR (Phosphotyrosine)
Once saved, the PDB file will have these residues changed in name only. The next step is to force the actual change in Rosetta.
Parenthetically, I have wondered whether one could add non-canonical amino acids to the mutagenesis wizard in PyMOL, but it seems to be impossible, or at least hard and buggy.
In the terminal, simply run the Rosetta score app* with PDB output enabled:
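If PyMOL is not at hand, the renaming can also be scripted. Below is a minimal sketch in plain Python (rename_residues is a hypothetical helper of my own, not part of any package) that rewrites the residue-name columns (18 to 20) of ATOM/HETATM records for the chosen residue numbers and leaves everything else, coordinates included, untouched. It assumes a standard fixed-column PDB file and ignores insertion codes.

```python
def rename_residues(pdb_lines, renames, chain=None):
    """Rename residues in PDB ATOM/HETATM records, much like PyMOL's alter.

    renames maps residue number -> new three-letter code, e.g. {15: 'SEP'}.
    If chain is given, only residues in that chain are touched.
    Only the name changes; the missing atoms are left for Rosetta to build.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')) and len(line) >= 26:
            resseq = int(line[22:26])  # columns 23-26: residue number
            chain_id = line[21]        # column 22: chain identifier
            if resseq in renames and (chain is None or chain_id == chain):
                # columns 18-20: residue name, right-justified in 3 characters
                line = line[:17] + renames[resseq].rjust(3) + line[20:]
        out.append(line)
    return out
```

For instance, rename_residues(open('model.pdb').read().splitlines(), {15: 'SEP', 20: 'TPO'}, chain='A'), joined back with newlines and saved, should give the same name-only change as the PyMOL route.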

$ROSETTA/score.$ROSETTAEXT -database $ROSETTADB -s your_modded_model.pdb -out:output -no_optH false;

The file your_modded_model_0001.pdb will contain the correct non-canonical amino acids (NCAAs).

*) Note that this only works with Rosetta 3.8 and above, thanks to the addition of the PDB Chemical Components Dictionary (CCD) machinery. This machinery is normally a pain, as it prevents the parsing of custom params files that share names with PDB ligands, and it has to be disabled with -load_PDB_components false. The PyMOL step could easily be automated in Python, but if one were to do that, using PyRosetta would make more sense.
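To confirm that the scored model really contains the modified residues, a quick check is to list every residue whose name is not one of the 20 canonical amino acids. The sketch below is plain Python (noncanonical_residues is a hypothetical helper) and assumes that Rosetta writes the modified residues out under their PDB codes, such as SEP, TPO or PTR.

```python
CANONICAL = {'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
             'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL'}


def noncanonical_residues(pdb_lines):
    """Return sorted (chain, residue number, residue name) tuples for
    non-canonical residues found in ATOM/HETATM records (water ignored)."""
    found = set()
    for line in pdb_lines:
        if line.startswith(('ATOM', 'HETATM')):
            resname = line[17:20].strip()  # columns 18-20: residue name
            if resname not in CANONICAL and resname != 'HOH':
                found.add((line[21], int(line[22:26]), resname))
    return sorted(found)
```

Running it on the lines of your_modded_model_0001.pdb should list each SEP, TPO or PTR once per residue; lines that are not coordinate records (e.g. the score table Rosetta appends) are skipped.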


Some other residues of interest are:
  • GTP: guanosine triphosphate (be careful, as a handful of structures have GDP with odd atom names)
  • MLZ: N-methyl-lysine
  • MLY: N-dimethyl-lysine
  • M3L: N-trimethyl-lysine
  • ORN: ornithine
  • SEC: selenocysteine
  • MSE: selenomethionine
  • HCS: homocysteine
  • SLZ: thialysine
  • NLE: norleucine
  • ALO: allothreonine
  • ALN: naphthylalanine
  • 2MR: dimethylarginine
  • CIR: citrulline
  • ALY: acetyllysine


In PyRosetta things work a bit differently. First, don't be fooled by the mover pyrosetta.rosetta.protocols.enzymatic_movers.KinaseMover, which, like the rest of the enzymatic movers, is meant to act as a virtual enzyme with a target sequence and an efficiency. What we want is pyrosetta.rosetta.protocols.simple_moves.MutateResidue.

Do note that, as phosphorylation is a modification (a patch) rather than a separate residue type, using three-letter codes such as SEP for phosphoserine will not work (yes, you guessed it, it segmentation-faults). Instead, the name to use is SER:phosphorylated, which is actually easier to remember. Using pyrosetta.rosetta.core.pose.add_variant_type_to_residue is more convoluted, but the variant types it takes, from the enum pyrosetta.rosetta.core.chemical.VariantType, are better documented. In fact, the best place to find the patch names is by rummaging in main/database/chemical/residue_type_sets/fa_standard/patches.
For example, say we have a list of post-translational modifications, each a dictionary with the keys "from_residue", "chain", "residue_index", "to_residue" and "ptm" (the last in PhosphoSitePlus format: p, ac, m1, m2, m3, gal etc.).
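As a sketch of that rummaging, the snippet below collects the NAME and TYPES lines of every patch file. It assumes the layout I remember from the fa_standard patch files (a NAME line with the patch name, a TYPES line with the VariantType names, # for comments); parse_patch and list_patches are hypothetical helpers, not part of Rosetta or PyRosetta.

```python
from pathlib import Path


def parse_patch(text):
    """Return (name, types) from the text of one Rosetta patch file.

    Assumes a 'NAME <name>' line and 'TYPES <TYPE> ...' line(s); '#' starts
    a comment. Everything else in the file is ignored.
    """
    name, types = None, []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        rest = parts[1] if len(parts) > 1 else ''
        if key == 'NAME' and name is None:
            name = rest.strip()
        elif key == 'TYPES':
            types.extend(rest.split())
    return name, types


def list_patches(patch_dir):
    """Map patch name -> variant types for every .txt file under patch_dir."""
    patches = {}
    for path in sorted(Path(patch_dir).rglob('*.txt')):
        name, types = parse_patch(path.read_text(errors='replace'))
        if name:  # skip files without a NAME line
            patches[name] = types
    return patches
```

For instance, list_patches('main/database/chemical/residue_type_sets/fa_standard/patches'), with the path taken relative to the Rosetta installation, should include an entry for phosphorylated showing the variant type behind SER:phosphorylated.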
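import pyrosetta
from Bio.SeqUtils import seq3  # one-letter to three-letter code, e.g. 'S' -> 'Ser'

pyrosetta.init('-mute all')
pose = pyrosetta.rosetta.core.pose.Pose()
pyrosetta.rosetta.core.import_pose.pose_from_file(pose, 'protein.pdb')

# PhosphoSitePlus PTM code -> Rosetta patch name
ptm2patch = {'p': 'phosphorylated',
             'ac': 'acetylated',
             'm1': 'monomethylated',
             'm2': 'dimethylated',
             'm3': 'trimethylated'}

MutateResidue = pyrosetta.rosetta.protocols.simple_moves.MutateResidue
pdb2pose = pose.pdb_info().pdb2pose  # PDB (chain, number) -> pose index
# PSP_modified_residues: list of dictionaries as described above
for record in PSP_modified_residues:
    if record['ptm'] not in ptm2patch:  # e.g. 'ub' or 'gal': no simple patch
        raise ValueError(f"Unsupported PTM: {record['ptm']}")
    patch = ptm2patch[record['ptm']]
    # Rosetta residue type name, e.g. SER:phosphorylated
    new_res = f"{seq3(record['from_residue']).upper()}:{patch}"
    r = pdb2pose(record['chain'], int(record['residue_index']))
    if r == 0:  # pdb2pose returns 0 when the residue is not in the pose
        raise ValueError(f"Residue {record['chain']}{record['residue_index']} not found")
    MutateResidue(r, new_res).apply(pose)

# relax the modified structure (2 repeats) and save it (filename is arbitrary)
scorefxn = pyrosetta.get_fa_scorefxn()
relax = pyrosetta.rosetta.protocols.relax.FastRelax(scorefxn, 2)
relax.apply(pose)
pose.dump_pdb('protein_modified.pdb')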
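The records themselves can be built from PhosphoSitePlus site strings. The sketch below assumes a site is written as the one-letter residue, its position, a hyphen and the PTM code (e.g. S15-p or K120-ac); psp_site_to_record and its offset parameter are hypothetical. Since PhosphoSitePlus positions refer to the protein sequence rather than to the PDB file, the chain and any numbering offset have to be supplied by hand, and to_residue is left out because the loop above does not use it.

```python
import re

# Assumed site format: one-letter residue, position, '-', PTM code (e.g. 'S15-p')
SITE_RE = re.compile(r'^([A-Z])(\d+)-([a-z0-9]+)$')


def psp_site_to_record(site, chain, offset=0):
    """Turn a site string such as 'S15-p' into a record dictionary.

    offset is added to the sequence position to get the PDB residue number.
    """
    match = SITE_RE.match(site.strip())
    if match is None:
        raise ValueError(f'Unrecognised site string: {site!r}')
    residue, number, ptm = match.groups()
    return {'from_residue': residue,
            'chain': chain,
            'residue_index': int(number) + offset,
            'ptm': ptm}
```

For example, [psp_site_to_record(s, 'A') for s in ('S15-p', 'T18-p')] gives a list that can be fed to the loop as PSP_modified_residues.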
