Hacking PDBs for fusion protein

Monday, 16 October 2017

This is a how-to guide for making structures of fusion proteins, the shoddy way.
NB. This assumes medium–advanced PyMol competency.


Representing the structure of a fusion protein is a rather annoying task, and there are many ways of doing it.
Contrary to expectations, submitting the sequence of a fusion protein to the Phyre server or similar will return spaghetti, just as happens with circular permutations (for which you get a cracked-open domain). But give it a go nevertheless: it may save you a lot of time!
Rosetta can be used, but the domains will not be photogenic and it is truly overkill.
So the best way is simply to hack the PDB files with some PyMol ninjutsu.

As an example I will make phusion, i.e. Pyrococcus furiosus DNA polymerase fused to Sulfolobus solfataricus 7D domain (Sso7), because its name basically says it is a fusion.

Starting structures

First one has to get the structures. If a structure is present in the PDB, that is perfect. If not, I strongly recommend the Phyre 2 server over other predictors.
I find Phyre 2 mostly accurate and relatively fast. I have found iTasser to be just a tad less believable (but a solid runner-up). SwissModel is only threading (no ab initio), so it is heresy to use in this decade, while Robetta is glacially slow. However, do not submit too much to Phyre 2 in one go: go modular. That said, including a glycine or serine for the linkers is very helpful.

In our case, phusion as a whole is not available (it's a "secret"). But we have these two structures: 2jgu (Pfu PolB) and 1bbx (Sso 7D).
Just for the record, this is what happens when the whole of phusion is submitted to Phyre 2 (model in gray, Sso 7D in blue and Pfu PolB in green):
Basically Sso 7D is a mess. Phusion seems kind of okay, but we will need a better representation.
There are a few nice PyMol scripts worth getting, foremost being align_all.py; another is colorbyrmsd.py, which shows the differences as B-factors.
Basically, the C-terminus where the Sso7D was placed spaghettified the structure. Qu'vatlh!

Missing loops

I add missing loops using Rosetta remodel for various reasons, but its use is far from trivial. If the starting crystal structure is publicly available, submitting the sequence to Phyre should produce the missing loop. Do check, though, by opening both in the same instance of PyMol and typing
> align name_of_A, name_of_B
As this first part is meant to be simple I will pretend 2jgu is complete.


Assuming you have all you need —I will discuss linkers, TEV sites and epitopes in part 2— it is time to make some fusion protein.
First, open one PDB file and drag the other into the same instance.
Second, remove waters, crystallographic additives and any extra chains. If you have to remove whole domains, though, leave a few residues to make life easier, as we will see: 3-button mouse editing is painful to use, so the best way to bring protein A near protein B is the align command. Ideally you should have a residue or more in excess, which you will trim off after aligning the structures. If not, you will have to add one with the sculpting wizard.

In our example, let's load both and look at the sequence, either via the menu or by typing:
> set seq_view, 1
I see that there are manganese atoms rather than magnesium, which is funny, plus lots of waters and a DNA strand. I will get rid of the waters, either via A(ction) or by typing
> remove solvent
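
If you ever need the same clean-up without PyMol, the idea can be sketched in plain Python on the PDB text. This is only a rough illustration: the function name strip_waters, the WATER_NAMES set and the three example lines are made up for this post, and the only thing taken as given is the fixed-column layout of PDB records (residue name in columns 18 to 20).

```python
# Residue names treated as water here; an assumption, extend the set as needed.
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def strip_waters(pdb_lines):
    """Return the lines with water ATOM/HETATM records removed."""
    kept = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 (0-based slice 17:20) hold the residue name.
        if is_atom and line[17:20].strip() in WATER_NAMES:
            continue
        kept.append(line)
    return kept


# Invented example records: one protein atom, one water, one manganese ion.
example = [
    "ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "HETATM    2  O   HOH A 901      20.000  10.000   5.000  1.00  0.00",
    "HETATM    3 MN    MN A 902      15.000  12.000   3.000  1.00  0.00",
]
cleaned = strip_waters(example)
for line in cleaned:
    print(line)
```

Note that the manganese survives: only waters are dropped, so additives and extra chains still need removing separately.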
Now, I actually was hoping to join the two domains with the residues 'YQKTRQVGLTSWLNIKKSMH', but for now I will pretend I wanted to cut that loop off the end,
basically from Y751 onwards. So let's align this residue with the first residue of the other protein. To do this we need to get into selection logic in PyMol (for more see the official site).
> align 2jgu & resi 751-753 & name CA, 1bbx & chain C & resi 1-3 & name CA
So basically the command is align <to_be_moved_protein>, <fixed_protein>.
The first selection is the structure 2jgu, restricted to resi (residue index) 751 to 753 and to atoms named CA (the alpha carbons). The & is a logical AND: only atoms that match every term are selected. The second selection has an extra bit, chain C, as 1bbx contains multiple chains.

Beautiful! That was lucky. Sometimes the result is a bit odd, in which case shifting the numbers by a residue often does the trick.
But wait, by your zealous screaming of "dihedral angles" I see you have noticed a problem. By aligning the alpha carbons I basically have ignored the psi, phi and omega angles. I was hoping I'd get away with it. Oh well. In that case:
> align 2jgu & resi 752-753 & (name CA | name N | name O), 1bbx & chain C & resi 2-3 & (name CA | name N | name O)
Note how the numbering is shifted: I played around until it aligned okay. This means, though, that I will have to use the mutagenesis wizard to fix it.
However, in some cases there is a clash. That is something for part 2 of this post.


We need to make both structures part of the same chain. To do so we do the following.
First, save the session: there is no undo.
Remove residues 752 and upwards from 2jgu:
> remove 2jgu & resi 752-999

Remove the first residue of chain C of 1bbx (we shifted them along).
Use the mutagenesis wizard to mutate Y751 to an alanine.
Now let's fix the numbering. The alter command takes some getting used to. Just remember its name, so that you can Google "PyMol alter" and become an instant PyMol ninja. Resi 1 of chain C of 1bbx needs to become resi 751 of chain A (so resi 2, the first one left, becomes 752). First rename the existing chain A of 1bbx to something else or you will have a mess.
> alter 1bbx & chain A, chain='D'
> alter 1bbx & chain C, chain='A'
> sort

> alter 1bbx & chain A, resi=str(int(resi)+750) # 752-2 = 750
> sort
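
For the curious, here is roughly what those alter commands amount to at the level of the PDB text. It is a minimal sketch, not what PyMol does internally: shift_chain is a hypothetical helper, the example records are invented, and it assumes no insertion codes and residue numbers that stay below 10000.

```python
def shift_chain(pdb_lines, old_chain, new_chain, offset):
    """Move atoms of old_chain into new_chain and add offset to their residue numbers."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] == old_chain:
            new_resseq = int(line[22:26]) + offset
            # Column 22 is the chain ID; columns 23-26 hold the
            # right-justified residue number.
            line = line[:21] + new_chain + "{:>4d}".format(new_resseq) + line[26:]
        out.append(line)
    return out


# Invented records: residue 2 of chain C, plus an atom in another chain.
example = [
    "ATOM      1  N   ALA C   2       1.000   2.000   3.000  1.00  0.00",
    "ATOM      2  CA  GLY D   5       4.000   5.000   6.000  1.00  0.00",
]
moved = shift_chain(example, old_chain="C", new_chain="A", offset=750)
for line in moved:
    print(line)
```

With an offset of 750, residue 2 of chain C ends up as residue 752 of chain A, while the chain D atom is left alone.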
Now we can select all (type it) and export the selection as a molecule in PDB format (untick all for the bare minimum).
Some PDB files contain a line starting with TER, which marks the end of a chain. Any TER at the site of fusion obviously needs to be removed. To do this, open the PDB file in TextEdit/Notepad and find it.
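
If you would rather not hunt for it by hand, the same edit can be sketched in a few lines of Python. The function name drop_internal_ter and the example records are invented for illustration; the rule it applies (drop a TER only when the protein atoms on either side of it share a chain ID) is my assumption about what "at the site of fusion" means, and atom serial numbers are left untouched.

```python
def drop_internal_ter(pdb_lines):
    """Remove TER records that sit between ATOM records of the same chain."""
    out = []
    for i, line in enumerate(pdb_lines):
        if line.startswith("TER"):
            # Nearest ATOM record before and after this TER, if any.
            prev_atom = next(
                (l for l in reversed(pdb_lines[:i]) if l.startswith("ATOM")), None
            )
            next_atom = next(
                (l for l in pdb_lines[i + 1:] if l.startswith("ATOM")), None
            )
            # Column 22 (index 21) is the chain ID.
            if prev_atom and next_atom and prev_atom[21] == next_atom[21]:
                continue  # the chain carries on, so this TER is spurious
        out.append(line)
    return out


# Invented records around a fusion site between residues 751 and 752.
example = [
    "ATOM      1  C   ALA A 751      10.000  10.000  10.000  1.00  0.00",
    "TER",
    "ATOM      2  N   GLY A 752      11.330  10.000  10.000  1.00  0.00",
    "TER",
    "END",
]
fixed = drop_internal_ter(example)
for line in fixed:
    print(line)
```

The TER at the very end of the chain is kept, as it has no protein atoms of the same chain after it.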
If you reopen this file you will have a fusion protein!
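
Given all the fuss about angles earlier, a quick sanity check on the junction does no harm. The sketch below measures the distance between the carbonyl C of the last residue of the first part and the backbone N of the next residue; a normal peptide bond is about 1.33 Å, so a value far from that suggests the alignment needs another go. The helper names atom_xyz and junction_distance and the example coordinates are made up, and the check says nothing about clashes or dihedral angles.

```python
import math


def atom_xyz(pdb_lines, chain, resseq, atom_name):
    """Return (x, y, z) of the first matching ATOM record, or None."""
    for line in pdb_lines:
        if (
            line.startswith("ATOM")
            and line[21] == chain
            and int(line[22:26]) == resseq
            and line[12:16].strip() == atom_name
        ):
            # Coordinates sit in columns 31-38, 39-46 and 47-54.
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return None


def junction_distance(pdb_lines, chain, last_resseq):
    """Distance between C of last_resseq and N of the following residue."""
    c = atom_xyz(pdb_lines, chain, last_resseq, "C")
    n = atom_xyz(pdb_lines, chain, last_resseq + 1, "N")
    if c is None or n is None:
        return None
    return math.dist(c, n)


# Invented coordinates placing the two atoms 1.33 apart along x.
example = [
    "ATOM      1  C   ALA A 751       0.000   0.000   0.000  1.00  0.00",
    "ATOM      2  N   GLY A 752       1.330   0.000   0.000  1.00  0.00",
]
distance = junction_distance(example, chain="A", last_resseq=751)
print(round(distance, 2))
```

On the real file, read the lines from the exported PDB and pass 751 as last_resseq.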
In the next post I will discuss some harder concepts: namely extra residues (linkers, TEV sites, epitopes), rotating the sequence around to fix clashes, averaging the positions of atoms to fix stuff and relaxing the structure.
