Hacking PDBs for fusion protein

Monday, 16 October 2017

Hacking PDBs for fusion protein

This is a How-to guide to make structures of fusion protein —the shoddy way.
NB. This assumes medium–advanced PyMol competency.


Representing the structures of fusion protein is a rather annoying task. There are many ways of doing this.
Contrary to expectations, submitting the sequence of a fusion protein to the Phyre server (threading/ab initio predictor) or similar will return spaghetti, just as happens with circular permutations (for which you get a cracked half open domain with spaghetti attached). But give it a go nevertheless —it may save you a lot of time!
Rosetta can be used, but the domains will not be photogenic and it is truly overkill.
So the best way is to simply hack the PDB files and some PyMol ninjutsu.

As an example I will make phusion, i.e. Pyrococcus furiosus DNA polymerase fused to Sulfolobus solfataricus 7D domain (Sso7), because its name basically says it is a fusion.

Starting structures

First one has to get the structures. If a structure is present in the PDB that is perfect. If not, I strongly recommend the Phyre 2 server over others predictors.
I find Phyre 2 mostly accurate and relatively fast. I have found iTasser to be just a tad less believable (but a solid runner up). SwissModel is only threading (no ab initio) so it is heresy to use in this decade. While Robetta is glacially slow and if you can set up a Rosetta ab initio job I have no idea what you are doing reading this.
However, do not submit too much to Phyre 2 in one go: go modular. Although add a few glycines and serines for linkers are very helpful.

In our case, phusion as a whole is not available —it's a "secret". But we have these two structures:
Just for the record, this is what happens when the whole of phusion is submitted to Phyre 2 (model in gray, Sso 7D in blue and Pfu PolB in green):
Basically Sso 7D is a mess. But Phusion seems kind of okay, but we will need a better representation.
There a few nice scripts in PyMol worth getting, foremost being align_all.py, one of these is colorbyrmsd.py, which shows the differences as B-factors.
That is, the C-terminus where the Sso7D was placed spaghettified the structure. Qu'vatlh!


Assuming you have all you need —I will discuss linkers, TEV sites and epitopes in part 2— it is time to make some fusion protein.
First open one pdb and drag the other into the same instance.
Second remove waters, crystallographic additives and any extra chains, but if you have to remove whole domains leave residues to make life easier as we will see. Namely, 3-button mouse editing is painful to use, so the best way to bring protein A near protein B is to use the align command. Ideally you should have a residue or more in excess that you will trim off after aligning the structures. If not you will have to add one with the sculpting wizard.

So let's get started with our example and load both parts of phusion to make a fusion:
> fetch 2JGU, pfu
> fetch 1bbx, sso7d
Let's look at the sequence. Either via the menu or by typing:
> set seq_view, 1
I see that there are:
  1. Missing loops
  2. manganese atoms and not magnesium —which is a comical crystalisation artefact as manganese decreases polymerase fidelity
  3. lots of waters
  4. a DNA strand

Missing loops

I add missing loops using Rosetta remodel, because it is somewhat less arbitrary —but its use is not trivial— so I will talk about PyMol.

If the starting crystal structure is publicly available, submitting the sequence to Phyre should produce the missing loop. But do check by opening both in the same instance of PyMol and typing
> align name_of_A, name_of_B

Alternatively you can add it yourself. In fact, adding a missing manually loop is actually really fun in PyMol. The trick is to use build and sculpt.
So let's get started with our example and load both parts of phusion to make a fusion:
> fetch 2JGU, pfu

Now, in the menu under Build there are lots of fun things to play with!
First though let's find the C' atom of the last residue before the disordered loop. To do it by mouse, change the mouse mode to three button editing by clicking on the panel in the bottom left that says three button viewing. Now middle click to zoom into the last non-gray residue in the sequence:

Click the carboxyl carbon (C'). In this case there is only one oxygen, so no need to delete one oxygen. If there were two, the terminal oxygen is named "OXT" and is part of the last residue.
Then, click on Build>Residue and add the residues needed, they will overwrite the gray letters. To make it more fun pressing opt/alt and the single letter code of the amino acid will add the residue.

The chain added is straight and is not closed, which needs to be fixed.
Select the carboxyl carbon of the final residue (W616) and the backbone nitrogen of the next residue (S617) and you will get a distance. One could stop here and just join the two: under the menu, click Build>Create bond or type
> bond pk1, pk2

The dotted line is gone, but the bond is 10 time too long, so this is not really a good idea right now.
Luckily, there is a really fun tool called sculpting to fix it. So undo the last bond (unbond) and activate auto-scupting under Build>sculpting.
Now when you drag selected atoms around using cmd/ctrl+shift left mouse button the chain will spruce up. However, one problem is the nitrogen being repulsed by its new friend. So click on it (single ringed sphere = pk1, double = pk2 etc.) and type
> protect pk1
Now drag away! Do note that an amide bond is planar, so one needs to be careful to make it happy by having the nitrogen sticking up in the opposite side of the carboxyl oxygen. At some point, ~ 3Å, it may pay to bond the atoms, deactivate sculpting, moving the carboxyl carbon to 1.3Å and reactivating the sculpting and the dihedral issue might go away... somewhat. Also, there should be a single hydrogen on the nitrogen.
Great. However, there is another

Dehydrate and change the ions

I will get rid of the waters, either via A(ction) or by typing
> remove solvents
To change the ions type:
> alter resn MN, resn='MG'
Note that the residue called Mn is being changed not the element.


Now, I actually was hoping to join the two domains with the residues 'YQKTRQVGLTSWLNIKKSMH', but for now I will pretend I wanted to cut the loop bit off the end.
Basically Y751 onwards. So let's align this residue with the first residue of the other protein. To do this we need to get into selection logic in PyMol (for more see official site).
> align 2jgu & resi 751-753 & name CA, 1bbx & chain C & resi 1-3 & name CA
So basically the command is align <to_be_moved_protein>,<fixed_protein>
The selection of the first is the structure 2jgu and resi (residue index) 751 and atoms named C-alpha. AND means that it selects only elements present in both.  The second term has an extra bit, chain C as there are multiple chains.

Beautiful! That was lucky, although sometimes it is a bit odd, so sometimes shifting the numbers by a residue does the trick.
But wait, by your zealous screaming of "dihedral angles" I see you have noticed a problem. By aligning the alpha carbons I basically have ignored the psi, phi and omega angles. I was hoping I'd get away with it. Oh well. In that case.
> align 2jgu & resi 752-753 & (name CA | name N | name O),
         1bbx & chain C & resi 2-3 & (name CA | name N | name O)
Note how the numbering is shifted: I played around to make it align okay, this means though I will have to use the mutagenesis wizard to fix it.
However, in some cases there is a clash. That is something for part 2 of this post.


We need to make both structures part of the same chain. To do so we do the following.
Let's save the session. There is no undo.
Remove residues 752 and upwards from 2jgu
> remove 2jgu & resi 752-999

Remove the first residue off chain C of ibbx (we shifted them along).
Use the mutagenesis wizard to mutate Y751 to an alanine.
Now. Let's fix the numbering. The command alter takes some getting used to. Just remember the command is called alter, which means you can Google "PyMol alter" to be an instant PyMol ninja. resi 1 of chain C of 1bbx will need to be resi 751 of chain A. First make chain A become something else or you will have a mess.
> alter 1bbx & chain A, chain='D'
> alter 1bbx & chain C, chain='A'
> sort

> alter 1bbx & chain A, resi=str(int(resi)+750) # 752-2 = 750
> sort
Now we can select all (type it) and export the selection as a molecule in PDB format (untick all for the bare minimum).
In some sequences there is line called TER in the PDB file, this needs to be removed at the site of fusion obviously. To do this, open the PDB file in TextEdit/Notepad and find it.
If you reopen this file you will have a fusion protein!
In the next post I will discuss some harder concepts: namely extra residues (linkers, TEV sites, epitopes), rotating the sequence around to fix clashes, averaging the positions of atoms to fix stuff and relaxing the structure.

No comments:

Post a Comment