Monday, 16 October 2017

Hacking PDBs for fusion protein

This is a how-to guide for making structures of fusion proteins, the shoddy way.
NB. This assumes medium–advanced PyMol competency.

Introduction

Representing the structures of fusion proteins is a rather annoying task. There are many ways of doing this.
Contrary to expectations, submitting the sequence of a fusion protein to the Phyre server or similar will return spaghetti, just as happens with circular permutations (for which you get a cracked open domain). But give it a go nevertheless —it may save you a lot of time!
Rosetta can be used, but the domains will not be photogenic and it is truly overkill.
So the best way is simply to hack the PDB files with some PyMol ninjutsu.

As an example I will make phusion, i.e. Pyrococcus furiosus DNA polymerase fused to Sulfolobus solfataricus 7D domain (Sso7), because its name basically says it is a fusion.

Starting structures

First one has to get the structures. If a structure is present in the PDB, that is perfect. If not, I strongly recommend the Phyre 2 server over other predictors.
I find Phyre 2 mostly accurate and relatively fast. I have found iTasser to be just a tad less believable (but a solid runner-up). SwissModel is only threading (no ab initio), so it is heresy to use in this decade, while Robetta is glacially slow. However, do not submit too much to Phyre 2 in one go: go modular. That said, including a glycine or serine for the linkers is very helpful.

In our case, phusion as a whole is not available (it's a "secret"), but we have structures of the two parts: 2jgu for Pfu PolB and 1bbx for Sso7d.
Just for the record, this is what happens when the whole of phusion is submitted to Phyre 2 (model in gray, Sso 7D in blue and Pfu PolB in green):
Basically Sso 7D is a mess. The rest of phusion seems kind of okay, but we will need a better representation.
There are a few nice PyMol scripts worth getting, foremost align_all.py. Another is colorbyrmsd.py, which shows the differences as B-factors.
Basically, the C-terminus where the Sso7D was placed spaghettified the structure. Qu'vatlh!

Missing loops

I add missing loops using Rosetta remodel for various reasons, but its use is far from trivial. If the starting crystal structure is publicly available, submitting the sequence to Phyre should produce the missing loop. Do check, though, by opening both in the same instance of PyMol and typing
> align name_of_A, name_of_B
As this first part is meant to be simple I will pretend 2jgu is complete.

Clean-up

Assuming you have all you need —I will discuss linkers, TEV sites and epitopes in part 2— it is time to make some fusion protein.
First open one pdb and drag the other into the same instance.
Second remove waters, crystallographic additives and any extra chains, but if you have to remove whole domains leave residues to make life easier as we will see. Namely, 3-button mouse editing is painful to use, so the best way to bring protein A near protein B is to use the align command. Ideally you should have a residue or more in excess that you will trim off after aligning the structures. If not you will have to add one with the sculpting wizard.

In our example, let's load both and look at the sequence. Either via the menu or by typing:
> set seq_view, 1
I see that there are manganese atoms rather than magnesium, which is funny, plus lots of waters and a DNA strand. I will get rid of the waters, either via A(ction) or by typing
> remove solvent
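If you would rather do this step on the PDB file itself, here is a minimal Python sketch of the same idea. It is not what PyMol does internally: the function name strip_waters is made up for this example, and it assumes waters are named HOH or WAT and that the file follows the standard fixed-column PDB layout.

```python
# Assumption: waters are the residues named HOH or WAT.
WATER_NAMES = {"HOH", "WAT"}

def strip_waters(pdb_text):
    """Return the PDB text without water ATOM/HETATM records."""
    kept = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM"))
        # Columns 18-20 of a coordinate record hold the residue name.
        if is_atom and line[17:20] in WATER_NAMES:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Everything that is not a water record (metals, DNA, header lines) is passed through untouched.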
Now, I actually was hoping to join the two domains with the residues 'YQKTRQVGLTSWLNIKKSMH', but for now I will pretend I wanted to cut the loop bit off the end.
Basically Y751 onwards. So let's align this residue with the first residue of the other protein. To do this we need to get into selection logic in PyMol (for more see official site).
> align 2jgu & resi 751-753 & name CA, 1bbx & chain C & resi 1-3 & name CA
So basically the command is align <to_be_moved_protein>,<fixed_protein>
The first selection is the structure 2jgu, resi (residue index) 751 to 753 and atoms named CA (C-alpha). The & (AND) means that only atoms matching every condition are selected. The second selection has an extra bit, chain C, as there are multiple chains.

Beautiful! That was lucky. Sometimes the result is a bit odd, in which case shifting the numbers by a residue does the trick.
But wait, by your zealous screaming of "dihedral angles" I see you have noticed a problem. By aligning the alpha carbons I have basically ignored the psi, phi and omega angles. I was hoping I'd get away with it. Oh well. In that case:
> align 2jgu & resi 752-753 & (name CA | name N | name O), 1bbx & chain C & resi 2-3 & (name CA | name N | name O)
Note how the numbering is shifted: I played around to make it align okay, this means though I will have to use the mutagenesis wizard to fix it.
However, in some cases there is a clash. That is something for part 2 of this post.
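To put a number on the dihedral worry, here is a small sketch that computes a signed dihedral angle from four atom positions using the standard atan2 formula. The function name dihedral and the helpers are made up for this example, and the coordinates are assumed to be plain (x, y, z) tuples that you have copied out of the PDB file yourself.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For the omega angle across the junction you would pass CA and C of the last residue of the first protein, then N and CA of the first residue of the second. A normal trans peptide bond should come out close to 180 (or -180). PyMol's own get_dihedral command reports the same kind of angle if you prefer to stay inside PyMol.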

Pre-hack

We need to make both structures part of the same chain. To do so we do the following.
Let's save the session. There is no undo.
Remove residues 752 and upwards from 2jgu
> remove 2jgu & resi 752-999

Remove the first residue off chain C of 1bbx (we shifted them along).
Use the mutagenesis wizard to mutate Y751 to an alanine.
Now. Let's fix the numbering. The command alter takes some getting used to. Just remember the command is called alter, which means you can Google "PyMol alter" to be an instant PyMol ninja. resi 1 of chain C of 1bbx will need to be resi 751 of chain A. First make chain A become something else or you will have a mess.
> alter 1bbx & chain A, chain='D'
> alter 1bbx & chain C, chain='A'
> sort
> alter 1bbx & chain A, resi=str(int(resi)+750) # 752-2 = 750
> sort
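For the curious, the chain renaming and renumbering boils down to editing two fixed-width fields in each coordinate line. Below is a rough Python sketch of that on the text of a PDB file. The function name shift_chain is made up for this example, and it only touches ATOM and HETATM records (TER and ANISOU lines are left alone), so treat it as an illustration rather than a replacement for alter.

```python
def shift_chain(pdb_text, old_chain, new_chain, offset):
    """Rename a chain and add `offset` to its residue numbers."""
    out = []
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM", "HETATM")) and len(line) >= 26
        # Column 22 is the chain ID, columns 23-26 the residue number.
        if is_atom and line[21] == old_chain:
            new_resi = int(line[22:26]) + offset
            line = line[:21] + new_chain + "%4d" % new_resi + line[26:]
        out.append(line)
    return "\n".join(out) + "\n"
```

With the numbers from this post, shift_chain(text, "C", "A", 750) turns residue 2 of chain C into residue 752 of chain A.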
Now we can select all (type it) and export the selection as a molecule in PDB format (untick all for the bare minimum).
In some files there is a line called TER in the PDB file; this needs to be removed at the site of fusion, obviously. To do this, open the PDB file in TextEdit/Notepad and find it.
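If you have several of these to do, the TER hunt can be scripted. The sketch below drops any TER line that sits between two ATOM records of the same chain, which is the situation at a fusion site after the renumbering. The function name drop_internal_ter is made up for this example, and HETATM records are ignored when deciding what the neighbouring chain is.

```python
def drop_internal_ter(pdb_text):
    """Remove TER lines that split a single chain in two."""
    lines = pdb_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        if line.startswith("TER"):
            before = [l for l in lines[:i] if l.startswith("ATOM")]
            after = [l for l in lines[i + 1:] if l.startswith("ATOM")]
            # Same chain ID (column 22) on both sides: the TER is internal.
            if before and after and before[-1][21] == after[0][21]:
                continue
        out.append(line)
    return "\n".join(out) + "\n"
```

TER lines that end a chain for real (followed by a different chain or by nothing) are kept.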
If you reopen this file you will have a fusion protein!
Congratulations!
In the next post I will discuss some harder concepts: namely extra residues (linkers, TEV sites, epitopes), rotating the sequence around to fix clashes, averaging the positions of atoms to fix stuff and relaxing the structure.

Friday, 18 August 2017

Rosetta easteregg

The guide to using Rosetta may be a bit, ehm, flaky, but I think it is made up for by this rather amusing comment header I found in one of its Perl scripts. Kudos to the author.

###############################################################################
#
# MAKE_FRAGMENTS.PL 1.00 -- THE (PEN)ULTIMATE IN HOME FRAGMENT-PICKING SOFTWARE!
#
# CAUTION:  NO USER SERVICEABLE PARTS BELOW!
#
#           TO REDUCE RISK OF ELECTRIC SHOCK, DO NOT REMOVE THE COVER!
#           DO NOT ATTEMPT REPAIRS!  REFER SERVICING TO YOUR AUTHORIZED DEALER!
#           AVOID PROLONGED EXPOSURE TO HEAT OR SUNLIGHT!
#           TO REDUCE THE RISK OF FIRE OR ELECTRIC SHOCK, DO NOT EXPOSE THE
#            PRODUCT TO RAIN AND/OR MOISTURE!
#           DO NOT MOVE THE PRODUCT WHILE IN USE!
#           DO NOT LOOK AT THE PRODUCT WHILE IN USE!
#           DO NOT COMPLAIN ABOUT THE PRODUCT WHILE IN USE!
#           DO NOT DISCUSS THE PRODUCT WHILE IN USE!
#           DO NOT THINK ABOUT THE PRODUCT WHILE IN USE!
#           CLEAN ONLY WITH MILD DETERGENTS AND A SOFT CLOTH!
#           USE ONLY IN WELL-VENTILATED AREAS!
#
#           FOR EXTERNAL USE ONLY!  DO NOT TAKE INTERNALLY!
#           MAY PRODUCE STRONG MAGNETIC FIELDS!
#
#           DO NOT REMOVE THIS TAG UNDER PENALTY OF LAW.
#
#           THIS ARTICLE CONTAINS NEW MATERIAL ONLY.
#
#           THIS LABEL IS AFFIXED IN COMPLAINCE WITH THE UPHOLSTERED AND
#            STUFFED ARTICLES ACT.
#
#
# (IN OTHER WORDS:  DON'T EVEN *THINK* ABOUT CHANGING THINGS BELOW THIS POINT!)
#
###############################################################################

Wednesday, 5 July 2017

In vivo shuffling via heteroduplex amplicons

Bluescreen transilluminator photo showing different shades of green.
In a previous post I discussed the heteroduplicity of epPCR plasmids. Namely, colonies in a library will often have positions with two equally represented bases. The most likely cause is that the amplicons do not anneal perfectly and that the transformed plasmids are actually heteroduplexes. These can either divide before being corrected or be corrected by mismatch repair.
I did a sneaky experiment to see if this could be used to make a protocol to shuffle mutations between variants and got a positive result, but possibly not as effective as hoped.

Sunday, 23 April 2017

A note on Mutazyme and Manganese

Can Mutazyme be powered up?
Adding manganese works poorly, but stronger variants seem to be known.
(see also discussion about Mutazyme in Part I and II)

Tuesday, 11 April 2017

A simple hack for a phylogenetic Noah's ark dilemma

Ever had an endless list of bacterial names that needed a trim?
Ever seen a tree where the bacterium chosen is not the famous one, but its cousin? Or a tree where you don't recognise a single name?
The issue of picking bacteria from a list is what I call the Noah's ark dilemma. This term is generally used for the biblical problem of the size of the boat required for all the animals in existence (except dinosaurs). Here I use it to mean picking the most meaningful bacteria from a list. In the past year, I have come to rely on a simple solution: Pubmed popularity.

Sunday, 26 February 2017

Peak height variation

Sequencing a plasmid pool containing a sequence with a randomised codon can reveal what the frequencies of the bases are (Acevedo-Rocha et al., 2015).
The problem is that sequence traces are not consistent. Some peaks are bigger than others and beyond a certain point the traces get messy. So how does that affect the prediction of the base frequencies?

Friday, 6 January 2017

Top 5 useful facts about the MDS42 strain

MDS42, also known as the Blattner strain, is a strain made in 2006 by deleting 13% of the genome of E. coli K-12 MG1655. Unfortunately, as tools go, it rivals IKEA in cryptic instruction manuals.