Monday 16 October 2017

Hacking PDBs for fusion protein and missing loops

This is a how-to guide for making structures of fusion proteins, the shoddy way.
For the proper way, see how to do it in Rosetta or PyRosetta, or by coercing the RosettaCM threader into doing it.
NB. This assumes medium to advanced PyMOL competency.

Introduction

Representing the structure of a fusion protein is a rather annoying task. There are many ways of doing this.
Contrary to expectations, submitting the sequence of a fusion protein to the Phyre server (threading/ab initio predictor) or similar will return spaghetti, just as happens with circular permutations (for which you get a cracked, half-open domain with spaghetti attached). But give it a go nevertheless: it may save you a lot of time!
Rosetta can be used, but the domains will not be photogenic and it is overkill for some operations.
So the best way is simply to hack the PDB files with some PyMOL ninjutsu.

EDIT/WARNING: since writing this I have realised that one could add missing loops by threading, as several algorithms have loop-adding operations. In that case PyMod may be the best approach for non-coders: it is less straightforward than this, but it uses a proper loop-modelling algorithm, as it is a GUI wrapper for MODELLER. PyMod requires some easy installations (MODELLER and a PyMOL plug-in) and the generation of an alignment (load the structure, run print(cmd.get_fastastr()), get your full sequence from UniProt, go to the online MUSCLE alignment tool and bingo), and does not require any awkward pulling of chains.
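
Before making that alignment it helps to know where the loops are actually missing. Here is a rough sketch, not part of the original recipe, that reads the CA atoms of a PDB file with plain Python and reports jumps in residue numbering per chain. The function names (resolved_residues, numbering_gaps) are made up for illustration, and it assumes a standard fixed-column PDB file with only the 20 canonical amino acids; anything else comes out as X.

```python
# Sketch only: find resolved residues and numbering gaps (likely missing loops).
# Assumes fixed-column PDB ATOM records; helper names are hypothetical.

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def resolved_residues(pdb_text):
    """Return {chain: [(residue_number, one_letter_code), ...]} from CA atoms."""
    chains = {}
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name, 18-20 the residue name,
        # 22 the chain ID and 23-26 the residue number.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            number = int(line[22:26])
            letter = THREE_TO_ONE.get(line[17:20].strip(), "X")
            residues = chains.setdefault(chain, [])
            # Simplification: a repeated residue number (alternate location
            # or insertion code) is counted only once.
            if not residues or residues[-1][0] != number:
                residues.append((number, letter))
    return chains


def numbering_gaps(residues):
    """Return [(last_resolved, next_resolved), ...] where numbering jumps."""
    return [
        (before, after)
        for (before, _), (after, _) in zip(residues, residues[1:])
        if after - before > 1
    ]
```

Inside PyMOL the input text could come from cmd.get_pdbstr("all"); outside it, just read the .pdb file. A gap of (120, 131), say, would mean residues 121 to 130 are unresolved and are what the loop modeller has to build. Note that a numbering jump is only a hint: some structures are numbered with deliberate skips.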

About


As an example I will make Phusion, i.e. Pyrococcus furiosus DNA polymerase fused to the Sulfolobus solfataricus 7d domain (Sso7d), because its name basically says it is a fusion.