2020

Tuesday, 29 December 2020

From cartoon to interactive infographic –the sane way

Making cartoon representations (technically vector graphics) in Adobe Illustrator is great fun, whereas the mere thought of drawing a cartoon as line plots in Excel, Matlab, R, Plotly etc. is enough to drive anyone insane. Luckily, Illustrator images can be coloured based on numerical data in an automated way... without being manually plotted in Excel. Here I discuss exporting the vector graphic and modifying it with D3.js in a Jupyter notebook.
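To give a flavour of the idea without D3.js, here is a minimal Python sketch using only the standard library. It is not the approach of the post itself: it assumes the exported SVG has shapes with `id` attributes matching the data keys (the names `colour_svg` and `value_to_hex` and the white-to-red ramp are made up for illustration), and that the fill is not overridden by a `style` attribute or CSS class, which Illustrator exports may well do.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the output free of ns0: prefixes


def value_to_hex(value, vmin, vmax):
    """Map a number onto a linear white-to-red ramp (hypothetical colour scheme)."""
    frac = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    frac = min(max(frac, 0.0), 1.0)
    other = round(255 * (1 - frac))  # green and blue channels fade as value rises
    return f"#ff{other:02x}{other:02x}"


def colour_svg(svg_text, values):
    """Set the fill of every element whose id is a key of `values`."""
    root = ET.fromstring(svg_text)
    vmin, vmax = min(values.values()), max(values.values())
    for element in root.iter():
        shape_id = element.get("id")
        if shape_id in values:
            element.set("fill", value_to_hex(values[shape_id], vmin, vmax))
    return ET.tostring(root, encoding="unicode")


example_svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect id="domain_A" width="10" height="10"/>'
    '<rect id="domain_B" x="20" width="10" height="10"/>'
    "</svg>"
)
coloured = colour_svg(example_svg, {"domain_A": 0.0, "domain_B": 1.0})
```

The returned string can be written to a file or displayed in a notebook with `IPython.display.SVG`.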

Shake it like a polaroid picture: MD in pyrosetta

> This blog post has been unfinished for two years. So I am posting in the hopes it will spur me to finish it.

The score of a pose reflects how good its interactions are in one static arrangement, a snapshot. However, given some energy, several of these interactions may break and a different conformation is seen. The best way to describe what 1 kcal/mol means is that it is the typical strength of a hydrogen bond, but this is rather weak... in fact it is on the same scale as the average collision energy of water molecules at 37°C, which is set by the molar Boltzmann constant times temperature (N_A·k_B·T = RT, about 0.6 kcal/mol). (At that point in the explanation it is paramount to resist the urge to explain that k_BT coincides with the mean of the Boltzmann distribution describing the energy of collisions as per Maxwell–Boltzmann statistics, or else you get that glazed look thermodynamics seems to elicit even in folk who aren't hungover students.)
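As a quick sanity check on the thermal energy scale, here is a minimal sketch that computes the molar thermal energy RT = N_A·k_B·T in kcal/mol. The function name is made up for illustration; the constants are the exact SI values and the thermochemical calorie (4.184 J) is assumed.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)
N_A = 6.02214076e23      # Avogadro constant, 1/mol (exact SI value)
JOULE_PER_KCAL = 4184.0  # thermochemical kilocalorie


def molar_thermal_energy(celsius):
    """Return RT = N_A * k_B * T in kcal/mol for a temperature in degrees Celsius."""
    kelvin = celsius + 273.15
    return N_A * K_B * kelvin / JOULE_PER_KCAL


rt_25 = molar_thermal_energy(25)  # roughly 0.59 kcal/mol
rt_37 = molar_thermal_energy(37)  # roughly 0.62 kcal/mol
mean_kinetic_37 = 1.5 * rt_37     # mean translational kinetic energy, (3/2)RT
```

So RT barely changes between room and body temperature, and it is the mean translational kinetic energy, (3/2)RT, that comes close to 1 kcal/mol at 37°C.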

Therefore, hydrogen bonds do come apart and together rather frequently, and in some cases these dynamic properties result in large-scale switching. This cannot really be determined from a static score; even the per-residue scores aren't an indication of dynamic properties. So how does one do an MD run in Pyrosetta?

Remote notebooks and Jupyter themes

Jupyter notebooks are great. PyCharm is great for writing a module, but a Jupyter notebook lets you test snippets of code really easily. You can add a Julia kernel, run bash and JS snippets and add markdown notes. Even better, you can run them off remote machines. If you have too many notebooks on different machines it gets confusing, but luckily there are Jupyter themes that let you customise the colours. Here are the different colours.

XML to Pyrosetta: EvolutionaryDynamicsMover as an example

In the previous post I discussed strategies for using a Pyrosetta class when the documentation lets you down. One topic discussed was the conversion of a Rosetta XML script to Pyrosetta. Here is a worked example using the EvolutionaryDynamicsMover.

Pyrosetta scripting without a manual

I was recently asked how to work out how to write a Pyrosetta script when there is no example. This is definitely the biggest weakness of Pyrosetta and Rosetta scripts, but it is not insurmountable. In fact, there is a wealth of hidden information that can be mined. Here is how, and in the next post I give an example.

The Freedom unit for molar energy: the foot-pound-force per pound-mole

In computational biochemistry the most commonly reported quantity is molar energy. The SI unit is kJ/mol (kilojoule per mole), but kcal/mol is used almost as frequently: Google enumerates 5.3e6 and 3.8e6 pages for them respectively. Different programs use one or the other: GROMACS uses kJ/mol, while Rosetta uses kcal/mol. They differ by a factor of about 4 (1 kcal = 4.184 kJ). The latter has the advantage that 1 kcal/mol is roughly the strength of a hydrogen bond and that the molar thermal energy N_A·k_B·T (= RT) is about 0.6 kcal/mol at both 25°C and 37°C, while the former, being SI, sounds more sciency (and not in the overly obnoxious way of folk who use kelvin for enzymology).

However, although it is not an SI unit, kcal/mol is still very metric and European; after all, the calorie was introduced by a Frenchman. Therefore, a more American unit is clearly required. Hence the need for the foot–pound-force per pound-mole.
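For the curious, the conversion can be sketched in a few lines of Python. This is only an illustration: the function names are made up, the thermochemical calorie is assumed, and the foot, pound-force and pound are taken at their standard definitions.

```python
JOULE_PER_KCAL = 4184.0      # thermochemical kilocalorie
FOOT_IN_METRES = 0.3048      # international foot
LBF_IN_NEWTONS = 4.4482216152605
MOL_PER_LB_MOL = 453.59237   # one pound-mole is 453.59237 mol

JOULE_PER_FT_LBF = FOOT_IN_METRES * LBF_IN_NEWTONS  # about 1.3558 J


def kcal_to_kj_per_mol(kcal_per_mol):
    """kcal/mol -> kJ/mol."""
    return kcal_per_mol * JOULE_PER_KCAL / 1000.0


def kcal_per_mol_to_freedom(kcal_per_mol):
    """kcal/mol -> foot-pound-force per pound-mole."""
    joule_per_mol = kcal_per_mol * JOULE_PER_KCAL
    joule_per_lb_mol = joule_per_mol * MOL_PER_LB_MOL
    return joule_per_lb_mol / JOULE_PER_FT_LBF


hydrogen_bond = kcal_per_mol_to_freedom(1.0)  # on the order of 1.4 million ft·lbf/lb-mol
```

A hydrogen bond of about 1 kcal/mol therefore weighs in at roughly 1.4 million foot-pound-force per pound-mole, which does sound suitably large.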

Rosetta/Pyrosetta on a cluster or in the cloud

Due to licensing, Rosetta and Pyrosetta cannot be installed via apt-get/pip but have to be downloaded from the Rosetta Commons website. This makes things harder if you are in a Colab notebook, ssh'ed into a machine or running off a remote Jupyter notebook. Luckily it is actually straightforward.

5-hydroxytryptophan biosynthesis

I was intrigued by a recent article in the journal Chem (link) entitled "Creation of Bacterial Cells with 5-hydroxytryptophan as a 21st Amino Acid Building Block" by Chen et al. in the group of Han Xiao at Rice University, wherein they make a strain that metabolically produces 5-hydroxytryptophan for genetic code expansion. It is an interesting example of why metabolic engineering is non-trivial and how scientific research does not progress in a logical fashion.

Stay hydrated

Waters can be an integral part of a protein structure; in fact, it is common to find water tightly bound in an X-ray crystal structure. Including these waters can change the calculated Gibbs free energy of a protein and give results closer to experiment. Explicit waters can be added in Rosetta/Pyrosetta thanks to the SPaDES algorithm described in Lai et al. 2017. Here is a guide to using it in Pyrosetta.

Switching ligand in a PDB with Fragmenstein

For the Covid Moonshot project, one question by Prof. Frank von Delft of Diamond XChem led to a series of events that culminated in Fragmenstein, a module to do fragment mergers where the follow-up is as faithful to the starting crystal hits as possible. Even if its intended use is the hit-to-lead process, there is a nice use that makes it rather handy for computational biochemistry in general: switching the ligand in a PDB to another in an energy-minimised fashion that obeys the original ligand.

Filling missing loops —the proper way

Previously, I posted about how to join proteins and add missing loops the shoddy way. Now I'll address how to do it correctly, using Rosetta or Pyrosetta. I am sorry this has been so long overdue.
Since posting this, I realised one can do it even faster by hijacking the threading algorithm, which, albeit not its intended purpose, works fine for fixing a structure without supervision, which the methods discussed below do require.

Love thy neighbours, but select them with caution

In Rosetta, NeighborhoodResidueSelector behaves differently from PyMOL's expand selector and it is good to be aware of it. Namely, in PyMOL the distance is measured from any atom, while in Rosetta it is measured from the centre-of-mass atom, unless specified differently. It is actually CloseContactResidueSelector that works like PyMOL's expand selector.
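The difference is easy to see with a toy geometry. The following sketch is plain Python, not the Rosetta or PyMOL API: residues are hypothetical dictionaries with a single reference atom (`ref`) and a list of all atom coordinates (`atoms`), and the two functions mimic the two distance criteria.

```python
from math import dist


def select_by_any_atom(focus, residues, cutoff):
    """PyMOL-like: any atom of the residue within cutoff of any focus atom."""
    return {
        name
        for name, res in residues.items()
        if any(dist(a, b) <= cutoff for a in res["atoms"] for b in focus["atoms"])
    }


def select_by_reference_atom(focus, residues, cutoff):
    """Single-atom criterion: only the reference atoms are compared."""
    return {
        name
        for name, res in residues.items()
        if dist(res["ref"], focus["ref"]) <= cutoff
    }


ligand = {"ref": (0.0, 0.0, 0.0), "atoms": [(0.0, 0.0, 0.0)]}
residues = {
    "near": {"ref": (3.0, 0.0, 0.0), "atoms": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]},
    # reference atom far away, but the side-chain tip reaches the ligand
    "long_side_chain": {
        "ref": (9.0, 0.0, 0.0),
        "atoms": [(9.0, 0.0, 0.0), (6.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
    },
    "far": {"ref": (15.0, 0.0, 0.0), "atoms": [(15.0, 0.0, 0.0)]},
}

any_atom = select_by_any_atom(ligand, residues, cutoff=4.0)
ref_atom = select_by_reference_atom(ligand, residues, cutoff=4.0)
```

With the same 4 Å cutoff, the residue whose side chain reaches towards the ligand is picked up by the any-atom criterion but missed by the single-atom one, so the latter generally needs a larger cutoff to cover the same neighbourhood.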

How to set up an electron density scorefunction in Pyrosetta

Energy minimising structures in Rosetta/Pyrosetta is essential to avoid artefactual results. Say a mutation is introduced and in the protocol the neighbourhood is repacked: if the structure is not energy minimised properly, the neighbourhood repacking step will spuriously reward the mutation with a very negative ∆∆G. One worry is that the energy minimisation is not faithful to the crystal structure. This argument has two sides: on the one hand, the fudgey force fields in Rosetta do not truly model the chemical interactions, while on the other, crystal packing may be unnatural. Both points have merit. After all, Rosetta does use implicit water, which does not behave like the stripped crystallographic waters, and some residues may have non-standard protonations etc. But if one wants, one can use a scorefunction that is weighted by the electron density map, and here is how.

Atom names purely in RDKit

For some applications, such as PyMOL scripts or Rosetta, atom names are really important; for example, CA is the standard name for the α-carbon. Example uses of atom names in Rosetta/Pyrosetta include setting constraints, using a params file for a custom ligand and so forth. However, RDKit is a bit of a nuisance with atom names, as they are not a central feature but one added for PDB files that is not too well documented.

Working around segmentation faults of pyrosetta: threads & processes

Rosetta often does not die gracefully. Pyrosetta is the same. If the starting template is not great, a segmentation fault may occur, whereby the kernel issues signal 11 to kill the process. The way around it is to spin the job up as its own process via the multiprocessing module and not the threading module, because child threads share the parent's process and would take it down with them.
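A minimal sketch of the pattern, using only the standard library: `risky_job` is a hypothetical stand-in for a Pyrosetta call, and it fakes a segmentation fault by sending itself signal 11. The sketch assumes a POSIX system (it explicitly asks for the `fork` start method), and the helper name `run_isolated` is made up for illustration.

```python
import multiprocessing as mp
import os
import signal


def risky_job(x, conn):
    """Stand-in for a Pyrosetta call; 'crashes' on negative input."""
    if x < 0:
        os.kill(os.getpid(), signal.SIGSEGV)  # simulate a segmentation fault
    conn.send(x * 2)
    conn.close()


def run_isolated(func, arg, timeout=30):
    """Run func(arg, conn) in a child process.

    Returns (True, result) on success or (False, exitcode) if the child died;
    a child killed by signal N has exitcode -N.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX system
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=func, args=(arg, child_conn))
    proc.start()
    child_conn.close()  # the parent only keeps the receiving end
    try:
        if parent_conn.poll(timeout):
            result = parent_conn.recv()  # raises EOFError if the child died
            proc.join()
            return True, result
    except EOFError:
        pass
    if proc.is_alive():  # timed out: give up on the child
        proc.terminate()
    proc.join()
    return False, proc.exitcode


ok, value = run_isolated(risky_job, 21)
crashed_ok, exitcode = run_isolated(risky_job, -1)
```

The parent survives the crash and can inspect the exit code (minus the signal number) to log the failure and move on to the next structure.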

Guess bond order in Rdkit by number of bound atoms

Some compChem/Biochem programs do not care about bond orders and strip them, which is rather frustrating. Ligands in random PDB files without any name or SMILES are a classic example.
There is no single magic mol.CorrectBondOrder() command in RDKit, but luckily there are some tricks that can be done. Here I will discuss guessing the bond order from the number of bound atoms.
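The counting idea can be sketched without RDKit at all. The following is a deliberately naive, library-free illustration, not the method of the post: it assumes neutral molecules with explicit hydrogens, a hypothetical table of typical valences, and a greedy assignment that will trip over charged groups and some conjugated systems. In RDKit the neighbour count would come from something like `atom.GetDegree()` on a molecule with hydrogens present.

```python
# Typical valences for neutral atoms (an assumption; charged species will break this)
TYPICAL_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1}


def guess_bond_orders(elements, bonds):
    """Guess bond orders from the number of bound atoms.

    elements: list of element symbols, hydrogens included
    bonds: list of (i, j) index pairs, all initially treated as single bonds
    Returns a dict mapping each (i, j) pair to its guessed order.
    """
    degree = [0] * len(elements)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    # how many bonds each atom is still 'missing' relative to its typical valence
    spare = [max(TYPICAL_VALENCE[el] - d, 0) for el, d in zip(elements, degree)]
    orders = {}
    for i, j in bonds:
        extra = min(spare[i], spare[j], 2)  # nothing beyond a triple bond
        spare[i] -= extra
        spare[j] -= extra
        orders[(i, j)] = 1 + extra
    return orders


# ethene: C=C with four hydrogens
ethene = guess_bond_orders(
    ["C", "C", "H", "H", "H", "H"],
    [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)],
)
# hydrogen cyanide: H-C#N
hcn = guess_bond_orders(["H", "C", "N"], [(0, 1), (1, 2)])
```

A carbon with only three bound atoms must have a double bond somewhere, and one with two bound atoms a triple bond or two doubles; the greedy loop simply hands the missing bonds to the first neighbour that is also short.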

The art of blowing up protein

Pages

Tuesday, 29 December 2020

From cartoon to interactive infographic –the sane way

Saturday, 21 November 2020

Shake it like a polaroid picture: MD in pyrosetta

Sunday, 1 November 2020

Remote notebooks and Jupyter themes

Saturday, 31 October 2020

XML to Pyrosetta: EvolutionaryDynamicsMover as an example

Tuesday, 27 October 2020

Pyrosetta scripting without a manual

Friday, 9 October 2020

The Freedom unit for molar energy: the foot-pound-force per pound-mole

Wednesday, 7 October 2020

Rosetta/Pyrosetta on a cluster or in the cloud

Monday, 17 August 2020

5-hydroxytryptophan biosynthesis

Saturday, 8 August 2020

Stay hydrated

Tuesday, 21 July 2020

Switching ligand in a PDB with Fragmenstein

Saturday, 4 July 2020

Filling missing loops —the proper way

Monday, 8 June 2020

Love thy neighbours, but select them with caution

Sunday, 19 April 2020

How to set up an electron density scorefunction in Pyrosetta

Wednesday, 18 March 2020

Atom names purely in RDKit

Friday, 21 February 2020

Working around segmentation faults of pyrosetta: threads & processes

Wednesday, 12 February 2020

Guess bond order in Rdkit by number of bound atoms
