Showing posts with label compbiochem. Show all posts

Saturday, 1 October 2022

Move aside coIP Westerns, ColabFold has got this!

Recently AlphaFold2 released a new batch of models, this time covering all of the TrEMBL sequences in UniProt. The sheer number got hashtag-academic-twitter and some news editors very excited about the stamp-collecting feat. Personally, I find it annoying, not because it is pointless, but because, as of writing this, any search for a target by name is swamped by irrelevant sequences.
However, AlphaFold is great for other feats.
I have blogged about it a few times (e.g. link), which gives away my positive view of it! It can predict oligomers with a lot more precision and confidence than docking. It does not always work, either technically or in terms of confirming the hypothesis. I did a long series of experiments with a hypothesis in mind that turned out not to be valid (here), but they revealed novel science and took a few minutes to set up and a few hours to run, whereas the same would have taken years by Western blot of a co-immunoprecipitation or by cross-linking mass-spec.

Tuesday, 10 May 2022

Show neighbours in nglview

Nglview is a really nice Python library that provides a widget showing an NGL viewport, NGL being a JS 3D protein viewer used until recently by the PDB. One annoying feature is that one cannot select neighbours as easily as with, say, PyMOL's "select byres HEM around 3". But it is possible and here is how.

Sunday, 17 October 2021

Filling missing loops by cannibalising AlphaFold2

I could not resist this Photoshop.
But the process is not as dramatic
and the results not as bad as Temple of Doom...
If done right.
AlphaFold2 models have a complete sequence, whereas a crystal structure of the protein is, for innumerable reasons, better but may have missing spans. As a result one may want, for illustrative purposes only, to rip out the required parts from the AlphaFold2 models (as fragments) and have them built into the target structure. Here is how to do it by threading.

Monday, 23 August 2021

Tweaking AlphaFold2 models with PyRosetta

In a previous post I explored the pitfalls of an AlphaFold2 model from EBI. Here I thought I'd share some PyRosetta methods that may be handy to use with AlphaFold2 models.

Tuesday, 27 July 2021

What to look out for with an AlphaFold2 model

There is nothing more disheartening than telling someone "Sorry, I cannot help you with your protein, because no homologue structures of your protein are solved and any model will be rubbish". Now, with the AlphaFold2 proteome release, this is no longer the case. Or mostly: in fact there are several pitfalls and issues that need to be looked at, because the algorithm does not account for three things: binding partners and ligands, oligomerisation and alternate conformations.

Wednesday, 7 July 2021

Per residue RMSD

Recently I calculated the local RMSD caused by each residue and I thought I'd share the methods I used with PyRosetta. It is nothing at all novel, but I could not find a suitable implementation. The task is simple: given two poses, find out which residue's backbone is changing the most by scanning along the sequence and comparing a short peptide window from each.
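To give a flavour of the sliding-window idea without needing PyRosetta, here is a minimal sketch in plain Python. It is not the implementation from the post: it assumes the two structures are given as equal-length lists of Cα coordinates (the function name `window_drmsd` and the `half_window` parameter are made up for this illustration), and it swaps the superposed RMSD for a distance RMSD, which compares intra-window distances and so needs no superposition.

```python
import math
from itertools import combinations


def window_drmsd(coords_a, coords_b, half_window=2):
    """Per-residue local deviation between two conformations.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, one per residue
    (assumed to be C-alpha atoms). For each residue a window of neighbouring
    residues is taken and the distance RMSD of that window is returned.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("Both structures need the same number of residues")
    n = len(coords_a)
    scores = []
    for i in range(n):
        # window is truncated at the termini
        first, last = max(0, i - half_window), min(n, i + half_window + 1)
        pairs = list(combinations(range(first, last), 2))
        if not pairs:  # single-residue chain
            scores.append(0.0)
            continue
        squared = sum(
            (math.dist(coords_a[j], coords_a[k]) - math.dist(coords_b[j], coords_b[k])) ** 2
            for j, k in pairs
        )
        scores.append(math.sqrt(squared / len(pairs)))
    return scores


# toy example: a straight 6-residue chain whose last residue is nudged by 2 Å
straight = [(i * 3.8, 0.0, 0.0) for i in range(6)]
nudged = straight[:-1] + [(5 * 3.8, 2.0, 0.0)]
scores = window_drmsd(straight, nudged)
worst_residue = scores.index(max(scores))  # zero-based index of the most affected window
```

With real poses one would extract the coordinates first and probably prefer a superposed backbone RMSD per window; the scanning logic stays the same.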

Monday, 26 April 2021

Remodel in Pyrosetta


The Rosetta binary Remodel is a great tool as it allows interesting designs to be made. However, it is rather incompatible with Rosetta Scripts and Pyrosetta, as it is heavily dependent on command line options for customisation and repeats some of the processes internally. Despite this, it can be coerced rather effectively to work in Pyrosetta with some convenience and this is how.

Monday, 22 February 2021

Multiple poses in NGLView

As mentioned previously, most of my Pyrosetta operations are done in a Jupyter notebook run in a cluster node. As a result, I am heavily dependent on NGLView, an IPython widget that uses NGL.js. This is nice for some quick tasks, although admittedly more limited than the PyMOL mover, which, however, requires another ssh to forward another port. My Michelanglo webapp uses NGL.js, so I cannot but say good things of NGL.js. However, one or two things in the Python module NGLView are not immediately clear, so I'll quickly cover dealing with multiple poses here.

Sunday, 1 November 2020

Remote notebooks and Jupyter themes

Jupyter notebooks are great. PyCharm is great for writing a module, but a Jupyter notebook lets you test snippets of code really easily. You can add a Julia kernel, run bash and JS snippets and add markdown notes. The even greater thing is that you can run them off remote machines. If you have too many notebooks on different machines it gets confusing, but luckily there is jupyter themes, which lets you customise the colours. Here are the different colours.

Saturday, 31 October 2020

XML to Pyrosetta: EvolutionaryDynamicsMover as an example

In the previous post I discussed the strategies to use a Pyrosetta class when the documentation lets you down. One topic discussed was the conversion of a Rosetta XML script to Pyrosetta. Here is an example using the EvolutionaryDynamics mover.

Tuesday, 27 October 2020

Pyrosetta scripting without a manual

I was recently asked how to figure out how to write a Pyrosetta script when there is no example. This is definitely the biggest weakness of Pyrosetta and Rosetta Scripts, but it is not insurmountable. In fact, there is a wealth of hidden information that can be mined. Here is how and in the next post, I give an example.

Friday, 9 October 2020

The Freedom unit for molar energy: the foot-pound-force per pound-mole

In computational biochemistry the most commonly used quantity is molar energy. The SI unit is kJ/mol (kilojoule per mole), but kcal/mol is used almost as frequently: Google enumerates 5.3e6 and 3.8e6 pages for them respectively. Different programs use one or the other: GROMACS uses kJ/mol, while Rosetta uses kcal/mol. They differ by a factor of about 4 (1 kcal/mol is 4.184 kJ/mol). The latter has the advantage that 1 kcal/mol is roughly the strength of a hydrogen bond and that RT (that is, kBT multiplied by NA) is about 0.6 kcal/mol at both 25°C and 37°C, while the former, being SI, sounds more sciency (and not in the overly obnoxious way of folk who use Kelvin for enzymology).

However, although it is not an SI unit, kcal/mol is still very metric and European; after all, the unit calorie was introduced by a Frenchman. Therefore, a more American unit is clearly required. Hence the need for the foot-pound-force per pound-mole.
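For the record, the arithmetic behind these units can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the constants are the standard definitions (thermochemical calorie of 4.184 J, international foot and avoirdupois pound, pound-mole of 453.59237 mol).

```python
R_J_PER_MOL_K = 8.314462618        # molar gas constant, J/(mol K)
J_PER_KCAL = 4184.0                # thermochemical kilocalorie in joules
J_PER_FT_LBF = 0.3048 * 4.4482216152605  # one foot times one pound-force, in joules
MOL_PER_LBMOL = 453.59237          # a pound-mole is as many moles as a pound has grams


def kcal_to_kj(kcal_per_mol):
    """kcal/mol to kJ/mol."""
    return kcal_per_mol * J_PER_KCAL / 1000


def rt_kcal_per_mol(celsius):
    """Thermal energy RT in kcal/mol at the given temperature in Celsius."""
    return R_J_PER_MOL_K * (celsius + 273.15) / J_PER_KCAL


def kcal_per_mol_to_freedom(kcal_per_mol):
    """kcal/mol to foot-pound-force per pound-mole."""
    joule_per_lbmol = kcal_per_mol * J_PER_KCAL * MOL_PER_LBMOL
    return joule_per_lbmol / J_PER_FT_LBF


rt_room = rt_kcal_per_mol(25)      # close to 0.59 kcal/mol
rt_body = rt_kcal_per_mol(37)      # close to 0.62 kcal/mol
one_kcal_in_freedom = kcal_per_mol_to_freedom(1)  # roughly 1.4 million ft lbf/lb-mol
```

So a hydrogen bond is worth well over a million foot-pound-force per pound-mole, which is at least a suitably big number.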

Wednesday, 7 October 2020

Rosetta/Pyrosetta on a cluster or in the cloud


Due to licensing, Rosetta and Pyrosetta cannot be installed via apt-get/pip but have to be downloaded from the Rosetta Commons website. This makes things harder if you are in a Colab notebook, ssh'ed into a machine or running off a remote Jupyter notebook. Luckily, it is actually straightforward.

Saturday, 8 August 2020

Stay hydrated

Waters can be an integral part of a protein structure; in fact, it is common to find waters tightly bound in an X-ray structure. These waters can change the calculated Gibbs free energy of a protein and give better agreement with experimental results. Explicit waters can be added in Rosetta/Pyrosetta thanks to the SPaDES algorithm described in Lai et al. 2017. Here is a guide to using it in Pyrosetta.

Tuesday, 21 July 2020

Switching ligand in a PDB with Fragmenstein

For the Covid Moonshot project, one question by Prof. Frank von Delft of Diamond XChem led to a series of events that culminated in Fragmenstein, a module to do fragment mergers where the followup is as faithful to the starting crystal hits as possible. Even if its intended use is the hit-to-lead process, there is a nice use that makes it rather handy for computational biochemistry in general: switching the ligand in a PDB to another in an energy minimised fashion that obeys the original ligand.

Saturday, 4 July 2020

Filling missing loops —the proper way

Previously, I posted about how to join proteins and add missing loops the shoddy way. Now I'll address how to do it correctly, using Rosetta or Pyrosetta. I am sorry this has been so long overdue.
Since posting this, I realised one can do it even faster by hijacking the threading algorithm, which, albeit not its intended purpose, works fine for fixing a structure without the supervision that the methods discussed below need.

Monday, 8 June 2020

Love thy neighbours, but select them with caution

In Rosetta, NeighborhoodResidueSelector behaves differently from PyMOL's expand selector and it is good to be aware of it. Namely, in PyMOL the distance is from any atom, while in Rosetta it is from the center of mass atom, unless specified differently. It is actually CloseContactResidueSelector that works like PyMOL's expand selector.
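The difference between the two behaviours is easy to see on toy data. The sketch below is plain Python and does not call Rosetta or PyMOL: residues are just hypothetical lists of atom coordinates, and a geometric centroid stands in for the single reference point per residue, which is a simplification of what Rosetta does.

```python
import math


def centroid(atoms):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(atoms)
    return tuple(sum(xyz[i] for xyz in atoms) / n for i in range(3))


def neighbours_any_atom(residues, target, cutoff):
    """PyMOL-like: selected if ANY atom is within cutoff of ANY target atom."""
    return {
        name
        for name, atoms in residues.items()
        if name != target
        and any(math.dist(a, t) <= cutoff for a in atoms for t in residues[target])
    }


def neighbours_single_point(residues, target, cutoff):
    """One reference point per residue (centroid used here as a stand-in)."""
    reference = centroid(residues[target])
    return {
        name
        for name, atoms in residues.items()
        if name != target and math.dist(centroid(atoms), reference) <= cutoff
    }


# toy system: the lysine touches the haem with one atom only
toy = {
    "HEM": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "LYS": [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    "GLY": [(1.0, 2.0, 0.0)],
    "ALA": [(20.0, 0.0, 0.0)],
}
by_any_atom = neighbours_any_atom(toy, "HEM", 3.0)          # {'LYS', 'GLY'}
by_single_point = neighbours_single_point(toy, "HEM", 3.0)  # {'GLY'}
```

The long residue that pokes a single atom towards the target is picked up by the any-atom rule but missed by the single-point rule, which is why the same cutoff gives different selections in the two programs.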

Sunday, 19 April 2020

How to set up an electron density scorefunction in Pyrosetta

Energy minimising structures in Rosetta/Pyrosetta is essential to avoid artifactual results. Say a mutation is introduced and in the protocol the neighbourhood is repacked: if the structure is not energy minimised properly, the neighbourhood repacking step will spuriously award the mutation a very negative ∆∆G. One worry is that the energy minimisation is not faithful to the crystal structure. This argument has two sides: on one, the fudgy force fields in Rosetta do not truly model the chemical interactions, while on the other, crystal packing may be unnatural. Both points have merit. After all, Rosetta does use implicit water, which does not behave like the stripped crystallographic waters, and some residues may have non-standard protonations etc. But if one wants, one can use a scorefunction that is weighted by the electron density map and here is how.

Wednesday, 18 March 2020

Atom names purely in RDKit

For some applications, such as PyMOL scripts or Rosetta, atom names are really important: for example, CA is the standard name for the α-carbon. Example uses of atom names in Rosetta/Pyrosetta include setting constraints, using a params file for a custom ligand and so forth. However, RDKit is a bit of a nuisance with atom names, as they are not a central feature but one added for PDB files that is not too well documented.

Friday, 21 February 2020

Working around segmentation faults of pyrosetta: threads & processes

Rosetta often does not die gracefully. Pyrosetta is the same. If the starting template is not great, a segmentation fault may occur, in which case the kernel issues signal 11 and the process is killed. The way around it is to spin the job up as its own process via the multiprocessing module and not the threading module, because child threads share the parent's process.
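A minimal sketch of the idea, assuming a POSIX machine where the fork start method is available. Here `risky_job` is a hypothetical stand-in for the Pyrosetta call, and the crash is simulated by the child sending itself signal 11 rather than by a real segmentation fault.

```python
import multiprocessing as mp
import os
import signal


def risky_job(queue, crash):
    """Stand-in for a Pyrosetta call that may segfault (hypothetical)."""
    if crash:
        # simulate a segmentation fault: the child sends SIGSEGV to itself
        os.kill(os.getpid(), signal.SIGSEGV)
    queue.put("finished")


def run_isolated(crash, timeout=30):
    """Run risky_job in a child process; return (result, exit code)."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork start method available
    queue = ctx.Queue()
    child = ctx.Process(target=risky_job, args=(queue, crash))
    child.start()
    child.join(timeout)
    if child.is_alive():  # hung rather than crashed: give up on it
        child.terminate()
        child.join()
        return None, child.exitcode
    # only read the queue if the child exited cleanly (the payload is tiny,
    # so joining before reading does not block the child)
    result = queue.get(timeout=5) if child.exitcode == 0 else None
    return result, child.exitcode


ok_result, ok_code = run_isolated(crash=False)
dead_result, dead_code = run_isolated(crash=True)
```

A negative exit code is the number of the signal that killed the child, so -11 here, while the parent (and with it the notebook kernel) carries on and can retry or skip the offending input.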