A segfault and NaN driven series of disconnected ideas, analyses and just plain silly posts about computational biochemistry, synthetic biology and microbiology.
Sunday, 31 December 2023
A possible BioB bipass route
Saturday, 1 October 2022
Move aside coIP Westerns, ColabFold has got this!
Recently AlphaFold2 released a new batch of models, this time covering all of the Trembl sequences in Uniprot, resulting in a huge number, which got hashtag-academic-twitter and some news editors very excited for the stamp-collecting feat. Personally, I find it annoying, not because it's pointless, but as of writing this, it has made any search for a target by name swamped by irrelevant sequences.
However, AlphaFold is great for other feats.
I have blogged about it a few times (e.g. link), which gives away my positive view of it! It can predict oligomers, with a lot more precision and confidence than docking. It does not always work either technically or meet the hypothesis. I did a long series of experiments with a hypothesis in mind which wasn't valid in the end (here), but revealed novel science and took a few minutes to set up and a few hours to run, which would have taken years if done by Western blot of a co-immunoprecipitation or cross-linking mass-spec.
Sunday, 19 June 2022
Top 10 silliest PDB residue names for ligands!
UPDATE: The PDB will finish 3 letter chemical component IDs sometime before 2024 at which point they will switch to 5 letter codes, which will be usable solely in CIF format: https://www.wwpdb.org/news/news?year=2022#630fee4cebdf34532a949c34
In some situations it is handy to use in an in silico experiment a 3-letter residue name that is not taken in the PDB. For example, PyRosetta has a system of pregenerated topologies for PDB components, which can cause issues when a ligand is loaded and the movers may use that over an incorrectly provided residue type / param file, resulting in a blown up mishapen ligand —an overly common incident*. As a result, having a list handy of what is taken is helpful. Herein are some silly observations about what the taken and untaken names are —but not ranked as a top 10, because this is not a science blog, not my local newspaper.
Tuesday, 10 May 2022
Show neighbours in nglview
Nglview is a really nice Python library which encodes a widget to show a NGL viewport, a JS 3D protein viewer used until recently by the PDB. One annoying feature is that one cannot select neighbours as easily as say PyMOL's "select byres HEM around 3". But it is possible and here is how.
Sunday, 31 October 2021
Multiple sequence alignments
- Sequence conservation is a key ingredient in most nucleotide mutation severity predictors.
- The covariance within it powers the AlphaFold2 Evoformer and other de novo structure predictors.
- The phylogeny extracted from it tells the evolutionary tale of the protein
Sunday, 17 October 2021
Filling missing loops by cannibalising AlphaFold2
![]() |
I could not resist this Photoshop. But the process is not as dramatic and the results not as bad as Temple of Doom... If done right. |
Tuesday, 27 July 2021
What to look out for with an AlphaFold2 model
There is nothing more disheartening than telling someone "Sorry, I cannot help you with your protein, because no homologue structures of your protein are solved and any model will be rubbish". Now, with AlphaFold2 proteome release this is no longer the case. Or mostly: in fact there are several pitfalls and issues that need to be looked at, because the algorithm does not account for three things: binding partners and ligands, oligomerisation and alternate conformations.
Wednesday, 7 July 2021
Per residue RMSD
Recently I calculated the local RMSD caused by each residue and I thought I'd share the methods I used using PyRosetta —it is nothing at all novel, but I could not find a suitable implementation. The task is simple given two poses, find out what residue's backbone is changing the most by scanning along comparing each a short peptide window from each.
Monday, 26 April 2021
Remodel in Pyrosetta
The Rosetta binary Remodel is a great tool as it allows interesting designs to be made. However, it is rather incompatible with Rosetta Scripts and Pyrosetta as it is heavily dependent on command line options for customisation and repeats some of the processes internally. Despite this, it can be cohersed rather effectively to work in Pyrosetta with some convenience and this is how.
Monday, 22 February 2021
Multiple poses in NGLView
As mentioned previously, most of my Pyrosetta operations are done in a Jupyter notebook run in a cluster node. As a result, I am heavily dependent on NGLView, an IPython widget that uses NGL.js. This is nice for some quick tasks, although admitted more limited than the PyMOL mover, which however requires another ssh to forward another port. My Michelanglo webapp uses NGL.js, so I cannot but say good things of NGL.js. However, one or two things in the Python module NGLView are not immediately clear, so I'll quickly cover dealing with multiple poses here.
Tuesday, 21 July 2020
Switching ligand in a PDB with Fragmenstein
Saturday, 4 July 2020
Filling missing loops —the proper way
Since posting this, I realised one can do it even faster by hijacking the threading algorithm, which albeit not it's intended purpose works fine for fixing a structure without supervision —which the following discussed methods do.
Sunday, 19 April 2020
How to set up an electron density scorefunction in Pyrosetta
Wednesday, 18 March 2020
Atom names purely in RDKit
CA
is the standard name for the α-carbon. Example uses of atom names in Rosetta/pyrosetta include setting constraints, using a params file for a custom ligand and so forth. However, RDKit is a bit of a nuisance with atom names as it is not a central feature, but a feature added for PDB files that is not too well documented.Thursday, 7 November 2019
Go away glycerol!!
Wednesday, 4 September 2019
PDB numbering rollercoaster
- position in whole protein,
- position in extracted sequence,
- position in residues stated in the PDB/mmCIF structure and
- position which actually has coordinates.
Saturday, 3 August 2019
When will the PDB run out of 4-letter codes?
The current total is 155,618 structures and new ones are added at a rate of 12000 structures per year, which means that, assuming a constant growth, in 125 years —(36 ^ 4 - 155,618 ) / 12,000 —the PDB will finish codes to allocate.
2145. That is a few years after the setting of Kim Robinson's New York 2140, where New York is a flooded super-Venice, so I am guessing the RCSB PDB, in San Diego, will have long been flooded so lack of 4-letter codes is not top of their concerns.