The art of blowing up protein: 2022

Sunday, 20 November 2022

glibc 2.36 vs. CentOS 7: a tale of failure

My favourite part of coding is planning and implementing some cool idea for doing something, especially if it involves some fun maths I read up on Wikipedia a minute beforehand. In reality polishing dirty data, refactoring someone-else's bad code, reverse engineering the use of a module and trying to get stuff to work is what take up most of my time.

Having got cocky I thought I could get the latest GNU library for C (glibc) working on CentOS 7. I failed miserably, here is my sorry tale down the rabbit hole.

Friday, 4 November 2022

In ML a module is not a namespace but a base class, because... ?

Deep learning is changing the world and fast. The list of achievements is impressive, however, why focus on the positive, when we can moan about the negative? In this blog post I will discuss three minor details that I find annoying about deep learning, namely the key word Module, the limited use of Google/Coral Edge TPUs and the coding quality of the field.

Saturday, 8 October 2022

Star imports trick

Star-imports (from typing import *) in Python are a handy, but dangerous. They are meant for quick coding, i.e. like on a jupyterlab notebook. However they are bad as they can mask other variables and cause issues down the line. They are ubiquitous online as are guides explaining why they are bad, here I just want to share a handy snippet to iron out star-imports.

Saturday, 1 October 2022

Move aside coIP Westerns, ColabFold has got this!

Recently AlphaFold2 released a new batch of models, this time covering all of the Trembl sequences in Uniprot, resulting in a huge number, which got hashtag-academic-twitter and some news editors very excited for the stamp-collecting feat. Personally, I find it annoying, not because it's pointless, but as of writing this, it has made any search for a target by name swamped by irrelevant sequences.
However, AlphaFold is great for other feats.
I have blogged about it a few times (e.g. link), which gives away my positive view of it! It can predict oligomers, with a lot more precision and confidence than docking. It does not always work either technically or meet the hypothesis. I did a long series of experiments with a hypothesis in mind which wasn't valid in the end (here), but revealed novel science and took a few minutes to set up and a few hours to run, which would have taken years if done by Western blot of a co-immunoprecipitation or cross-linking mass-spec.

Sunday, 19 June 2022

Top 10 silliest PDB residue names for ligands!

UPDATE: The PDB will finish 3 letter chemical component IDs sometime before 2024 at which point they will switch to 5 letter codes, which will be usable solely in CIF format: https://www.wwpdb.org/news/news?year=2022#630fee4cebdf34532a949c34

In some situations it is handy to use in an in silico experiment a 3-letter residue name that is not taken in the PDB. For example, PyRosetta has a system of pregenerated topologies for PDB components, which can cause issues when a ligand is loaded and the movers may use that over an incorrectly provided residue type / param file, resulting in a blown up mishapen ligand —an overly common incident*. As a result, having a list handy of what is taken is helpful. Herein are some silly observations about what the taken and untaken names are —but not ranked as a top 10, because this is not a science blog, not my local newspaper.

Saturday, 4 June 2022

Annotate as you go

There's a counter-constructive saying: a project is dead as soon as you add documentation (Aeschylus, I believe).

This could not be more incorrect. Whereas it is true that writing documentation on an evolving project will quickly result in the fresh documentation becoming quickly invalid, it is a planning truth that writing documentation once a project is finishing is impossible because there are a hundread and one more pressing issues. Therefore, adding docstrings to each function, method and class in Python as one goes along is by far more advantageous. Once this is done, however this information needs to be transmuted into documentation. Here is how once can set up ReadTheDocs without falling into a few traps, as the documentation generator Sphinx is ironically weirdly documented and should be done ideally early on, so one knows what mistakes one's making.

Tuesday, 10 May 2022

Show neighbours in nglview

Nglview is a really nice Python library which encodes a widget to show a NGL viewport, a JS 3D protein viewer used until recently by the PDB. One annoying feature is that one cannot select neighbours as easily as say PyMOL's "select byres HEM around 3". But it is possible and here is how.

Saturday, 7 May 2022

JS in Colab

A Jupyter or Colab notebook has two sides, one is the Python kernel, which may be running on a remote machine, and the front-end running in one's browser. The JavaScript in the browser and the Python kernel as a result may be on separate machine, yet it is possible to make them dialogue. However, this differs between Jupyter and Colab, the latter being more restrictive. I have found this difference problematic and even though I may not be fully versed in Colab functionality I want share some pointers, discussed below. Majorly:

Colab diverges greatly from Jupyter in terms of JS operations.
JS code injected into Colab is sandboxed within each cell.
There is no requireJS in Colab cells or window.
Imported modules have to be external to Colab/Drive.

Saturday, 2 April 2022

Covalents, patches and N-O-S bridges in PyRosetta

Crosslinked residues are common, but for sure make up for it by being simultaneously highly intriguing and highly technically problematic. Oddly, I seem to keep bumping into them. During my PhD a decade ago I saw a talk by the father of Kiwi structural biochemistry, Ted Baker, about a curious case where they found an isopeptide bonds hidden in their crystal density. In a postdoc I worked with isopeptide bonds —I blogged about isopeptide bonds in Rosetta four years ago. During the start of the pandemic I dis some covalent-docking of compounds with PyRosetta for the Covid Moonshoot project, which evolved into Fragmenstein. Most tools have a hard time with crosslinks. And last month the Twittersphere was abuzz with the news of lysine-hydroxylcysteine (N-O-S) bridges in protein.

PyMOL will strip LINK entries from PDBs on saving while NGL obeys only CONECT entries in PBDs. An exception is PyRosetta: it behaves very nicely with disulfides, isopeptide bonds ( cf. repo of PyRosetta code from Keeble et al.) and other crosslinks —mostly. As a result I thought I'd add a note on how to add them in PyRosetta.

Tuesday, 11 January 2022

ggplot colours in Python

In my post about multiple poses in NGLView I mention using ggplot colours in Python, here I revisit it in a bit more detail. Warning: Contains colour theory which may befuddle some viewers.

Pages