Sunday, 19 June 2022

Top 10 silliest PDB residue names for ligands!

In some situations it is handy to use in an in silico experiment a 3-letter residue name that is not taken in the PDB. For example, PyRosetta has a system of pregenerated topologies for PDB components, which can cause issues when a ligand is loaded and the movers may use that over an incorrectly provided residue type / param file, resulting in a blown up mishapen ligand —an overly common incident*. As a result, having a list handy of what is taken is helpful. Herein are some silly observations about what the taken and untaken names are —but not ranked as a top 10, because this is not a science blog, not my local newspaper.

Saturday, 4 June 2022

Annotate as you go

There's a counter-constructive saying: a project is dead as soon as you add documentation (Aeschylus, I believe).

This could not be more incorrect. Whereas it is true that writing documentation on an evolving project will quickly result in the fresh documentation becoming quickly invalid, it is a planning truth that writing documentation once a project is finishing is impossible because there are a hundread and one more pressing issues. Therefore, adding docstrings to each function, method and class in Python as one goes along is by far more advantageous. Once this is done, however this information needs to be transmuted into documentation. Here is how once can set up ReadTheDocs without falling into a few traps, as the documentation generator Sphinx is ironically weirdly documented and should be done ideally early on, so one knows what mistakes one's making.

Tuesday, 10 May 2022

Show neighbours in nglview

Nglview is a really nice Python library which encodes a widget to show a NGL viewport, a JS 3D protein viewer used until recently by the PDB. One annoying feature is that one cannot select neighbours as easily as say PyMOL's "select byres HEM around 3".  But it is possible and here is how.

Saturday, 7 May 2022

JS in Colab

A Jupyter or Colab notebook has two sides, one is the Python kernel, which may be running on a remote machine, and the front-end running in one's browser. The JavaScript in the browser and the Python kernel as a result may be on separate machine, yet it is possible to make them dialogue. However, this differs between Jupyter and Colab, the latter being more restrictive. I have found this difference problematic and even though I may not be fully versed in Colab functionality I want share some pointers, discussed below. Majorly:

  • Colab diverges greatly from Jupyter in terms of JS operations.
  • JS code injected into Colab is sandboxed within each cell.
  • There is no requireJS in Colab cells or window.
  • Imported modules have to be external to Colab/Drive.

Saturday, 2 April 2022

Covalents, patches and N-O-S bridges in PyRosetta

Crosslinked residues are common, but for sure make up for it by being simultaneously highly intriguing and highly technically problematic. Oddly, I seem to keep bumping into them. During my PhD a decade ago I saw a talk by the father of Kiwi structural biochemistry, Ted Baker, about a curious case where they found an isopeptide bonds hidden in their crystal density. In a postdoc I worked with isopeptide bonds —I blogged about isopeptide bonds in Rosetta four years ago. During the start of the pandemic I dis some covalent-docking of compounds with PyRosetta for the Covid Moonshoot project, which evolved into Fragmenstein. Most tools have a hard time with crosslinks. And last month the Twittersphere was abuzz with the news of lysine-hydroxylcysteine (N-O-S) bridges in protein.

PyMOL will strip LINK entries from PDBs on saving while NGL obeys only CONECT entries in PBDs. An exception is PyRosetta: it behaves very nicely with disulfides, isopeptide bonds ( cf. repo of PyRosetta code from Keeble et al.) and other crosslinks —mostly. As a result I thought I'd add a note on how to add them in PyRosetta.

Tuesday, 11 January 2022

ggplot colours in Python

In my post about multiple poses in NGLView I mention using ggplot colours in Python, here I revisit it in a bit more detail. Warning: Contains colour theory which may befuddle some viewers.

Sunday, 31 October 2021

Multiple sequence alignments

A sequence alignment is a rather important tool.
  • Sequence conservation is a key ingredient in most nucleotide mutation severity predictors.
  • The covariance within it powers the AlphaFold2 Evoformer and other de novo structure predictors.
  • The phylogeny extracted from it tells the evolutionary tale of the protein
However, on the very basic level, i.e. getting a nice figure, far from the world of covariance matrices, it is a slight nuisance.
Therefore I would like share some pointers on choosing species and two python operation, namely getting the equivalent residue in a homologue and making a figure in Plotly. Just like with docking, where careful and diligent human choices make all the difference, rational choices help greatly with clarity for sequence alignments.