Sunday 15 December 2019

What-if: Biosynthesis of deazaguanine

7-deazaguanine is an analogue of guanine that has a carbon instead of a nitrogen. This molecule is not made in nature, but many substituted versions are, namely queuine, the archaeosine nucleobase and their precursors. These are made via a different route. However, the unsubstituted molecule would be very feasible to make via straightforward enzymology.

Saturday 30 November 2019

Convert Python docstrings to GitHub markdown readmes

This is suited for very small projects where ReadTheDocs is way overkill; for most projects, however, ReadTheDocs will be better suited (see the how-to post on setting up Sphinx for RTD for more).
The Greek philosopher Epictetus said that a day reverse engineering a piece of code saves you half an hour reading the documentation. A maxim still valid to this day. Nevertheless, documenting code is important. With PyCharm and the push towards typehinting in Python, writing docstrings is fairly simple. However, getting docstrings into the README of a GitHub repository is not straightforward the first time round. Hence, I wrote this simple guide to doing so.
Do note that for medium/big projects, using ReadTheDocs is recommended over this hack: see the guide to setting up Sphinx for ReadTheDocs.
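As a rough idea of what such a conversion involves, here is a minimal sketch that walks a class (or module) with the standard `inspect` module and emits markdown headings followed by the docstrings. The function name `docstrings_to_markdown` and the `Greeter` demo class are made up for illustration, and this is not necessarily the approach taken in the full post.

```python
import inspect


def docstrings_to_markdown(obj, level=2):
    """Return a markdown section for ``obj`` and its public functions/classes."""
    lines = [
        f"{'#' * level} {obj.__name__}",
        "",
        inspect.getdoc(obj) or "*No docstring.*",
        "",
    ]
    # Only classes and modules have members worth descending into.
    if inspect.isclass(obj) or inspect.ismodule(obj):
        for name, member in inspect.getmembers(obj):
            if name.startswith("_"):
                continue  # skip private and dunder members
            if inspect.isfunction(member) or inspect.isclass(member):
                lines.append(docstrings_to_markdown(member, level + 1))
    return "\n".join(lines)


class Greeter:
    """Says hello."""

    def greet(self, name: str) -> str:
        """Return a greeting for ``name``."""
        return f"Hello {name}"


if __name__ == "__main__":
    # Paste or redirect the output into README.md.
    print(docstrings_to_markdown(Greeter))
```

Note that for a module this simple version would also list any functions or classes the module imports, so some filtering would be needed in practice.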

Thursday 7 November 2019

Go away glycerol!!

Due to the nature of crystallisation, additives are often found in PDB structures. These are generally unwelcome, especially if you want to extract ligands. In fact, I have only once heard someone talk excitedly about the crystallisation reagent in their structure, and only because they were trying to flog it off as an allosteric binding site. Generally, they are just annoying. Luckily, you don't need to reinvent the wheel, as a list or two already exist!
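To show how such a list would be used, here is a minimal sketch that drops HETATM records by residue name. The four-entry set is a deliberately tiny placeholder and not one of the published lists, and the sample coordinates are invented.

```python
# Placeholder list of common additives (PDB chemical component IDs):
# glycerol, ethylene glycol, a PEG fragment and sulfate.
# A real list would be far longer.
ADDITIVES = {"GOL", "EDO", "PEG", "SO4"}

# Invented sample records, laid out in fixed PDB columns.
SAMPLE = "\n".join([
    "ATOM      1  CA  ALA A   1      10.000  10.000  10.000  1.00 20.00           C",
    "HETATM 1001  C1  GOL A 301      12.000  10.000  10.000  1.00 20.00           C",
    "HETATM 1002 FE   HEM A 302      14.000  10.000  10.000  1.00 20.00          FE",
])


def strip_additives(pdb_text, unwanted=ADDITIVES):
    """Drop HETATM records whose residue name is in ``unwanted``."""
    kept = []
    for line in pdb_text.splitlines():
        # The residue name sits in columns 18-20 (string indices 17 to 19).
        if line.startswith("HETATM") and line[17:20].strip() in unwanted:
            continue
        kept.append(line)
    return "\n".join(kept)


cleaned = strip_additives(SAMPLE)
```

Here the glycerol record is removed while the protein atom and the haem are kept.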

Monday 21 October 2019

RDKit for Rosetta: PLP ligand space as an example

Docking requires a molecule to dock. Preparing a ligand is often tricky, especially if the ligand is complicated, such as PLP. PLP is an interesting cofactor as it catalyses the reaction while the protein chooses the ligand. It binds tightly to the active site via its phosphate and its pyridine ring, while the metabolite to be transformed forms a Schiff base with it. Therefore, one would think that it is easy to explore chemical space with it. However, several technical hurdles are encountered, which makes it a rather didactic example.

Toasty CSS with BS4

In Bootstrap 4 you can make small alert-like rectangles, called toasts, appear. However, getting these to work like notifications in the top right of the page is not trivial, as it requires some CSS trickery. Here is what is required.

Saturday 12 October 2019

Pictograms with Plotly and FontAwesome

Plotly is one of the most powerful graphing packages for Python, JS and Julia. The cool feature is that the graphs are HTML-based and interactive, as opposed to a static jpg. Several graph types are missing, however, one of which is the pictogram. It's not a very silly graph. Luckily, a pictogram is easy-ish to make.

Wednesday 4 September 2019

PDB numbering rollercoaster

The position in a crystal structure and the position in the protein sequence rarely match. In fact, there are four different start-end ranges to keep track of:
  • position in whole protein,
  • position in extracted sequence,
  • position in residues stated in the PDB/mmCIF structure and 
  • position which actually has coordinates.

Thursday 8 August 2019

Jupyter notebook progressbar

I have this rather handy wee piece of code I'd like to share: a Jupyter notebook progress bar.

Saturday 3 August 2019

When will the PDB run out of 4-letter codes?

The PDB ids are really nice and short: 4-letter codes. But when will all the combinations run out? Actually, not for a long, long time.
The current total is 155,618 structures and new ones are added at a rate of 12,000 structures per year, which means that, assuming constant growth, in about 127 years, i.e. (36 ^ 4 - 155,618) / 12,000, the PDB will run out of codes to allocate.
That is around 2146, a few years after the setting of Kim Stanley Robinson's New York 2140, where New York is a flooded super-Venice, so I am guessing the RCSB PDB, in San Diego, will have long been flooded and a lack of 4-letter codes will not be top of their concerns.
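The back-of-the-envelope estimate can be reproduced in a few lines. The figures are the ones quoted in the post, and the sketch assumes, as the post does, that any of the 36 alphanumeric characters can fill each of the four positions and that growth stays constant.

```python
ALPHANUMERICS = 36        # 26 letters + 10 digits
CODE_LENGTH = 4
current_total = 155_618   # structures at the time of writing (from the post)
yearly_rate = 12_000      # new structures per year (from the post)

# Every possible 4-character code under the assumption above.
total_codes = ALPHANUMERICS ** CODE_LENGTH
# Years until the remaining codes are used up at a constant rate.
years_left = (total_codes - current_total) / yearly_rate

print(f"{total_codes:,} possible codes, about {years_left:.0f} years left")
```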

Tuesday 2 July 2019

Wikipedia datamining

There are several online sites that can be data-mined to reveal really nice trends, top-10s and top-down summaries. Twitter is the archetypal site for this, thanks to hashtags making the job easy for anyone wanting to investigate trends. I prefer Reddit for datamining specific trends, as it is powered by folk having arguments on topics they are passionate about, as opposed to the ideas of celebrities, corporate spokespeople and ФСБ agents. eBay is also fun, as it reveals what people are willing to pay for things. But the best source of data, even for other datasets, is Wikipedia. Not only to read up on things, but also to get data for things within a given "category".

Friday 28 June 2019

Exporting Jupyter notebooks with Plotly graphs

If it is a small project or analysis, I opt for a Jupyter notebook rather than an IDE such as PyCharm, which is great for large projects, but not so much for a small analyse-as-you-go project. Plotly is my go-to for graphs (I proselytise about it). The advantage is that it is a wrapper for a JS library, which allows interactivity. However, on my system at least, when I use the plotly.offline.iplot plotter and export the notebook as HTML, an error is thrown due to require not being set up correctly. This is easily fixed.

Friday 31 May 2019

A note on the Linux PyMOL C01 atom oddity

This weird bug has been haunting me for ages. The builder in PyMOL 1.8 and in Linux PyMOL 2 (but not PyMOL 2 on Windows or Mac) creates residues with a Cα called C01 as opposed to CA. If any operation is done to these (e.g. Rosetta Relax), they will be discarded during the reading of the file. That is, they will not be fixed and, worse, if Rosetta Remodel is used, it will assume that the residue never existed because, annoyingly, Remodel does not understand PDB numbering. Simply substituting all 'C01' with 'CA' fixes the problem.
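A plain find-and-replace in a text editor does the job. As a slightly more cautious variant, the sketch below only touches the atom-name field of ATOM/HETATM records, so a stray 'C01' elsewhere in the file is left alone. The helper name `fix_c01` and the sample records are invented for illustration.

```python
def fix_c01(pdb_text):
    """Rename atoms called C01 to CA in ATOM/HETATM records."""
    fixed = []
    for line in pdb_text.splitlines():
        # The atom name occupies columns 13-16 (string indices 12 to 15).
        if line.startswith(("ATOM", "HETATM")) and line[12:16].strip() == "C01":
            # " CA " is the conventional column layout for an alpha carbon.
            line = line[:12] + " CA " + line[16:]
        fixed.append(line)
    return "\n".join(fixed)


# Invented sample: the second atom carries the odd C01 name.
SAMPLE = "\n".join([
    "ATOM      1  N   ALA A   1      10.000  10.000  10.000  1.00  0.00           N",
    "ATOM      2  C01 ALA A   1      11.000  10.000  10.000  1.00  0.00           C",
    "ATOM      3  C   ALA A   1      12.000  10.000  10.000  1.00  0.00           C",
])

repaired = fix_c01(SAMPLE)
```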

Thursday 16 May 2019

The secondary metabolism of pineberry strawberries

For an upcoming open day we will extract DNA from strawberries. For this I made a slide that explains how DNA mutations lead to protein variants, which in turn lead to different phenotypes (redness in the strawberry's case). In doing this, I got fascinated by a strawberry cultivar called "Pineberry". Not because it is unpigmented, but because the reviews online say it is bland, which means that a rather early enzyme is missing, resulting in both an unpigmented and a bland phenotype.

Sunday 24 March 2019

An arrow between Bootstrap cards

Recently I wanted to add an arrow (as in the triangle at the side of a tooltip or popover) pointing from one card to its neighbour. It is only a few lines of code, but oddly the solutions available online are overly complex and wasteful. So this is my bare-bones solution.

Tuesday 19 February 2019

Uniprot XML and Python ElementTree

Biopython does not have support for Uniprot. The reason is that Uniprot holds so much data that introducing a complex standard for the user to remember would defeat the point; the best way is for users to choose for themselves what piece of data they want.
Here I discuss the best way to deal with Uniprot XML files using ElementTree, which is really nice but awkward at times, which is why I talk about a few monkeypatches that help. If you do not wish to deal with ElementTree (say you want a really quick, but messy, fix) see my post about complicated dictionaries.
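One of the awkward parts is the XML namespace, which has to accompany every tag in a query. Below is a minimal sketch of plain ElementTree usage without any monkeypatches. The embedded entry is a made-up stand-in for a real Uniprot record, and the namespace URI is an assumption that should be checked against the root tag of the actual file.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: compare with the xmlns attribute of your own file.
NS = {"up": "http://uniprot.org/uniprot"}

# Made-up entry that mimics the general shape of a Uniprot XML record.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P00000</accession>
    <name>DEMO_HUMAN</name>
    <feature type="modified residue" description="Phosphoserine">
      <location><position position="42"/></location>
    </feature>
  </entry>
</uniprot>"""

root = ET.fromstring(SAMPLE)
entry = root.find("up:entry", NS)

# Every tag in a path needs the namespace prefix.
accession = entry.find("up:accession", NS).text
features = [
    (
        feature.get("type"),
        feature.get("description"),
        feature.find("up:location/up:position", NS).get("position"),
    )
    for feature in entry.findall("up:feature", NS)
]
```

Passing the prefix mapping to `find` and `findall` avoids writing the full `{http://...}tag` form in every query.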

Tuesday 5 February 2019

PDB in Office 365

Today I noticed that Word and the rest of Office 365 (i.e. PowerPoint included) can read .obj files (Wavefront files).
This means that I can export a .pse/.pdb from PyMOL to an .obj file (via the command "save whatever.obj") and open it in Word or PowerPoint or even Outlook. Quite a fun and potentially useful little feature. Do note that colours are lost, so it is a bit limiting. The loss of colour is due to PyMOL (Blender-exported Wavefront files are textured). Also, the cartoon of every residue is exported, not only of the visible ones, and all sticks etc. are lost.

Wednesday 30 January 2019

Tuesday 15 January 2019

Phosphorylated PDB files

Sometimes a residue in a human protein is phosphorylated, yet the model one gets from I-TASSER, Phyre etc. or the actual PDB structure lacks the phosphate group. Here is how to add it easily and quickly with Rosetta.