The art of blowing up protein

Monday, 21 October 2019

RDKit for Rosetta: PLP ligand space as an example

Docking requires a molecule to dock. Preparing a ligand is often tricky, especially if the ligand is complicated, such as PLP. PLP is an interesting cofactor as it catalyses the reaction while the protein chooses the ligand. It binds tightly to the active site via its phosphate and its pyridine ring, while the metabolite to be transformed forms a Schiff base with it. Therefore, one would think that it makes easy to explore chemistry space with it. However, several technical hurdles are encountered, making it quite didactic.

In Bootstrap 4 you can have appear small alert-like rectangles, called toasts. However, getting these to work like notifications on top of the page in the top right is not trivial as it requires some CSS trickery. Here is what is required.

Pictograms with Plotly and FontAwesome

Plotly is one of the most powerful graphing packages for Python, JS and Julia. The cool feature is that the graphs are HTML bases with interactive graphs as opposed to a static jpg. There are several graphs that are missing, one of which is a pictogram. It's not a very silly graph, but Luckily a pictogram is easy-ish to make.

PDB numbering rollercoaster

The position in a crystal structure and the protein sequence rarely match. In fact, there are four parts of start-end:

position in whole protein,
position in extracted sequence,
position in residues stated in the PDB/mmCIF structure and
position which actually has coordinates.

Jupyter notebook progressbar

I have this rather handy wee piece of code I'd like to share: a Jupyter notebook Progress bar.

When will the PDB run out of 4-letter codes?

The PDB ids are really nice and short: 4 letter codes. But when will all the combinations run out? Actually, not for a long long time.
The current total is 155,618 structures and new ones are added at a rate of 12000 structures per year, which means that, assuming a constant growth, in 125 years —(36 ^ 4 - 155,618 ) / 12,000 —the PDB will finish codes to allocate.
2145. That is a few years after the setting of Kim Robinson's New York 2140, where New York is a flooded super-Venice, so I am guessing the RCSB PDB, in San Diego, will have long been flooded so lack of 4-letter codes is not top of their concerns.

Wikipedia datamining

There are several online sites that can be data-mined to reveal really nice trends, top-10s and topdown summaries. Twitter is the archetype site for this, thanks to hashtags making an easy job for anyone wanting to investigate trends. I prefer Reddit for datamining specific trends as it powered by folk having arguments on topics they are passionate about as opposed to ideas of celebrities, corporate spokespeople and ФСБ agents. eBay is also fun as it reveals what people are willing to pay for things. But the best source of data, even for other datasets, is Wikipedia. Not only to read up on things, but also to get data for things within a given "category".

The art of blowing up protein

Pages

Monday, 21 October 2019

RDKit for Rosetta: PLP ligand space as an example

Toasty CSS with BS4

Saturday, 12 October 2019

Pictograms with Plotly and FontAwesome

Wednesday, 4 September 2019

PDB numbering rollercoaster

Thursday, 8 August 2019

Jupyter notebook progressbar

Saturday, 3 August 2019

When will the PDB run out of 4-letter codes?

Tuesday, 2 July 2019

Wikipedia datamining

About Me