Tuesday, 2 July 2019

Wikipedia datamining

There are several online sites that can be data-mined to reveal really nice trends, top-10 lists and top-down summaries. Twitter is the archetypal site for this, as hashtags make the job easy for anyone wanting to investigate trends. I prefer Reddit for datamining specific trends, as it is powered by folk arguing about topics they are passionate about, as opposed to the opinions of celebrities, corporate spokespeople and FSB agents. eBay is also fun, as it reveals what people are willing to pay for things. But the best source of data, even for building other datasets, is Wikipedia: not only to read up on things, but also to get data for everything within a given "category".
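A minimal sketch of pulling every page title in a category, assuming the public MediaWiki API on en.wikipedia.org (`action=query` with `list=categorymembers`) rather than scraping the HTML category pages. The function names and the User-Agent string are illustrative, not from the post. Results come back in batches (at most 500 for normal clients), so the loop keeps passing the API's `continue` block back until it stops appearing.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia's public MediaWiki API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_query_params(category, cont=None, limit=500):
    """Build the query parameters for one batch of category members."""
    if not category.startswith("Category:"):
        category = "Category:" + category
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),  # 500 is the usual maximum for normal clients
        "format": "json",
    }
    if cont:
        params.update(cont)  # pass the API's continuation values back unchanged
    return params


def parse_members(payload):
    """Return (titles, continuation dict or None) from one JSON response."""
    members = payload.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members], payload.get("continue")


def category_members(category):
    """Fetch all member titles of a category (requires network access)."""
    titles, cont = [], None
    while True:
        query = urllib.parse.urlencode(category_query_params(category, cont))
        # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
        req = urllib.request.Request(
            API_URL + "?" + query,
            headers={"User-Agent": "datamining-sketch/0.1"},
        )
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
        batch, cont = parse_members(payload)
        titles.extend(batch)
        if not cont:
            return titles
```

Calling, say, `category_members("Amino acids")` would return the titles filed under that category; subcategories come back with a `Category:` prefix, so they could be recursed into if a whole category tree is wanted.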

Friday, 28 June 2019

Exporting Jupyter notebooks with Plotly graphs

For a small project or analysis, I opt for a Jupyter notebook rather than an IDE such as PyCharm, which is great for large projects but not so much for a small analyse-as-you-go project. Plotly is my go-to for graphs (I proselytise about it). Its advantage is that it wraps a JS library, which allows interactive graphs. However, on my system at least, when I plot with plotly.offline.iplot and export the notebook as HTML, an error is thrown because require is not set up correctly. This is easily fixed.
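One possible workaround, sketched here under assumptions and not necessarily the fix this post has in mind, is to post-process the exported HTML so that RequireJS is loaded before any Plotly output cell calls `require`. The CDN URL (RequireJS 2.3.6 on cdnjs) is an assumption worth checking, and `inject_requirejs` and `patch_exported_notebook` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: a CDN copy of RequireJS; verify the URL/version before relying on it.
REQUIRE_JS_URL = "https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.6/require.min.js"


def inject_requirejs(html: str, src: str = REQUIRE_JS_URL) -> str:
    """Insert a RequireJS <script> tag right after <head> (idempotent)."""
    if src in html:
        return html  # already patched, leave untouched
    tag = f'<script src="{src}"></script>'
    # \b stops this from matching <header>; [^>]* allows attributes on <head>.
    match = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        return tag + html  # no <head>: prepend so require exists before any use
    return html[: match.end()] + tag + html[match.end():]


def patch_exported_notebook(path: str) -> None:
    """Rewrite an exported notebook HTML file in place."""
    p = Path(path)
    p.write_text(inject_requirejs(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Usage would be something like `jupyter nbconvert --to html analysis.ipynb` followed by `patch_exported_notebook("analysis.html")` (file name hypothetical). Placing the tag at the top of `<head>` means `require` is defined before the later scripts run.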

Friday, 31 May 2019

A note on the PyMOL 1.8 C01 atom oddity

This weird bug has been haunting me for ages. The PyMOL 1.8 (not 2) builder creates residues whose Cα is called C01 as opposed to CA. If any operation is done to these residues (e.g. Rosetta Relax), they will be discarded when the file is read in. That is, they will not be fixed, and worse, if Rosetta Remodel is used, it will assume that the residue never existed, because Remodel annoyingly does not understand PDB numbering. Simply substituting every 'C01' with 'CA' fixes the problem.
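A minimal sketch of that substitution in Python. Because PDB is a fixed-width format (the atom name occupies columns 13 to 16), this version rewrites the whole four-character field to ` CA ` so the following columns stay aligned. It assumes the builder's residues are written as ATOM records and leaves HETATM lines alone, since ligands often use names like C01 legitimately; the function names and paths are hypothetical.

```python
from pathlib import Path


def fix_c01_line(line: str) -> str:
    """Rename a C01 atom to CA in a single PDB ATOM record."""
    # Assumption: only ATOM records come from the builder; HETATM is skipped.
    if line.startswith("ATOM") and line[12:16].strip() == "C01":
        # Columns 13-16 (0-based slice 12:16) hold the atom name; CA is " CA ".
        return line[:12] + " CA " + line[16:]
    return line


def fix_c01_file(src: str, dst: str) -> None:
    """Write a copy of src with every C01 atom renamed to CA."""
    lines = Path(src).read_text().splitlines(keepends=True)
    Path(dst).write_text("".join(fix_c01_line(line) for line in lines))
```

For example, `fix_c01_file("built.pdb", "built_fixed.pdb")` before handing the model to Relax or Remodel.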

Thursday, 16 May 2019

The secondary metabolism of pineberry strawberries

For an upcoming open day we will extract DNA from strawberries. For this I made a slide that explains how DNA mutations lead to protein variants, which in turn lead to different phenotypes (redness, in the strawberry's case). In doing this, I got fascinated by a strawberry cultivar called "Pineberry", not because it is unpigmented, but because online reviews say it is bland, which means that an enzyme rather early in the pathway is missing, resulting in both an unpigmented and a bland phenotype.

Sunday, 24 March 2019

An arrow between Bootstrap cards

Recently I wanted to add an arrow (as in the triangle at the side of a tooltip or popover) pointing from one card to its neighbour. It takes only a few lines of code, but oddly the solutions available online are overly complex and wasteful. So this is my bare-bones solution.

Tuesday, 19 February 2019

Uniprot XML and Python ElementTree

Biopython does not have full support for Uniprot XML. The reason is that an entry holds so much data that introducing a complex standard, which users would then have to try and remember, would defeat the point; the best way is for users to choose for themselves which pieces of data they want.
Here I discuss the best way to deal with Uniprot XML files using ElementTree, which is really nice but awkward at times; hence I also cover a few monkeypatches that help. If you do not wish to deal with ElementTree (say you want a really quick but messy fix), see my post about complicated dictionaries.
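As a minimal sketch of the plain ElementTree approach (not the monkeypatches the post refers to): every element in a Uniprot XML file lives in the `http://uniprot.org/uniprot` namespace, so passing a prefix map to `find`/`findall` avoids writing `{http://uniprot.org/uniprot}` before every tag. The selection of fields and the dictionary keys are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Prefix map so paths can say "up:entry" instead of the full {namespace}entry.
NS = {"up": "http://uniprot.org/uniprot"}


def summarise_entry(entry):
    """Pick a few commonly wanted fields out of one <entry> element."""

    def text(path):
        el = entry.find(path, NS)
        return el.text.strip() if el is not None and el.text else None

    seq = entry.find("up:sequence", NS)
    return {
        "accessions": [a.text for a in entry.findall("up:accession", NS)],
        "name": text("up:name"),  # direct child only, so not gene/name
        "full_name": text("up:protein/up:recommendedName/up:fullName"),
        "gene": text("up:gene/up:name[@type='primary']"),
        # Drop any whitespace or line breaks inside the sequence text.
        "sequence": "".join(seq.text.split()) if seq is not None and seq.text else None,
    }


def parse_uniprot(xml_text):
    """Return one summary dict per <entry> in a Uniprot XML string."""
    root = ET.fromstring(xml_text)
    return [summarise_entry(e) for e in root.findall("up:entry", NS)]
```

For a downloaded file, `ET.parse(path).getroot()` would take the place of `ET.fromstring`; for very large downloads, `ET.iterparse` lets each entry be cleared once it has been summarised.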

Tuesday, 5 February 2019

PDB in Office 365

Today I noticed that Word and the rest of Office 365 (PowerPoint included) can read .obj files (Wavefront files).
This means that I can export a .pse/.pdb in PyMOL to an .obj file (via the command "save whatever.obj") and open it in Word, PowerPoint or even Outlook. It is a fun and potentially useful little feature. Do note that colours are lost, so it is a bit limiting. The loss of colour is due to PyMOL (Wavefront files exported from Blender are textured). Also, the cartoon is exported for every residue, not only the visible ones, and all sticks etc. are lost.
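To check what actually ended up in an exported file, a Wavefront .obj can be inspected as plain text: vertices are `v` lines, vertex normals `vn`, faces `f`, and materials (where colour is normally defined) are referenced with `mtllib`/`usemtl`. The sketch below just tallies those keywords; it is a hypothetical helper and does not look at the non-standard per-vertex colour values some exporters append to `v` lines.

```python
from collections import Counter
from pathlib import Path


def summarise_obj(text: str) -> dict:
    """Count the main Wavefront OBJ statements in a file's text."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split(maxsplit=1)[0]] += 1  # first token is the keyword
    return {
        "vertices": counts["v"],
        "normals": counts["vn"],
        "faces": counts["f"],
        "has_materials": counts["mtllib"] > 0 or counts["usemtl"] > 0,
    }


def summarise_obj_file(path: str) -> dict:
    return summarise_obj(Path(path).read_text())
```

For example, `summarise_obj_file("whatever.obj")` after the PyMOL save; `has_materials` being `False` would be consistent with the file carrying no material (colour) information.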