Wednesday, 30 November 2016

Saturday, 12 November 2016

The heteroduplicity of error prone PCR plasmids

A mix of wt and
In an error prone PCR the ep-aDNA is ligated onto a plasmid backbone and transformed. When assessing the diversity from a naïve plasmid pool, something odd is seen: some bases are mutated but not to saturation. This is often just dismissed or simply overlooked, but I suspect it is actually something interesting...

Thursday, 10 November 2016

DNA gyrase for better yields

Transformation efficiency is a key part of library making, which gets a bit tricky with large plasmids. A forgotten 90s paper would appear to have a solution, if it were not for a catch.

Saturday, 3 September 2016

JW numbers in the Keio

Hirotada Mori is one of the top five names* in E. coli genomics as his group built the Keio and the ASKA collections. Yet the strains from the Keio and ASKA do not start with HM, but JW. Here is why...

Tuesday, 23 August 2016

Methodological sabotage of growth rates

Following the interest in a previous post about analysing growth curves in Matlab, I would like to discuss issues in growth curves that can arise from the methodological/biological side of things. Fitting the data is perfect if the data is perfect. If it is not, looking by eye at what went wrong is warranted, so that it can be corrected in future runs.

Growth curves can be divided into phases (lag, exponential, stationary and death) and each has its pitfalls.

Saturday, 20 August 2016

Wild about E. coli

Wild type E. coli is a funny concept, because there are actually multiple contestants for the title...

Tuesday, 9 August 2016

Cysteine racemase: an impossible enzyme?

Cysteine racemase is an enzyme (EC that was characterised in lysates long ago, but has never been found since. Is it a real enzyme? Can the reaction happen well? The problem is that racemising something with a leaving group via a carbanion intermediate is not an easy feat.

Friday, 27 May 2016

The witchcraft of knockouts

Making knockouts has a bad rep, but there seems to be a conspiracy afoot to make it feel like a mystic art.

Tuesday, 17 May 2016

Restriction cloning nostalgia

Today I was reminded of an invaluable table that was hung on the wall of every lab: the NEB buffer compatibility chart.

My favorite buffer is buffer 4.

Friday, 13 May 2016

Gene symbol poetry

A rather angry pathway.
"Regulatory protein for 2-phenylethylamine catabolism · two component sensor histidine kinase, essential for acid-tolerance · nikkomycin biosynthesis protein · galacturonide ABC transporter ATPase".
That is my first (and last) attempt at genetic constrained writing: the genes encoding those proteins are feaR actS sanS togA, which is a rubbish sentence that somewhat makes sense.
Constrained writing is an artistic challenge where one writes with a restricted dictionary. For example, there are no words with the letter e in the book Gadsby. Here I restricted my dictionary to words that are also gene symbols.

Thursday, 7 April 2016

Research vs. glitchy data munging

I apologise, but I could not resist this rant...
I am a big advocate for big data (it is also my job). However, one trend I find disturbing is the frequency of attempts to make automated pipelines to predict how one could make a given compound: in a large number of cases, doing some reading up is far more efficient. Not to mention, free of bugs.

Saturday, 2 April 2016

Going solo on a paper

This month my (first) solo paper comes out, which was a big deal as it is, well, my first solo paper.
The short communication describes the web app Mutanalyst: a good/amazing/best/super online tool to help calculate mutational bias spectra, especially with poor sampling (in case you were wondering what shameless Search Engine Optimisation looks like, there goes an example). The topic is straightforward as it describes a web program I wrote with a twist, namely it calculates the standard errors, which are dismal when sampling is limited. The weird part is submitting a paper without backing, hence the account of my saga here.

Wednesday, 23 March 2016

Growth curves

> All scripts mentioned can be found on
> Another post discusses what methodological errors can ruin growth rates

Bacterial growth curves never look like they are in the textbooks as sick bacteria are the worst patients.
Fitness is possibly one of the top three buzzwords in evolutionary biology. It encompasses the more subtle effects that lie between death and growth. However, measuring fitness via growth rates is easier said than done, especially since auxotrophs and bradytrophs have quirkier behaviours than prototrophs.
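To make "measuring fitness via growth rates" concrete, here is a minimal sketch in Python of the most basic approach: a straight-line fit of ln(OD) against time over the exponential phase. It is not one of the Matlab scripts mentioned above. The function names and the OD window (0.05 to 0.5) are assumptions picked for illustration, and sick cultures will often need the window set by eye.

```python
import math


def growth_rate(times_h, ods, od_min=0.05, od_max=0.5):
    """Estimate the specific growth rate (per hour) from a growth curve.

    Fits ln(OD) = ln(OD0) + mu * t by ordinary least squares, using only
    the readings whose OD falls inside [od_min, od_max]. That window is an
    assumed stand-in for "exponential phase" and should be checked by eye.
    """
    points = [
        (t, math.log(od))
        for t, od in zip(times_h, ods)
        if od_min <= od <= od_max
    ]
    if len(points) < 2:
        raise ValueError("need at least two readings inside the OD window")

    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("all readings in the window share one time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx  # slope of ln(OD) versus time


def doubling_time(mu):
    """Convert a specific growth rate into a doubling time (same time unit)."""
    return math.log(2) / mu


if __name__ == "__main__":
    # Synthetic, noise-free curve: OD = 0.01 * exp(0.6 * t), hourly readings.
    times = list(range(8))
    ods = [0.01 * math.exp(0.6 * t) for t in times]
    mu = growth_rate(times, ods)
    print(f"mu = {mu:.3f} per h, doubling time = {doubling_time(mu):.2f} h")
```

Restricting the fit to a window is the crude part: it ignores lag and stationary phase rather than modelling them, which is exactly where quirky auxotrophs and bradytrophs tend to cause trouble.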

Thursday, 25 February 2016

Medical diagnosis with Mutazyme

A cornerstone enzyme for error-prone PCR is Mutazyme, an enzyme with an increased error rate, but less biased than manganese mutagenesis. The manual is very clear and the only major annoyance is that it implicitly says that it makes 1.3 mutations per kb per cycle (that is, per doubling, with the number of cycles taken as the log2 of the fold amplification of the target), whereas it actually makes something around 0.9 mutations per kb per cycle (even with the assumption that no DNA is lost during spin column purification or that DNA cut out of an agarose gel is not shockingly dirty).
However, the biggest mystery is that it says "Not for medical diagnostics".
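The arithmetic behind the "per cycle" figures can be sketched in a few lines of Python. The two rates (1.3 and 0.9 mutations per kb per doubling) are the ones quoted above. The template and product amounts in the example are made-up numbers, and the function names are mine, not anything from the manual.

```python
import math


def doublings(target_ng, product_ng):
    """Number of doublings, i.e. log2 of the fold amplification of the target.

    target_ng is the mass of the target sequence itself (not of the whole
    plasmid) going into the PCR; product_ng is the mass of amplicon recovered.
    """
    if target_ng <= 0 or product_ng <= 0:
        raise ValueError("amounts must be positive")
    return math.log2(product_ng / target_ng)


def mutations_per_kb(target_ng, product_ng, rate_per_kb_per_doubling):
    """Expected mutation load per kb after the amplification."""
    return rate_per_kb_per_doubling * doublings(target_ng, product_ng)


if __name__ == "__main__":
    # Hypothetical reaction: 100 ng of target amplified to 1600 ng (16-fold).
    target, product = 100.0, 1600.0
    d = doublings(target, product)
    for label, rate in (("manual-implied", 1.3), ("observed", 0.9)):
        load = mutations_per_kb(target, product, rate)
        print(f"{label}: {d:.1f} doublings x {rate} = {load:.1f} mutations per kb")
```

Note that any DNA lost during clean-up lowers the measured product and so the apparent number of doublings, which is why the purification caveat above matters for the estimate.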

Monday, 15 February 2016

Biochemical reaction yield and enzyme promiscuity

Reaction yield, i.e. the molar percentage of product over substrate, is often mentioned by chemists, but never by biochemists. My guess is that many enzymes are not perfectly efficient, but have a range of reaction yields.
In The Hitchhiker's Guide to the Galaxy a ship is hidden thanks to the "somebody else's problem" principle, namely people will ignore something problematic that isn't their problem. The reaction yield of enzymes is not something often discussed. The reason is pretty self-evident: differentiating between low abundance products would be a minefield of pesky technical issues. So it is somebody else's problem.

Saturday, 6 February 2016

Promiscuously hitchhiking on a pathway

In the seminal 1976 paper on enzyme evolution Roy Jensen first pointed out that the TCA cycle, the ketoadipate route to lysine, pantothenate, isoleucine, valine and leucine biosynthesis all operated via the same mechanistic steps (condensation with an acyl-CoA, rearrangement, oxidation and elimination of a carbon) and conjectured that they descend from a common primordial pathway.
Promiscuity is generally studied with a single enzyme as a model. A few papers tiptoe around it, but I am not sure that there are any that deal specifically with a pathway where each enzyme shows substrate ambiguity towards the promiscuous product of the previous reaction. I mentioned in another post that the branched chain amino acid pathway can produce norvaline, norleucine and homonorleucine when certain enzymes are overexpressed. Each enzyme in the pathway shows substrate ambiguity, so the whole pathway possesses substrate ambiguity.
It is not a feature of the oxaloacetate-to-ketoglutarate–like pathways, but can be found in other pathways.

Sunday, 17 January 2016

Uncultured bacterial majority? Digitally unannotated majority of the minority is worse

Aquifex, an exciting bacterium, with unexciting BioSample data.
It has been remarked that the majority of bacterial diversity remains uncultured. A fraction of the cultured bacteria are genome-sequenced and a fraction of these have machine-readable data about them.

Saturday, 16 January 2016

The contagious ORF annotation error of 16S rRNA

Some time back, many genomes contained a few copies of a small hypothetical open reading frame, sometimes annotated as a quinone oxidase. These organisms also had fewer 16S rRNA than 23S rRNA. This is not, though, some curious case of enzyme evolution, with a promiscuous ribozymatic activity of 23S rRNA teaming up with a small protein, that could lead to a Nature paper. In reality it is a sequence annotation error that seems quite viral in NCBI.

Getting the corresponding nucleotide sequences of protein sequences

Getting stuck on a really simple task is not a nice feeling.
A seemingly simple challenge I have faced a few times so far is getting the nucleotide sequence corresponding to a protein sequence: this seems really straightforward, yet it is not.