Skewed mutational biases: English scrabble with Klingon tiles

Saturday 6 January 2018

Random set of 7 TlhIngan Hol tiles
In a game of Scrabble one draws seven tiles from a bag, and the best results come when the bag and the language used match. In library making with error-prone PCR one does not have that luxury, but different methods offer differing degrees of fairness. Mutazyme is like playing in English with a Latin tile set, while manganese Taq with equal nucleotides is like playing in English with a Klingon tile set.

What one would wish for is an even distribution of mutations. The genetic code is laid out in an odd fashion: there are trends, but they are weak and strong exceptions are easy to find (glycine to proline, say, requires two mutations). The trends are present nevertheless.
In most codon blocks a change to the last nucleotide of a codon is synonymous, especially if it is a transition. The second base largely sets the character of the amino acid, so a mutation to the first base, which leaves the second untouched, tends to give a residue with similar properties. With some exceptions, the mutations sought are generally subtle (isoleucines to leucines, serines to threonines and so forth), so changes to the first nucleotide are really the ones of interest, while mutations to the second will most likely be lethal and those to the third neutral. Among first-nucleotide variants there is no further trend, though: one cannot say "all G↔A mutations are subtle". So a homogeneous blend is ideal. The catch is that this is not trivial to achieve.
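The position trends can be checked directly against the standard genetic code. The following is a minimal sketch, not anything from a published tool: the function name `synonymous_fraction` is made up for illustration, and it only counts how often a single-base change at each codon position leaves the amino acid unchanged, without saying anything about how subtle a non-synonymous change is.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Purine<->purine and pyrimidine<->pyrimidine exchanges.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def synonymous_fraction(position, transitions_only=False):
    """Fraction of single-base changes at a codon position (0, 1 or 2)
    that keep the encoded amino acid. Stop codons are skipped as a
    starting point (an assumption of this sketch)."""
    same = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for base in BASES:
            if base == codon[position]:
                continue
            is_transition = frozenset((base, codon[position])) in TRANSITIONS
            if transitions_only and not is_transition:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODE[mutant] == aa
    return same / total


for pos in range(3):
    print(
        f"codon position {pos + 1}: "
        f"{synonymous_fraction(pos):.1%} synonymous overall, "
        f"{synonymous_fraction(pos, transitions_only=True):.1%} for transitions only"
    )
```

The third position should come out as by far the most forgiving, the second as never synonymous, and the first as only rarely so.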

The missing transversion

The presence of biases in the mutational spectrum of error-prone libraries is well known. The predilection of Taq + manganese towards T/A→N is the most often cited example. But even Mutazyme is far from blemishless. What they all have in common is an extreme rarity of G↔C transversions: 2% for Mutazyme and somewhere on the lower side of 1% for manganese.
Jena Bioscience sells some really nice nucleotide analogues for mutagenesis, but none allow G↔C transversions.

So without these, what do we lose in the "subtle" category, that is, among mutations that are subtle and often beneficial?

  • Glutamine (CAR) ↔ glutamate (GAR) —great for changing the surface charge
  • Alanine (GCN) ↔ glycine (GGN) —great for altering the rigidity of loops
  • Alanine (GCN) ↔ proline (CCN) —good for altering the rigidity of loops
  • Leucine (CTN) ↔ valine (GTN) —great for altering the shape of the protein core
  • Threonine (ACY, not ACR) ↔ serine (AGY, but not TCN block) —great for small shifts
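A list like this can be enumerated from the standard genetic code rather than by eye. The sketch below (names such as `substitution_routes` are hypothetical, chosen for illustration) records, for every pair of amino acids one mutation apart, which base exchanges connect them, and then pulls out the pairs that G↔C can produce and the pairs that nothing but G↔C can produce.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_routes():
    """Map each amino-acid pair (sorted one-letter codes) to the set of base
    exchanges, written like 'CG' or 'AT', that link them in one mutation."""
    routes = defaultdict(set)
    for codon, aa in CODE.items():
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            mutant_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            # Ignore synonymous changes and anything involving a stop codon.
            if mutant_aa == aa or "*" in (aa, mutant_aa):
                continue
            pair = tuple(sorted((aa, mutant_aa)))
            routes[pair].add("".join(sorted((codon[pos], base))))
    return routes


routes = substitution_routes()
via_gc = sorted(pair for pair, how in routes.items() if "CG" in how)
only_gc = sorted(pair for pair, how in routes.items() if how == {"CG"})

print("reachable by G<->C:     ", ["/".join(p) for p in via_gc])
print("reachable only by G<->C:", ["/".join(p) for p in only_gc])
```

Of the five pairs listed, glutamine/glutamate, alanine/glycine and alanine/proline have no other single-mutation route, whereas leucine/valine can also go through TTR ↔ GTR and serine/threonine through TCN ↔ ACN.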

This is pretty serious, and there is no clean way to get a good proportion of G↔C transversions. What this shows is that some mutations are really rare and one should keep an eye out for them.

PS. If there is a mutation of this kind that one was expecting to test, it would be best to spike it in with a dilute mutagenic oligo during the epPCR. In fact, in addition to the amplification primers, my epPCR reactions often contain a dozen or so mutagenic forward oligos at 1/100 the concentration, which mutate within the gene, in order to make sure certain mutations are present.

PS #2. Actually, to make life easier, I have made a page in the upcoming release of the Pedel server to calculate the rough odds of having tested a given mutation in the library:
Additionally, there is a page to quickly design mutagenic primers:
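As for the odds of having tested a given mutation, a back-of-the-envelope version can be written in a few lines. This is only a Poisson sketch under loudly stated assumptions, not the Pedel calculation: the function `chance_sampled` and its parameters are made up here, mutations of a given type are assumed to be spread evenly over all sites carrying the source base, and every clone is assumed to be independent.

```python
import math


def chance_sampled(library_size, mutations_per_clone, type_fraction, source_sites):
    """Rough probability that one specific substitution at one specific site
    appears at least once in the library.

    library_size        -- number of independent clones screened
    mutations_per_clone -- mean number of mutations per clone
    type_fraction       -- share of all mutations that are of this type (e.g. G->C)
    source_sites        -- number of positions in the gene carrying the source base
    """
    # Expected copies of this exact mutation in a single clone (even-spread assumption).
    per_clone = mutations_per_clone * type_fraction / source_sites
    # Poisson: P(at least one) = 1 - exp(-expected copies in the whole library).
    return 1.0 - math.exp(-library_size * per_clone)


# Hypothetical numbers for illustration only: a gene with 250 G positions,
# 3 mutations per clone, and G->C making up 1% of all mutations.
for clones in (1_000, 10_000, 100_000):
    print(f"{clones:>7} clones: {chance_sampled(clones, 3, 0.01, 250):.1%}")
```

The point of the exercise is that a rare mutation type pushes the required library size up in direct proportion to its rarity, which is why spiking in a mutagenic oligo is usually the cheaper route.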


  1. Yes. of course. In some languages, like French or Italian, you can simply ignore the accents. In others, like Polish and Slovak, the accented letters are actually part of their alphabet and have separate tiles.

  2. It's intriguing to juxtapose the world of Scrabble and its anagrams with the complexities of genetic mutations, providing a tangible analogy for many to grasp. Your analogy of playing in English with a Klingon tile set truly encapsulates the challenges faced with manganese Taq. The lack of G↔C transversions is particularly eye-opening. Given their significance, especially for subtle and potentially beneficial mutations, it's a stark reminder of the nuances and gaps in our current understanding and methodology