PCR distribution

Saturday, 25 August 2018


The distribution of mutations in an error-prone library was modelled by Sun (1995) from the underlying principle of a PCR reaction. The paper finds that the distribution becomes increasingly similar to a regular Poisson, and that the two are virtually the same after 10 cycles. However, given that one most likely does not know one's PCR efficiency, using this formula may be dangerous. In practice, getting a better estimate of the mean number of mutations is likely to be more helpful.


In my fleeting spare time, I am currently working on a server (sneak peek: here) that does a variety of calculations for error-prone mutagenesis libraries. Some of the tools are updates of ones written by my PhD supervisor, Wayne Patrick, and his collaborator, Andrew Firth.
For the Pedel calculations in the latter's server, they allow the use of a PCR distribution from Sun (1995). This follows another paper, Drummond et al. (2005), which implemented the formula.
The PCR distribution is rather complex and relies on the PCR efficiency and the number of productive cycles.

The PCR distribution differs substantially from a Poisson when the number of mutations is very large, the number of cycles is small or the PCR efficiency is low. Unless one is using 8-oxo-dGTP and dPTP, such a high mutation rate per cycle is not possible. Even then, the yields are so poor that the product needs amplifying again with regular PCR, which muddies the waters, so a Poisson is the safest bet anyway.
In a normal PCR run, by contrast, the difference is less than 5%.
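To get a feel for why a Poisson is usually close enough, here is a minimal toy simulation. It is not Sun's formula: it only borrows the branching-process framing of Sun (1995) under assumed rules. Each molecule is copied with probability `efficiency` per cycle, and each copy inherits its parent's mutations plus a Poisson(`rate`) number of new ones. The function names and all parameter values are hypothetical. The index of dispersion (variance / mean) is 1 for a Poisson, so values above 1 show how far the PCR distribution strays.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small per-copy rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_pcr(templates, cycles, efficiency, rate, seed=0):
    """Toy branching-process model of error-prone PCR (an assumption, not Sun's formula).

    Each molecule is copied with probability `efficiency` per cycle; a copy inherits
    its parent's mutations plus Poisson(`rate`) new ones. Returns the mutation count
    of every molecule in the final pool.
    """
    rng = random.Random(seed)
    pool = [0] * templates  # the starting templates carry no mutations
    for _ in range(cycles):
        # Only molecules present at the start of the cycle are candidates for copying
        copies = [m + poisson_draw(rng, rate) for m in pool if rng.random() < efficiency]
        pool.extend(copies)
    return pool


def summarise(counts):
    """Mean and index of dispersion (variance / mean); a Poisson gives 1."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var / mean


if __name__ == "__main__":
    cycles, eff, rate = 8, 0.8, 0.5  # hypothetical values
    pool = simulate_pcr(templates=400, cycles=cycles, efficiency=eff, rate=rate)
    mean, dispersion = summarise(pool)
    # In this toy model the expected mean is cycles * rate * eff / (1 + eff)
    print(f"{len(pool)} molecules, mean {mean:.2f} "
          f"(model {cycles * rate * eff / (1 + eff):.2f}), dispersion {dispersion:.2f}")
```

In this model the overdispersion comes from molecules having different numbers of copying events in their ancestry. It grows with the per-copy `rate`, so lowering `rate` (or raising `cycles`) pushes the dispersion back towards 1.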


To complicate things further, the number of cycles set on the thermocycler (say 32) is not the same as the number of productive PCR cycles. In fact, in most cases setting the thermocycler to 20 or 100 cycles would make little difference (the latter would just be a bit more smeary).
To get the number of productive cycles, one can either do a qPCR or run a gradient of cycle numbers and see when the reaction plateaus. Neither is worth doing just to get a value for mutagenesis, especially when using commercially bought Mutazyme II, which costs £30 per reaction.
Assuming that a Q5 reaction had not reached completion in 32 cycles would be comical, but with error-prone PCR it might be the case, as all such reactions have low yields to varying degrees.
Even then, as pointed out by A. Firth, the formula given in Drummond et al. (2005) to calculate the PCR efficiency, $n_\text{cycles} = d/\text{eff}$, where $d$ is the number of doublings, is actually wrong. The correct formula is $\text{eff} = 2^{d/n_\text{cycles}} - 1$.
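The correct formula follows from each productive cycle multiplying the product by $(1 + \text{eff})$, so $(1 + \text{eff})^{n_\text{cycles}} = 2^d$. The sketch below computes $d$ from the fold-amplification and compares the two forms. The function names and yields are hypothetical, and it assumes both amounts refer to the amplified target itself (e.g. a plasmid template would first need converting to amplicon-equivalent mass).

```python
import math


def doublings(template_ng, product_ng):
    """Number of doublings d from the fold-amplification of the target."""
    return math.log2(product_ng / template_ng)


def pcr_efficiency(d, cycles):
    """Each productive cycle multiplies product by (1 + eff), so (1 + eff)**cycles == 2**d."""
    return 2 ** (d / cycles) - 1


def linear_efficiency(d, cycles):
    """The form ncycles = d / eff rearranged; agrees with pcr_efficiency only at eff = 0 or 1."""
    return d / cycles


if __name__ == "__main__":
    d = doublings(template_ng=10, product_ng=2560)  # hypothetical: 256-fold, so d = 8
    print(d, pcr_efficiency(d, 16), linear_efficiency(d, 16))
```

With 8 doublings spread over 16 productive cycles, the correct form gives an efficiency of about 0.414, whereas the linear form gives 0.5. Since $d/n_\text{cycles} = \log_2(1 + \text{eff}) \geq \text{eff}$ for efficiencies between 0 and 1, the linear form always overestimates the efficiency in that range.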

Mutational load

One parameter in the mix, for both Poisson distribution and PCR distribution, is mutational load (m).
To calculate this, one sends a dozen individual colonies for sequencing, counts the mutations in each and takes the mean. However, even with a dozen (12 × £2.95 = £35.40) the error is high, as shown in the figure below.
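The size of that error can be estimated directly. If the counts are Poisson, the variance equals the mean, so the standard error of $m$ from $n$ clones is $\sqrt{m/n}$ and the relative error is $1/\sqrt{nm}$. The sketch below uses hypothetical counts for 12 clones. The helper names are illustrative and are not taken from any of the servers mentioned here.

```python
import math
import statistics


def mutational_load(counts):
    """Mean mutations per clone (m) with two standard errors:
    the empirical one and the one implied by a Poisson (variance = mean)."""
    n = len(counts)
    m = sum(counts) / n
    se_empirical = statistics.stdev(counts) / math.sqrt(n)
    se_poisson = math.sqrt(m / n)
    return m, se_empirical, se_poisson


def clones_needed(m, relative_se):
    """Clones to sequence so that the Poisson SE is `relative_se` of m (SE/m = 1/sqrt(n*m))."""
    return math.ceil(1 / (relative_se ** 2 * m))


if __name__ == "__main__":
    counts = [2, 5, 3, 1, 4, 6, 2, 3, 0, 4, 5, 3]  # hypothetical counts for 12 clones
    m, se_emp, se_poi = mutational_load(counts)
    print(f"m = {m:.2f} ± {1.96 * se_poi:.2f} (95%, Poisson); "
          f"clones for ±10% SE: {clones_needed(m, 0.10)}")
```

For these made-up counts, $m \approx 3.2$ with a 95% interval of roughly ±1 mutation. Bringing the standard error down to 10% of $m$ would take 32 clones.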
A better approach is to use all the data at hand and fit to a Poisson distribution.
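As a minimal illustration of such a fit (not how Mutanalyst does it), the sketch below fits a Poisson to a histogram of mutations per clone. For plain per-clone counts, the maximum-likelihood rate coincides with the sample mean. What the fit adds is a measure of how well a Poisson describes the data, here Pearson's chi-square. The function name and the histogram are hypothetical.

```python
import math


def fit_poisson(histogram):
    """Fit a Poisson to a histogram {mutations per clone: number of clones}.

    The maximum-likelihood rate is total mutations / total clones. Returns the rate,
    the expected number of clones per class, and Pearson's chi-square statistic,
    with the tail above the largest observed count lumped into the last class.
    """
    n = sum(histogram.values())
    lam = sum(k * c for k, c in histogram.items()) / n
    k_max = max(histogram)
    expected = {k: n * math.exp(-lam) * lam ** k / math.factorial(k)
                for k in range(k_max + 1)}
    # Lump the upper tail (k > k_max) into the last class so the expected counts sum to n
    expected[k_max] += n - sum(expected.values())
    chi2 = sum((histogram.get(k, 0) - e) ** 2 / e for k, e in expected.items())
    return lam, expected, chi2


if __name__ == "__main__":
    hist = {0: 1, 1: 1, 2: 2, 3: 3, 4: 2, 5: 2, 6: 1}  # hypothetical: 12 clones
    lam, expected, chi2 = fit_poisson(hist)
    print(f"lambda = {lam:.2f}, chi-square = {chi2:.2f} on {len(expected) - 2} df")
```

With only a dozen clones, most expected class counts are below 5, so the chi-square value should be treated as indicative rather than as a formal test.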

Shameful self-advertising

Luckily, my server Mutanalyst does that for you and gives you various metrics of how confident the fit is. Mutanalyst is also available in the new server (here).
Also, if you want to save money on sequencing, Mutantcaller, also in the new server, can call mixed (heterogeneous) sites in ab1 traces, allowing you to pool 2–3 grow-ups and send a single tube for sequencing.


  • Drummond D.A., Iverson B.L., Georgiou G., Arnold F.H. (2005). Why high-error-rate random mutagenesis libraries are enriched in functional and improved proteins. J. Mol. Biol. 350, 806–816.
  • Sun F. (1995). The polymerase chain reaction and branching processes. J. Comput. Biol. 2, 63–86.
