## Saturday 25 August 2018

### PCR distribution

The distribution of mutations in an error-prone library was modelled by Sun (1995) from the underlying mechanics of a PCR reaction. The paper finds that the distribution becomes more and more similar to a regular Poisson as the number of cycles increases, and the two are virtually the same after 10 cycles. However, given that one most likely does not know one's PCR efficiency, using this formula may be dangerous, whereas getting a better estimate of the mean number of mutations may be more helpful in practice.

### Distribution

In my fleeting spare time, I am currently working on a server (sneak peek: here) that does a variety of calculations for error-prone mutagenesis libraries. Some tools are updates of the ones written by my PhD supervisor, Wayne Patrick, and his collaborator, Andrew Firth.
For the Pedel calculations, the latter's server allows the use of a PCR distribution from Sun (1995). That implementation follows another paper, Drummond et al. (2005), where this formula was implemented.
The PCR distribution is rather complex and relies on PCR efficiency and number of productive cycles:
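
As far as I can reconstruct it from Drummond et al. (2005), the distribution has the form below. The symbol names are my own labels and not a quote from either paper: λ is the PCR efficiency, n the number of productive cycles, m the mean number of mutations per sequence, k the number of mutations in a given sequence and x the mean number of mutations added per duplication.

```latex
P(k) = (1+\lambda)^{-n} \sum_{j=0}^{n} \binom{n}{j} \lambda^{j}
       \frac{(jx)^{k} e^{-jx}}{k!},
\qquad x = \frac{m(1+\lambda)}{n\lambda}
```

In words, it is a mixture of Poissons: a sequence that has been duplicated j times carries a Poisson number of mutations with mean jx, and j itself is binomially distributed over the n cycles.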

The PCR distribution differs substantially from a Poisson when the number of mutations is very large, the number of cycles is small or the PCR efficiency is low. Unless one is using 8-oxo-dGTP and dPTP, such a high mutation rate per cycle is not possible, and even then the yields are so poor that the product needs amplifying again with regular PCR, which would muddy the waters, making a Poisson the safest bet anyway.
In a normal PCR run, by contrast, the difference is less than 5%.
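
To get a feel for when the two distributions part ways, here is a minimal Python sketch that compares them. It assumes the PCR distribution is a binomial mixture of Poissons, which is how I read Drummond et al. (2005). The function names and the example parameters are made up for illustration and are not the server's code.

```python
from math import comb, exp, factorial


def poisson_pmf(k, m):
    """Probability of k mutations under a Poisson with mean m."""
    return m**k * exp(-m) / factorial(k)


def pcr_pmf(k, m, n_cycles, eff):
    """Probability of k mutations under the PCR distribution (assumed form).

    A sequence duplicated j times out of n_cycles carries Poisson(j * x)
    mutations, and j is binomial with probability eff / (1 + eff).
    """
    # Mutations added per duplication, chosen so the overall mean is m.
    x = m * (1 + eff) / (n_cycles * eff)
    total = 0.0
    for j in range(n_cycles + 1):
        weight = comb(n_cycles, j) * eff**j / (1 + eff) ** n_cycles
        total += weight * (j * x) ** k * exp(-j * x) / factorial(k)
    return total


def max_abs_difference(m, n_cycles, eff, k_max=60):
    """Largest absolute gap between the two distributions for k = 0..k_max."""
    return max(
        abs(pcr_pmf(k, m, n_cycles, eff) - poisson_pmf(k, m))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    # Hypothetical library: mean of 4 mutations per sequence, efficiency 0.5.
    for n in (3, 10, 30):
        print(n, max_abs_difference(4, n, 0.5))
```

The loop prints the largest gap for 3, 10 and 30 productive cycles, so one can watch it shrink as the cycle count goes up.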

### Efficiency

To further complicate things, the number of cycles set on the thermocycler (say 32) is not the same as the number of productive PCR cycles. In fact, in most cases setting the number of cycles on the thermocycler to 20 or 100 would make little difference (a bit more smeary in the latter).
To get the number of productive cycles one can either do a qPCR or do a gradient of cycles and see when the reaction plateaus. Neither makes much sense in order to get a value for mutagenesis, especially when using commercially bought Mutazyme II, which costs £30 per reaction.
Assuming that a Q5 reaction did not reach completion in 32 cycles is comical, but for error-prone PCRs it might be the case, as all of them suffer from low yields to varying degrees.
Even then, as pointed out by A. Firth, the formula given in Drummond et al. (2005) to calculate the PCR efficiency, ncycles = d / eff, where d is the number of doublings, is actually wrong. The correct formula is eff = 2^(d/ncycles) - 1.
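
The corrected formula follows from each cycle multiplying the amount of product by (1 + eff), so that (1 + eff)^ncycles = 2^d. A small sketch of the arithmetic, with hypothetical function names and made-up amounts (1 ng of template amplified to 1000 ng over 32 cycles):

```python
from math import log2


def doublings_from_yield(template_ng, product_ng):
    """Number of doublings d needed to go from template to product."""
    return log2(product_ng / template_ng)


def efficiency_from_doublings(doublings, n_cycles):
    """Per-cycle efficiency, from (1 + eff) ** n_cycles == 2 ** doublings."""
    return 2 ** (doublings / n_cycles) - 1


def naive_efficiency(doublings, n_cycles):
    """The d / ncycles shortcut, shown only for comparison."""
    return doublings / n_cycles


if __name__ == "__main__":
    d = doublings_from_yield(template_ng=1.0, product_ng=1000.0)
    print("doublings:", d)
    print("corrected efficiency:", efficiency_from_doublings(d, 32))
    print("d / ncycles:", naive_efficiency(d, 32))
```

The two only agree when every cycle is a perfect doubling (d = ncycles, eff = 1). For anything less efficient, d / ncycles overestimates the efficiency.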

### PCR plateau

In other words, the efficiency calculations for the Sun distribution assume that a PCR with 100 cycles will produce kilograms of product, which is not the case.
As any qPCR trace can show, the number of productive cycles is only about 10-15, after which the PCR gets inhibited by the free phosphates and the yield hits a hard limit. Specifically, PCR yields are consistent for a given enzyme kit regardless of amplicon, although several factors affect this, such as PCR purification kit contaminants, pipetting errors and unwanted products (extra bands, smeariness or primer dimer).
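
To make the absurdity concrete, the sketch below compares unbounded exponential growth with growth that simply stops after a fixed number of productive cycles. The hard cut-off is a deliberate simplification of mine (a real plateau is gradual), and the function names, the 1 ng of template and the efficiency of 0.9 are all made up.

```python
def naive_yield_ng(template_ng, n_cycles, eff):
    """Product if every thermocycler cycle amplified by (1 + eff)."""
    return template_ng * (1 + eff) ** n_cycles


def plateau_yield_ng(template_ng, n_cycles, eff, productive_cycles=15):
    """Product if amplification stops after productive_cycles (assumed cut-off)."""
    return naive_yield_ng(template_ng, min(n_cycles, productive_cycles), eff)


if __name__ == "__main__":
    for cycles in (20, 32, 100):
        naive_kg = naive_yield_ng(1.0, cycles, 0.9) / 1e12  # 1 kg = 1e12 ng
        capped_ug = plateau_yield_ng(1.0, cycles, 0.9) / 1e3  # 1 ug = 1e3 ng
        print(cycles, "cycles:", naive_kg, "kg naive vs", capped_ug, "ug capped")
```

With the cut-off in place, setting 20, 32 or 100 cycles gives the same yield, which is much closer to what one sees on a gel.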

One parameter in the mix, for both Poisson distribution and PCR distribution, is mutational load (m).
To calculate this, one sends a dozen individual colonies to be sequenced, counts the mutations in each and takes the mean of the values. However, even with a dozen (12 × £2.95 = £35.40) one gets a high error, as shown in the figure below.
A better approach is to use all the data at hand and fit to a Poisson distribution.
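
To put a rough number on that error: if the counts are Poisson distributed, the variance equals the mean, so the standard error of the mean from n colonies is sqrt(m / n). The sketch below is only this back-of-the-envelope estimate, not the fitting routine itself, and the dozen counts in it are made up.

```python
from math import sqrt


def mean_with_standard_error(counts):
    """Mean mutational load and its standard error, assuming Poisson counts."""
    n = len(counts)
    m = sum(counts) / n
    # For a Poisson the variance equals the mean, so SE(mean) = sqrt(m / n).
    return m, sqrt(m / n)


if __name__ == "__main__":
    # Hypothetical mutation counts from twelve sequenced colonies.
    counts = [3, 5, 2, 4, 6, 1, 4, 3, 5, 7, 2, 6]
    m, se = mean_with_standard_error(counts)
    print(f"m = {m:.2f} +/- {se:.2f} (approx. 95% interval: +/- {1.96 * se:.2f})")
```

Because the error scales with 1 / sqrt(n), quadrupling the number of sequenced colonies only halves it.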