On error rates of Mutazyme

## Wednesday, 30 November 2016

In a previous jocular post about the 'not for diagnostic purposes' tag on Mutazyme I mentioned the mutational rate of Mutazyme II. The exact number of mutations per kb per doubling is never given in the manual, but it can be extrapolated (see also part III: Mutazyme and Manganese).

The mutational load is calculated empirically from colonies sampled from the naïve library (using www.mutanalyst.com of course). This value is the product of the number of doublings and the infidelity of the enzyme (its error rate). Every time a strand of DNA is duplicated there is a chance of errors occurring, namely the error rate expressed as mutations per kb per doubling, so the more duplications happen, the more mutations accumulate. The final aDNA pool is therefore the result of several doublings. In a simplistic model the DNA is amplified with 100% efficiency and the PCR stops suddenly when all the primers are depleted.
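The simplistic model above can be put into a few lines of Python. This is only a sketch under that model (100% efficiency, so the number of doublings is the base-2 logarithm of the fold amplification); the function names and the DNA amounts are made up for illustration and do not come from the manual.

```python
import math


def doublings(target_ng: float, yield_ng: float) -> float:
    """Doublings d such that target_ng * 2**d == yield_ng (100% efficiency assumed)."""
    if target_ng <= 0 or yield_ng < target_ng:
        raise ValueError("need 0 < target_ng <= yield_ng")
    return math.log2(yield_ng / target_ng)


def mutational_load(error_rate: float, n_doublings: float) -> float:
    """Expected mutations per kb: error rate (per kb per doubling) times doublings."""
    return error_rate * n_doublings


def error_rate_from_load(load_per_kb: float, n_doublings: float) -> float:
    """Work backwards from an observed load (mutations per kb) to the error rate."""
    return load_per_kb / n_doublings


# Hypothetical amounts: 1 ng of target DNA amplified to 1024 ng.
d = doublings(target_ng=1.0, yield_ng=1024.0)
print("doublings:", d)
print("load at 1.0 mut/kb/doubling:", mutational_load(1.0, d))
print("error rate for an observed load of 13 mut/kb:", error_rate_from_load(13.0, d))
```

Note that `target_ng` is meant as the mass of the amplified region itself, not of the whole plasmid, which is an easy thing to get wrong when guessing the inputs.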
The error rate is never explicitly stated in the manual, hence this post. The manual does feature a table (Table 1) of worked-out values, and from it one can work backwards to a mutation frequency of 1.6 per kb per doubling, provided one guesses how much final DNA one gets. The values in the table come from Figure 1:
Two things in this figure catch the eye: the distribution is heteroscedastic (the scatter is not constant along the x-axis) and the intercept is not at the origin (at zero doublings about 2.5 mutations appear).

So let's play with this dataset. MATLAB File Exchange has a great tool called Grabit, which lets you digitize the data points from an image of a figure (e.g. a jpg).
Using the Curve Fitting Toolbox I get:
``````
Linear model Poly1:
f(x) = p1*x + p2
Coefficients (with 95% confidence bounds):
p1 =      0.9803  (0.5982, 1.362)
p2 =       2.517  (0.01866, 5.016)

Goodness of fit:
SSE: 55.31
R-square: 0.7435
RMSE: 2.242
``````

Trying a couple of lesser alternatives gives:
``````
>> x\y
ans =
1.3023
>> mean(x.\y)
ans =
1.6600
``````
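For anyone without MATLAB at hand, the three estimates above (free intercept, forced through the origin with `x\y`, and the mean of the per-point ratios with `mean(x.\y)`) can be sketched in plain Python. The data points below are made up for illustration (they lie exactly on y = x + 2.5) and are not the values digitized from Figure 1; the function names are likewise just illustrative.

```python
def fit_with_intercept(x, y):
    """Ordinary least squares for y = slope*x + intercept (like the Poly1 fit)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_through_origin(x, y):
    """Least squares for y = slope*x, the equivalent of MATLAB's x\\y for column vectors."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def mean_of_ratios(x, y):
    """Average of the per-point ratios y/x, the equivalent of mean(x.\\y)."""
    return sum(yi / xi for xi, yi in zip(x, y)) / len(x)


# Made-up points: doublings (x) versus mutations per kb (y), with y = x + 2.5.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [4.5, 6.5, 8.5, 10.5, 12.5]

slope, intercept = fit_with_intercept(x, y)
print("free intercept:  slope", slope, "intercept", intercept)
print("through origin:  slope", slope_through_origin(x, y))
print("mean of ratios:  slope", mean_of_ratios(x, y))
```

With a positive intercept in the data, the three estimates come out in the same order as in the post: the free-intercept slope is the smallest, the through-origin slope is larger, and the mean of ratios is the largest, because points at few doublings get the most inflated ratios.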

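In other words, the error rate is 1.3 if the fit is forced through the origin and 1.0 if the intercept is left free, while the mean of the per-point ratios (1.66) is close to the 1.6 back-calculated from Table 1. So what is going on with this intercept? And what about the heteroscedasticity: is it bad sampling or an intrinsic factor?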
Sampling becomes harder at lower mutation rates because more sequences are needed, so that is the likely cause.
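To put a rough number on "more sequences are needed", here is a small sketch. It assumes that the number of mutations per sequenced clone is Poisson distributed, which is my assumption for the illustration and not something the manual states. Under that assumption the relative standard error of the mean from N clones is 1/sqrt(λN), where λ is the mean number of mutations per clone; the function name and the example values are hypothetical.

```python
import math


def clones_needed(mean_mutations: float, rel_se: float) -> int:
    """Clones to sequence so the mean mutation count has the given relative standard error.

    Assumes Poisson-distributed counts: rel_se = 1 / sqrt(mean_mutations * N).
    """
    if mean_mutations <= 0 or rel_se <= 0:
        raise ValueError("mean_mutations and rel_se must be positive")
    return math.ceil(1.0 / (mean_mutations * rel_se ** 2))


# Hypothetical libraries with a low and a high mean mutation count per clone,
# both measured to a 20% relative standard error.
for load in (2.0, 8.0):
    print(load, "mutations per clone ->", clones_needed(load, 0.2), "clones")
```

The required number of clones scales inversely with the mean mutation count, so the low-doubling points in a plot like Figure 1 are the ones that need the most sequencing for the same relative precision.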
However, it is not so simple. The final pool of aDNA is not all of the same generation even when the reaction is 100% efficient: in such a scenario 50% of the strands will have undergone n doublings, 25% n–1 doublings, 12.5% n–2 doublings, etc. So in effect the average age of the pool is not n doublings but about n–1 (the shortfall 0×½ + 1×¼ + 2×⅛ + ... sums to 1). This would mean that the error rate is slightly off... but it would shift it in the other direction. Doh! Lastly, if the heteroduplex hypothesis in my other post is correct, the error rate would be an underestimate by 50%.
If the PCR is not 100% efficient, as is actually the case, the values are more diverse still. In fact, I do wonder how a supercoiled plasmid behaves as a template. It ought to denature poorly, so a polymerase might actually be more likely to copy an amplicon than the original plasmid. To test this, one could compare an amplicon as a template against a plasmid as a template. So the values are not perfect: linear PCR with NGS would be the best way to determine the mutational rate, but who has the time and money to test a minor thing like that?
Furthermore, measuring the primer efficiency requires qPCR, and I don't think anyone would go to the effort of setting up a qPCR just to calculate their epPCR rate.
Nevertheless, an error rate somewhere between 0.9 and 1.3 mutations per kb per doubling is a fairly safe bet, as is the fact that if one goes for more than 10 doublings, indels dominate the pool, ruining the experiment.

The error rate is actually low, which is problematic. Mutazyme I, if one reads the patents, is nothing more than exo- Pfu DNA polymerase with polymerase enhancing factor (PEF) to make it work better (i.e. not stall if dUTP is present). The identity of Mutazyme II is not disclosed, but my guess is the Thermococcus sp. JDF-3 polymerase they mention in the paper. The MgCl2 is at 2 mM, but the primers and dNTPs are at a higher concentration, similarly to MnCl2 mutagenesis. Curiously, they do add 0.5 mM MnCl2 in one experiment to show that the two can be combined. But if one wanted a high mutation frequency (e.g. for a small template), exo- Taq with MnCl2 would be the best option despite the bias issue.