A cornerstone enzyme for error-prone PCR is Mutazyme, a polymerase with an increased error rate that is less biased than manganese mutagenesis. The manual is very clear and the only major annoyance is that it implies a rate of 1.3 mutations per kb per cycle (where the number of effective cycles is the log2 of the fold amplification of the target), whereas it actually makes something around 0.9 mutations per kb per cycle, even assuming that no DNA is lost during spin column purification and that DNA cut out of an agarose gel is not shockingly dirty.
However, the biggest mystery is that it says "Not for medical diagnostics".
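The arithmetic behind that discrepancy is easy to redo at home. A minimal sketch in Python; the 1024-fold amplification and 9 mutations/kb figures below are illustrative placeholders, not numbers from the manual:

```python
import math

def per_cycle_rate(total_mut_per_kb, fold_amplification):
    """Mutations per kb per effective cycle, where the number of
    effective cycles is taken as log2 of the fold amplification."""
    cycles = math.log2(fold_amplification)
    return total_mut_per_kb / cycles

# Illustrative numbers: 9 mutations/kb after 1024-fold amplification
# (10 effective doublings) works out to 0.9 mutations/kb/cycle.
print(per_cycle_rate(9.0, 1024))  # 0.9
```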
Thursday, 25 February 2016
Monday, 15 February 2016
Biochemical reaction yield and enzyme promiscuity
Reaction yield, i.e. the molar percentage of product over substrate, is often mentioned by chemists, but never by biochemists. My guess is that many enzymes are not perfectly efficient, but have a range of reaction yields.
In The Hitchhiker's Guide to the Galaxy a ship is hidden thanks to the "somebody else's problem" principle, namely that people will ignore something problematic that isn't their problem. The reaction yield of enzymes is not something often discussed. The reason is pretty self-evident: differentiating between low-abundance products would be a minefield of pesky technical issues. So it is somebody else's problem.
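For concreteness, the yield in question is just the molar ratio expressed as a percentage; a trivial sketch with made-up numbers:

```python
def percent_yield(moles_product, moles_substrate):
    """Molar reaction yield: product formed over substrate supplied, as a percentage."""
    return 100.0 * moles_product / moles_substrate

# A hypothetical enzyme turning 1.0 mmol of substrate into 0.8 mmol of product:
print(percent_yield(0.8, 1.0))  # 80.0
```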
Saturday, 6 February 2016
Promiscuously hitchhiking on a pathway
In his seminal 1976 paper on enzyme evolution, Roy Jensen first pointed out that the TCA cycle, the ketoadipate route to lysine, and pantothenate, isoleucine, valine and leucine biosynthesis all operate via the same mechanistic steps (condensation with an acyl-CoA, rearrangement, oxidation and elimination of a carbon) and conjectured that they descend from a common primordial pathway.
Promiscuity is generally studied with a single enzyme as a model. A few papers tiptoe around it, but I am not sure there are any that deal specifically with a pathway where each enzyme shows substrate ambiguity towards the promiscuous product of the previous reaction. I mentioned in another post that the branched-chain amino acid pathway can produce norvaline, norleucine and homonorleucine when certain enzymes are overexpressed. Each enzyme in the pathway shows substrate ambiguity, so the whole pathway possesses substrate ambiguity.
It is not a feature of the oxaloacetate-to-ketoglutarate-like pathways, but it can be found in other pathways.
Sunday, 17 January 2016
Uncultured bacterial majority? Digitally unannotated majority of the minority is worse
Saturday, 16 January 2016
The contagious ORF annotation error of 16S rRNA
Some time back, many genomes carried a few copies of a small hypothetical open reading frame, sometimes annotated as a quinone oxidase. These organisms also appeared to have fewer 16S rRNA genes than 23S rRNA genes. This is not some curious observation about the enzyme evolution of a duo, a promiscuous ribozymatic activity of 23S rRNA and a small protein, that could lead to a Nature paper, though. In reality it is a sequence annotation error that seems quite viral in NCBI.
Saturday, 5 December 2015
The future of enzymology?
EDIT: I called it! Turns out this was much closer to reality than I thought and a paper came out doing exactly this.
Assaying enzymes is rather laborious and, even though the data quality is much higher, it does not compete in productivity with the other fields of biochemistry and genetics. So I gave some thought to where I believe enzymology will be in the future and I have come to the conclusion that in vitro assays will for the most part be replaced by in vivo omics methods, but not any time soon, as both proteomics and metabolomics need to come a long way, along with systems biology modelling algorithms.
Uncompetitively laborious
Everyone that has assayed enzymes will tell you that a single table in a paper took years and years of assays. They will tell you horror stories: the enzyme did not express solubly, the substrate took months to make, the detection required a list of coupled enzymes, or the activity was so low that everything had to be meticulously calibrated and assayed individually. Personally, I had to assay Thermotoga maritima MetC at 37°C because for one reaction the indicator decomposed at 50°C, while for another activity the mesophilic coupled enzymes would otherwise denature. All while comparing it to homologues from Wolbachia and Pelagibacter ubique, which had to be monitored by kinetic assay (as they melted if you looked at them) and individually, as the Wolbachia MetC had a turnover of 0.01 s-1 (vestigial activity; cf. my thesis). And I was lucky, as I did not have substrates that were unobtainable, unstable and bore acronyms as names. The data from enzyme assays is really handy, but the question is how it will fare after 2020.
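Turnovers like that 0.01 s-1 come straight out of the initial-rate data; a minimal sketch, assuming saturating substrate so that the observed rate is Vmax (the concentrations below are made up for illustration):

```python
def turnover_number(v0_uM_per_s, enzyme_uM):
    """Apparent kcat from an initial rate at saturating substrate:
    kcat = Vmax / [E]total, in s^-1 when both inputs share units of uM."""
    return v0_uM_per_s / enzyme_uM

# Made-up example: 0.05 uM/s of product with 5 uM enzyme gives 0.01 s^-1,
# the sort of vestigial activity that has to be assayed individually.
print(turnover_number(0.05, 5.0))  # 0.01
```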
The D&D 3.5 expression "linear fighter, quadratic wizard", which encapsulates the problem that with level progression wizards leave fighters behind, seems rather apt: systems biology and synthetic biology seem to be steaming ahead (quadratically), leaving enzymology behind.
Enzymomics?
Crystallography is another biochemical discipline that requires sweat and blood. But with automation, new technologies and a change of focus (top down), it is keeping up with the omics world. Enzymology, I feel, isn't. There is no such field as enzymonics (only a company that sells enzymes, Google informs me).
A genome-wide high-throughput protein expression and then crystallographic screen may work for crystallography, but it would not work for enzymology as each enzyme has its own substrates and the product detection would be a nightmare.
This leads me to a brief parenthesis: the curious case of Biolog plates in microbiology. They are really handy, as they are a panel of 96-well plates with different substrates and toxins. These phenotype "micro"arrays are terribly underutilised, because each plate inexplicably costs $50-100. Assuming that someone made "EC array plates" where each well tested an EC reaction, a similar or worse problem would arise.
That is fine, as a set of EC plates would be impossible to make anyway: to work, each well would need a lyophilised thermophilic enzyme evolved to generate a detectable change (e.g. NADH or something better still) for a specific product, in order to do away with complex chains of coupled enzymes that may interfere with the reaction in question, along with the substrate, which is often unstable. Not to mention that EC numbers are rather frayed around the edges; I think the most emphatic example is that the reduction of acetaldehyde to ethanol was the first reaction described (giving us the word enzyme) and has the EC 1.1.1.1, while the reduction of butanal to butanol is EC 1.1.1.–, as in, no number at present.
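Those frayed edges show up as soon as one tries to handle EC numbers programmatically, since a dash marks an unassigned level; a small sketch:

```python
def parse_ec(ec):
    """Split an EC number into its four levels.
    None marks an unassigned (dashed) level, as in EC 1.1.1.-."""
    return [None if part in ("-", "–") else int(part) for part in ec.split(".")]

print(parse_ec("1.1.1.1"))  # [1, 1, 1, 1]    (acetaldehyde to ethanol)
print(parse_ec("1.1.1.-"))  # [1, 1, 1, None] (e.g. butanal to butanol)
```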
Therefore, screening cannot work with the same format as crystallography.
Parallel enzymology
Some enzyme assays are easy, especially for central metabolism. The enzymes are fast, the substrates purchasable and the reaction product detectable. As a result, with the reduction of gene synthesis costs (currently cheaper than buying the host bug from ATCC and way less disappointing than emailing authors), panels of homologues for central enzymes can be tested with ease. There are some papers starting to do that and I am sure that more will follow. That is really cool; however, it is the weird enzymes that interest scientists the most.
In silico modelling
Even if it would seem like a straightforward thing, it is currently near impossible to determine in silico the substrate of an enzyme, or its kinetic parameters, from the enzyme's structure and its substrate. Protein structure predictions are poor at best and in silico docking to find substrates is not always reliable, although a few papers have found the correct substrate starting from crystal structures of the enzymes. Predicting the kinetic parameters requires computationally very heavy quantum-mechanical molecular dynamics simulations and the result would be an approximation at best. What is worse is that all these programs, from AutoDock to Gaussian, are challenging to use, not because they present cerebral challenges, but because they are simply very buggy. Furthermore, the picture would be only partial.
Deconvoluted in vivo data
Genetic engineering, metabolomics and proteomics might come to the rescue. Currently, metabolomics is more hipster avant-garde than mainstream. The best way to estimate the intracellular concentration of something in the micromolar range is to get the Michaelis constant of the enzyme that uses it (go enzymology!). But it is just a matter of time before one can detect even nanomolar compounds arising from spontaneous degradation or promiscuous reactions ("dark metabolome", if you really wanted to coin a word for it and write a paper about it).
Also, currently, flux balance analysis can be constrained with omics data (in order of quality: transcriptomics, proteomics and metabolomics). Even if the latter two datasets were decent, systems biology models would need to come a long way before one could estimate, from a range of conditions, a rough guess of the kinetic parameters of all enzymes in the genome. The current models are not flexible or adaptive: one builds a model and the computer finds the best-fitting equation, and to do that they require fancy solvers. Then again, the data is lacking there and the models are not as CPU-heavy as phylogeny or MD simulations, so they are poor benchmarks. If perfect proteomics and metabolomics data were available, it would take Matlab milliseconds to find the reaction velocity (and as a consequence the catalytic efficiency) of all the enzymes in the model. Add a second condition (say a different carbon source or a knockout) and, yes, one could get better guesstimates, but issues would pop up, like negative catalytic efficiencies. The catch is that some enzymes are inhibited, others are impostors in the model, and other unmarked enzymes catalysing those reactions may result in subpar fits. Each enzyme may be inhibited by one or more of over five thousand proteins or small compounds in a variety of fashions, and any enzyme may secretly catalyse the same reaction.
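The "milliseconds" step really is just algebra. A minimal sketch, assuming irreversible Michaelis-Menten kinetics and using entirely made-up concentrations: given a measured flux plus proteomic and metabolomic values, kcat (and hence catalytic efficiency) falls straight out.

```python
def back_calc_kcat(flux, enzyme_conc, substrate_conc, km):
    """Solve v = kcat * [E] * [S] / (Km + [S]) for kcat, given a measured
    flux and measured enzyme and substrate concentrations (same units)."""
    return flux * (km + substrate_conc) / (enzyme_conc * substrate_conc)

# Made-up values: flux 5 uM/s, 1 uM enzyme, 100 uM substrate, Km 100 uM.
kcat = back_calc_kcat(5.0, 1.0, 100.0, 100.0)
print(kcat)          # 10.0 s^-1
print(kcat / 100.0)  # catalytic efficiency kcat/Km, 0.1 uM^-1 s^-1
```

A negative flux estimate fed into this formula is exactly how the "negative catalytic efficiencies" mentioned above would arise.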
The maths would get combinatorially crazy quite quickly, but constraints and weights could be applied, such as previously obtained kinetic data, the similarity between Michaelis constant and substrate concentration, or extrapolation from known turnover rates for known reactions of that subclass.
Questioning gene annotation would open up a whole new can of worms, as unfortunately genome annotation does not have a standardised "certainty score" (it would be diabolically hard to devise, as annotations travel like Chinese whispers), so every gene would have to be equally likely to be cast into doubt, unless actual empirical data were used. So in essence it would have to be a highly interconnected process, reminiscent of the idealistic vision of systems biology.
Nevertheless, despite the technical challenges, it is possible that with a superb heuristic model-adapting algorithm and near-perfect omics profiles under different conditions, pretty decent kinetic parameters could be obtained for all the enzymes in a cell, also yielding a list of genes to verify by other means. When such a scenario will be mainstream is anyone's guess; mine is within the next ten to fifteen years.
Saturday, 28 November 2015
ABS biosynthesis
Lego, rumour has it, wants to biosynthesise acrylonitrile butadiene styrene (ABS), the resin that gives their blocks their firm hold and transgenerational lifespan. This is cool for three reasons:
- metabolic engineering is cool by definition,
- Lego is cool by definition and
- one or two steps link back to a cool gene I found in Geobacillus.
Chemistry
So what might they do to biosynthesise their resin? The processes are rather straightforward and one has to go out of one's way to dream up a cool route. In fact, there is a lot of repetition.
The three monomers for the polymerisation are styrene, acrylonitrile and butadiene. These would be made separately. But there are several commonalities, such as the terminal ene group.
There are a few ways to get a terminal ene group:
- Have a 2,3-ene and tautomerise it
- Have a 2,3-ene and terminal carboxyl and eliminate the carboxyl
- Reversible dehydration
- Irreversible dehydration via a phosphorylated intermediate
- Oxidative decarboxylation (the oleT-encoded P450 fatty acid decarboxylase from Jeotgalicoccus sp.)
My guess is that their major challenge is that they will have to extensively modify a few enzymes and will be plagued with detection and screening. Nevertheless, I am still going to talk about the chemistry as it is a good excuse to sneak in a cool set of genes from Geobacillus.
Styrene
There are two ways to biosynthesise styrene. The simplest is decarboxylating cinnamic acid, while the more interesting one is dehydrating phenylethanol.
The tourist route
Phenylethanol (also uglily called phenylethyl alcohol) is in turn made from phenylacetate, which is made from phenylpyruvate.
Recently, while analysing a transcriptomic dataset for Prof. D. Leak, which resulted in an awesome website, www.geobacillus.com, I stumbled across a really cool enzyme encoded among phenylalanine degradation genes, which I speculate is a phenylpyruvate dehydrogenase. This is a homologue of pyruvate dehydrogenase and follows the same mechanism, namely a decarboxylative oxidation followed by CoA attack.
There are other ways to make phenylacetate, but none allow such a shameless plug for my site —in fact, I should have talked about the 2-phenylethylamine biosynthetic route instead.
In nature the phenylacetate will go down the phenylacetate degradation pathway (paa genes), but it could be forced to go backwards and twice reduce the carboxyl group. Phenylacetaldehyde dehydrogenase is a common enzyme, which even E. coli has (feaB), but phenylethanol dehydrogenase is not. I found no evidence that anyone has characterised one, but I am fairly certain that Gthg02251 in Geobacillus thermoglucosidasius is one, as it is an alcohol dehydrogenase guiltily encoded next to feaB, which in turn is not with phenylethylamine deaminase (tynA).
So, that is how one makes phenylethanol. The dehydration part is problematic. A dehydratase would be reversible, but offers the cool advantage that it can be evolved by selecting for better variants that allow a bug carrying the paa genes plus all these genes to survive on styrene as a carbon source. The alternative is phosphorylation and then dehydration, as happens with several irreversible metabolic steps.
The actual route
That is the interesting way of doing it; the simple way is rather stereotypical. In plants there are really few secondary metabolites that are not derived from polyketides, isoprenoids, cinnamate/coumarate or a combination of these. Cinnamic acid is deaminated phenylalanine, made via a curious elimination reaction (catalysed by PAL). In the post metabolic engineering breaking bad I discuss how nature makes ephedrine, which is really complex and ungainly, and then suggest a quicker way. Here the cinnamic acid route is actually way quicker, as a simple decarboxylation does the trick. To defend itself from cinnamic acid, S. cerevisiae has an enzyme, Pad1p, that decarboxylates it. Therefore, all that is needed is PAL and PAD1.
Butadiene
Previously I listed the possible routes to a terminal alkene, which were:
- Tautomerise a 2,3-ene
- Decarboxylate a 2,3-ene with terminal carboxyl
- Dehydrate reversibly
- Dehydrate irreversibly via a phosphorylated intermediate
- Decarboxylate oxidatively
In the case of butadiene, the target is already a four-carbon molecule, which forces one's hand in route choice. Aminoadipate is used to make lysine when diaminopimelate and dihydropicolinate are not needed. That means a trick similar to the styrene biosynthetic route could be used: aminoadipate is stripped of its amine by a PAL mutant, decarboxylated by a PAD1 mutant and then oxidatively decarboxylated by a mutant OleT. But that requires changing the substrate substantially for three steps, and the cell went to a lot of effort to make aminoadipate, so it is a rather wasteful route.
Another way is to co-opt the butanol biosynthetic pathway to make butenol and dehydrate that.
A better way is to twice dehydrate butanediol.
As mentioned for styrene, a reversible dehydration means that selection could be done backwards. However, pushing the reaction in that direction would require product clearance, otherwise there will be as much alcohol as alkene. With butanediol and butanol there is both a production and a degradation pathway, which means that selection could be done with the degradation route, while actual production uses the production route.
Acrylonitrile
That is a curious molecule to biosynthesise. There are nitrile-degrading bacteria and some pathways make nitriles, so it is not wholly alien; preQ0 in queuosine biosynthesis is the first I encountered. QueC performs an ATP-powered reaction where a carboxyl is converted to a nitrile. I am not sure why, but a cyano group seems (=Google) less susceptible to hydrolysis than a ketimine for some reason; methyl cyanoacrylate (superglue) follows a different reaction. Beta-alanine could be the starting compound, but it would require so many steps that it is a bad idea.
Substituting the carboxyl for a nitrile (nitrilating?) on acrylic acid with a QueC-like enzyme would be better. Acrylic acid is small, so it can be made by dehydration of lactic acid, oxidative decarboxylation of succinate or decarboxylation of fumarate. The latter sounds like the easiest solution, as there are many decarboxylases that use similar molecules, such as malate or tartrate decarboxylase.
Challenges
Basically, even if it seems like a crazy idea at first, the processes are rather straightforward (one or two engineered enzymes for each pathway), but the chemistry is pretty hardcore, so the few engineered enzymes will have to be substantially altered. Given that the compounds are small, quantifying yields will be the main challenge. How one goes about designing a selection system for these is an even bigger challenge, as evolving repressors to respond to small, solely hydrophobic compounds would be nearly impossible... So they will most likely have to do this by rational design alone, which makes it seem like a crazy idea after all.