Unnatural amino acid biosynthesis

Saturday 12 September 2015

Unnatural amino acid biosynthesis

In the synthetic biology experiments with an expanded genetic code the biosynthesis of unnatural amino acid is not taken into consideration as the system is rather rickety and the amino acids unusual.

Similar amino acids

A lot of introduced amino acids are dramatically different from the standard set. However, there are several amino acids that never made it to the final version of the genetic code as they are subtly different from the canonical amino acids and must have been too hard for the high promiscuous primordial systems to differentiate. However, subtle differences would be useful for finetuning. Examples include aminobutyric acid (homoalanine), norvaline and norleucine, allo-threonine ("allonine"), ornithine or aminoadipate. If there was a way to introduce them and increase the fidelity it would be hugely beneficial. It would probably result in enzymes with higher fidelity and catalytic efficiency.
That Nature itself failed back then does not necessary mean that scientists would fail with modern metabolism. The main drawback is a selection system. Current approached to recoding rely on the new amino acid as fill in as opposed to something that makes the E. coli addicted to it. The latter would mean that the system could evolve to better handle the new amino acids. Phage with an unnatural amino acid have higher fitness (Hammerling et al., 2014). Unnatural RNA display (Josephson et al., 2005) could be used to generate a protein that is evolved to require the non-canonical amino acid as nearby residues are evolved to best suit that enzyme, if one really wanted to all that trouble. Alternatively and less reliably, GFP with different residues does behave differently and position 65 could handle stuff like homoalanine, but the properties between S65A or S65V are not too different. So if a good and simple selection method were present it could be doable.

Novel amino acid biosynthesis

This leaves with the biosynthesis of the novel amino acids, which is the main focus here. There are many possible amino acids to choose from, and a good source of information for that is the wikipedia article non-proteinogenenic amino acids  — I (reticently) wrote many years ago to sort out the mess that there was, but I subsequently left to the elements and it has become a bit cluttered like an unselected psuedogene. Why certain amino acids made it while other did not is discussed in a great paper by Weber and Miller in 1981.
Most of the amino acids that nature can make would just make structural variants. Furthermore, mechanistic diversity mostly comes from cofactors (metals, PLP, biotin, thiamine, MoCo, FeS clusters etc.), which is a more sensible solution given that there only one per certain type of enzyme. Some amino acids that are supplemented Nature cannot make with ease, in particular chlorination and fluorination reactions in Nature can be counted with one hand.

Homoserine, homocysteine, ornithine and aminoadipate.

E. coli already makes homoserine for methionine and threonine biosynthesis and homocysteine via homoserine for methionine. Ornithine is from arginine biosynthesis and was kicked out of the genetic code by it. Gram positive bacteria, which do not require diaminopimelate, make lysine via aminoadipate ("homoglutamate").


The simplest novel amino acid is homoalanine (aminobutyrate). 2-ketobutyrate is produced during isoleucine biosynthesis, which if it were transaminated it would produce homoalanine. Therefore it is likely that the branched chain transaminase probably must go to some effort to not produce homoalanine (forbidden reaction). This amino acid is found in meteorites and is really simple, but its similarity to alanine, hence why it must have lost out.

Norvaline, norleucine and homonorleucine.

In the Weber and Miller paper the presence of branched chain amino acids was mentioned as potentially a result of the frozen accident. From a biochemical point of view, the synthesis of valine and isoleucine from pyruvate + pyruvate and ketobutyrate + pyruvate follows a simple pattern (decarboxylative aldol condensation, reduction, dehydration, transamination). The leucine branch is slightly different and is actually a duplication of some of the TCA cycle enzymes (Jensen 1976), specifcially it condenses ketoisovalerate (valine sans amine) and acetyl-CoA, dehydrates, rehydrates, reduces, decarboxylates and transaminates. If the latter route is used with ketobutyrate and acetyl-CoA one would get norvaline, with ketopentanoate (norvaline sans amine) and acetyl-CoA one would get norleucine. These biosynthetic pathway have actually been studied in the 80s. The interesting thing is that it can be used further making homonorleucine. Straight chain amino acids make sturdier protein, so there is a definite benefit there. The reason why norleucine is not in the genetic code is that it was evicted by methionine, which finds an additional use in SAM cofactor (Ferla and Patrick, 2014). Parenthetically, as a result the AUA isoleucine codon is unusal —it is also a rare codon (0.4%) in E. coli— as to avoid methionine has an unusual tRNA (ileX), which would make an easy target for recoding.


Threonine synthase is the sole determinant of the chirality of threonine's second centre.


Chorismate is rearranged to prephenate and then oxidatively decarboxylated and transaminated to make tyrosine, while the hydroxyl group of chorisate is swapped for amine for folate biosynthesis. If the product of the latter followed the tyrosine pathway one would aminophenylalanine.

Rethinking phenylalanine.

While on the topic, the whole route for phenylalanine biosynthesis is odd. It feels like an evolutionary remnant. If one were to draw up phenylalanine biosynthesis without knowing about the shikimate/chorismate pathway the solution would be different. If I were to design the phenylalanine pathway I would start with a tetraketide (terminal acetyl-CoA derived; 2,4,6,8-tetraoxononanoyl-CoA), cyclise (Aldol addition of C9 in enol form to C4 ketone), two rounds of reduction (6-oxo and 8-oxo) and three dehydrations (4,6,8-hydroxyl), followed by a transamination (2-oxo): phenylalanine by polyketide synthesis!


Phenylalanine has a single aromatic ring, naphthylalanine has two. Using the logic for the rethought phenylalanine synthesis we get a synthesis by hexaketide (2,4,6,8,10,12-hexaoxotridecanoyl-CoA). Namely cyclise (Aldol addition of C11 in enol form to C6 ketone and C13 to C4), two rounds of reduction (8-oxo and 10 or 12-oxo) and five dehydrations (4,6,8,10,12-hydroxyl), followed by a transamination (2-oxo). Naphthylalanine has been added to the genetic code (Wang et al., 2002), but was added as a supplement. GFP doesn’t work with either form of naphthylalanine (Kajihara et al., 2005), therefore there isn’t a good selection marker where the amino acid itself is beneficial. So probably the worst sketched pathway to possibly make.

Other aromatic compounds.

Secondary metabolites are often made by polyketide or isoprenoid biosynthesis, which are rather flexible so some extra compounds could be drawn up.

LEGO style.

All amino acid backbones are make in different ways and only cysteine, selenocysteine, homocysteine and tryptophan operate by a join-side-chain-on-with-backbone approach. Tryptophan synthase does this trick by aromatic electophilic substitution, where the electrophile is phosphopyridoxyl-dehydroalanine (serine on PLP after hydroxyl has left), while the sulfur/seleno amino acids are by the similar Micheal addition. So this trick could be extended to other aromatic, carbanions and enols, if one really wanted to, but that if far from a nice one size fits all approach.

1 comment: