Friday, 13 May 2016

Gene symbol poetry

A rather angry pathway.
"Regulatory protein for 2-phenylethylamine catabolism · two component sensor histidine kinase,essential for acid-tolerance · nikkomycin biosynthesis protein · galacturonide ABC transporter ATPase".
That is my first (and last) attempt at genetic constrained writing: the genes encoding those proteins are feaR actS sanS togA, which is a rubbish sentence that makes some sense.
Constrained writing is an artistic challenge where one writes with a restricted dictionary; for example, there are no words with the letter e in the book Gadsby. Here I restricted my dictionary to words that are also gene symbols.
Occasionally gene symbols are spelt the same as English words, like the one for fuculokinase. It is very amusing, and therefore I thought I would try and see how many there are and what sentences could be made. It turns out that sensor histidine kinases, which are often the gene S in a pathway, are mighty useful, but the game is still overly challenging.
Gene symbols are written in italics, lowercase with the fourth letter (if present) uppercase, which is not quite the same as sentence case. Gene symbols are also generally pronounced with the first three letters as a word and the last as a letter (e.g. dadA is read dad-aye and not dada), so the game is not technically accurate. So the gene symbol for fuculokinase does not sound like a naughty word, but the one for fuculose operon repressor (American accent) or fuculose-P aldolase (British accent) does. But figuring out the cases that sound like words is too hard, so I am sticking with the written option. The acceptable words/symbols are the set intersection of all the gene symbols in the prokaryotic ptt files and a word list I found on GitHub ("en.txt"). A lot of the three- and four-letter words sound like words that get people infuriated in a Scrabble game (e.g. I love pizza, but I have never called it "za"), but that adds to the charm.
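The set intersection described above can be sketched in Python roughly as follows. This is a minimal illustration, not the code actually used for this post. It assumes each ptt file is a tab-separated table with nine columns, the gene symbol in the fifth column and "-" where no symbol is assigned, and that the word list holds one word per line. The function names are made up for the example.

```python
from pathlib import Path


def symbols_from_ptt(lines):
    """Yield gene symbols from the lines of one .ptt table."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: nine tab-separated columns with the gene symbol
        # in the fifth; title and column-header lines are skipped.
        if len(fields) < 9 or fields[0] == "Location":
            continue
        symbol = fields[4].strip()
        if symbol and symbol != "-":  # "-" is assumed to mean "no symbol"
            yield symbol


def word_symbols(symbols, words):
    """Return the symbols whose lowercase spelling is in the word list."""
    vocabulary = {word.strip().lower() for word in words}
    return sorted({s for s in symbols if s.lower() in vocabulary})


def word_symbols_in_folder(ptt_dir, word_file):
    """Apply the two helpers to a folder of .ptt files and a word list."""
    symbols = set()
    for path in Path(ptt_dir).rglob("*.ptt"):
        with open(path, encoding="utf-8", errors="replace") as handle:
            symbols.update(symbols_from_ptt(handle))
    with open(word_file, encoding="utf-8") as handle:
        return word_symbols(symbols, handle)
```

The comparison is done in lowercase so that a symbol such as feaR matches the word "fear", while the returned list keeps the original gene-symbol capitalisation.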

Word/symbol List:
Appendix: code

To find what a gene symbol meant, I ran grep -r "xxxX" . in the terminal from inside the same all.ptt folder.
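The same lookup could also be done in Python. The sketch below is a hypothetical alternative to the grep call, again assuming a tab-separated ptt layout with the gene symbol in the fifth column and the product description in the ninth. Unlike grep, which matches the text anywhere on a line, it only accepts an exact match in the gene column.

```python
def products_for(symbol, lines):
    """Return the distinct product descriptions annotated for a gene symbol."""
    products = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: gene symbol in the fifth column, product in the ninth.
        if len(fields) >= 9 and fields[4] == symbol:
            products.add(fields[8].strip())
    return sorted(products)
```

Called with an open file handle for one ptt file, for example products_for("togA", handle), it returns the list of products recorded for that symbol in that genome.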

  1. "Cell division topological specificity factor, AI2 ABC transporter ATP-binding protein, D-lactate dehydrogenase, ATP hydrolyzing helicase, tungsten-containing formaldehyde ferredoxin oxidoreductase, ABC-type arginine transporter periplasmic binding protein, arsenate reductase, beta-lactamase B."
    Encoded by:
    minE ego did dinG for artI arsE blaB