Tuesday 3 March 2015

Bacterial Dissociated Press

UPDATE: Thanks to Brython, the code has been adapted into an online bacterial name generator.
There are several ways to make a random name generator, and what matters most is how the probability of each letter is modelled. A Markov chain is a model in which the probability of finding a letter depends on the previous letter(s). In the script I wrote, each name starts and ends with an underscore, which preserves the probabilities of the first and last letters. The downside is that there is no minimum or maximum length.
These are two names I got out of 100 trials:

  • Cobaceobacumomyeriflvijerellaniocidenactilasttiacobacoballlitelanas priartrogntubophae
  • Ps ge
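
The script itself isn't reproduced in this post, so here is a minimal Python sketch of the idea (Python being what Brython runs in the browser). It assumes the simplest case, in which each letter depends only on the one before it, i.e. two-letter combinations. The function names `train`, `generate` and `generate_bounded` are hypothetical. Padding each training name with `_` means the counts out of `_` give the distribution of first letters and the counts into `_` give that of last letters; since a name ends only when `_` is drawn, nothing limits its length.

```python
import random
from collections import Counter, defaultdict

PAD = "_"  # marks both the start and the end of a name


def train(words):
    """Count letter-to-letter transitions, padding every word with PAD."""
    model = defaultdict(Counter)
    for word in words:
        padded = PAD + word.lower() + PAD
        for prev, nxt in zip(padded, padded[1:]):
            model[prev][nxt] += 1
    return model


def generate(model, rng):
    """Walk the chain from the start PAD until PAD is drawn again."""
    letters, prev = [], PAD
    while True:
        counts = model[prev]
        # Draw the next letter in proportion to how often it followed `prev`.
        prev = rng.choices(list(counts), weights=list(counts.values()))[0]
        if prev == PAD:
            return "".join(letters)
        letters.append(prev)


def generate_bounded(model, rng, min_len=4, max_len=20, max_tries=1000):
    """Hypothetical workaround: redraw until the length falls in [min_len, max_len]."""
    for _ in range(max_tries):
        name = generate(model, rng)
        if min_len <= len(name) <= max_len:
            return name
    raise RuntimeError("no name of acceptable length found")


if __name__ == "__main__":
    rng = random.Random(42)
    model = train(["Bacillus", "Bacteroides", "Lactobacillus", "Streptomyces"])
    for _ in range(5):
        print(generate_bounded(model, rng).capitalize())
```

`generate_bounded` is not part of the original script: it is one simple way around the missing length limits, discarding names outside a chosen window and drawing again, at the cost of wasted draws when the window is narrow.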

The dataset used to obtain the probabilities of letter combinations (the training set) obviously influences the results. Here I parsed LPSN for species names and treated the genus and the species epithet differently.
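How the genus and the epithet were treated differently isn't spelled out here; one plausible reading, sketched below, is to train two independent chains, one on genera and one on epithets, so that epithet endings such as -ensis do not leak into genus names. The few binomials in `SAMPLE` merely stand in for the parsed LPSN list, assuming one "Genus epithet" per line; `split_binomials` and `make_namer` are hypothetical helpers, and each chain is a two-letter model with `_` marking the start and end of every word.

```python
import random
from collections import Counter, defaultdict

# Placeholder for the parsed LPSN data: assumed to be one "Genus epithet" per line.
SAMPLE = [
    "Bacillus subtilis",
    "Escherichia coli",
    "Staphylococcus aureus",
    "Streptomyces coelicolor",
    "Pseudomonas aeruginosa",
    "Lactobacillus acidophilus",
]


def train(words):
    """Two-letter transition counts, with '_' marking the start and end of each word."""
    model = defaultdict(Counter)
    for word in words:
        padded = "_" + word.lower() + "_"
        for prev, nxt in zip(padded, padded[1:]):
            model[prev][nxt] += 1
    return model


def generate(model, rng):
    """Sample letters until the end marker is drawn."""
    letters, prev = [], "_"
    while True:
        counts = model[prev]
        prev = rng.choices(list(counts), weights=list(counts.values()))[0]
        if prev == "_":
            return "".join(letters)
        letters.append(prev)


def split_binomials(lines):
    """Separate genera from species epithets; lines without an epithet are skipped."""
    genera, epithets = [], []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            genera.append(parts[0])
            epithets.append(parts[1])
    return genera, epithets


def make_namer(lines):
    """Train one chain per name part and return a function that builds binomials."""
    genera, epithets = split_binomials(lines)
    genus_model, epithet_model = train(genera), train(epithets)

    def name(rng):
        return f"{generate(genus_model, rng).capitalize()} {generate(epithet_model, rng)}"

    return name


if __name__ == "__main__":
    namer = make_namer(SAMPLE)
    rng = random.Random(7)
    for _ in range(5):
        print(namer(rng))
```

Keeping the two chains apart also means each part gets its own first- and last-letter statistics, which matters because genera and epithets tend to end quite differently.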
Amusingly, using only two-letter combinations gives fairly Latin-sounding names. Here are ten non–cherry-picked examples:

  • Vucocovinenacrenistrngidobacirdocotromyconsipumostereus ateue
  • Peus s
  • A zonvi
  • Aceropareainobastheptha menzi
  • S lfa
  • Kllacererhelucoccimycelusm ngum
  • Lumibodom pensidiaqiscusicanise
  • M psplumolinisonolacia
  • Rerella or
  • Lochadibriciziberononacomyctritr selietimolerbumiseelaltaleni

They look like species names, but are not quite right. Some affixes appear, as 'Kllacererhelucoccimycelusm' testifies (cocci, myce), but so do consonant clusters that are impossible in Latin: I would read klla as kł, and while ng at the start of a word (ngum) is normal in many languages, Latin has no ŋ sound. Going through the list, there are some cool ones that look diabolically hard to read (Seogactychoreriumacesheraes wautenditolans), have more Welshness (Llacosula thiabisini) or contain recognisable words (Mum joliaiinecuvigoransis).
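A short check makes the klla problem concrete, assuming "two-letter combinations" means the next letter is drawn given only the single letter before it. With that one-letter memory, "kl" (from Klebsiella) and "ll" (from Bacillus, or Klebsiella itself) can be chained into "kll" even though no training name contains it. `train_kmer` and `probability` are hypothetical helpers, and the two training names are a toy sample, not the LPSN data.

```python
from collections import Counter, defaultdict


def train_kmer(words, k):
    """Count which letter follows each (k-1)-letter context, padding with '_'."""
    model = defaultdict(Counter)
    for word in words:
        padded = "_" * (k - 1) + word.lower() + "_"
        for i in range(len(padded) - k + 1):
            model[padded[i:i + k - 1]][padded[i + k - 1]] += 1
    return model


def probability(model, k, fragment):
    """Chance of producing `fragment` letter by letter once its first k-1 letters are in place."""
    p = 1.0
    for i in range(len(fragment) - k + 1):
        context, nxt = fragment[i:i + k - 1], fragment[i + k - 1]
        counts = model.get(context)
        if not counts:  # context never seen in training
            return 0.0
        p *= counts[nxt] / sum(counts.values())
    return p


names = ["Klebsiella", "Bacillus"]
for k in (2, 3):
    print(k, probability(train_kmer(names, k), k, "kll"))
# 2 0.4  (P(l | k) = 1, P(l | l) = 2/5)
# 3 0.0  (after "kl", training only ever shows "e")
```

With a two-letter context the chain has already forgotten the k by the time it picks the second l, which is exactly how clusters absent from every real name slip through.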
When the k-mer length is changed from 2 to 3, the names make a lot more sense: many more affixes surface and there are fewer weird consonant clusters. Here are 100 non–cherry-picked entries.
I like that one of them is 'Clobacterium aliens'. There are some other real words and some odd ones, such as yogalitre, a novel bohemian unit of volume.

  • Enoalobariononasacter futerxiackitrum
  • Lapper ase
  • Paulfobiumickeyerium yogalitre
  • Strellum gitormeneaericus
  • Shellococolanoreptomycococchrio mens
  • Chthenia fentanonii
  • Sacilla abus
  • Strevosphinobiseudoanopla baalii
  • Microidus cociensis
  • Sporynes virotum
  • Microma aneaxenni
  • Bacteimetocobactomonomicinorynterogangioidethia turomedurpellis
  • Clospora aliensiseudorandrophamper
  • Des paris
  • Clostrenematisher phizoreens
  • Microbacter ansifixtrolytins
  • Acterium psyrarchens
  • Clostreptomuromanterobium gina
  • Bureptostrevoserobacter amardaenii
  • Phaeudobacillusobactomadalmotenbellus aturivalinastis
  • Nobactia senteriensis
  • Clobifidiumonaeriomacinocellus glus
  • Derium se
  • Psia se
  • Preptoraneobactetacter prosalisquaes
  • Actima vense
  • Gemacia crynifichilum
  • Angobacilla chaiwalficaphiphilutyris
  • Exus thens
  • Mer imarinus
  • Streptophizawanorevibactermonas mesbensis
  • Furax anderobicum
  • Flactermonas paramesolachilum
  • Sulus sisidigii
  • Pseudoalder cophiilatis
  • Haenimicrobacila vionii
  • Stahayloseucocorobacilelochrotrea piresele
  • Bacillus agnis
  • Statospormonovobachabacteobacilibrix koreschum
  • Burkhodosalicronoccus licuni
  • Des nonca
  • Ardium sponensis
  • Aces cysalkii
  • Coccus permervensicarinea
  • Streptophizobacterotobacillatomycetburadsketinossobactellum chrophitrariosteracecidiolipalense
  • Streptobacilebacillum sphonensii
  • Rum tus
  • Macesula rundebens
  • Sponaeringomonas sophagii
  • Lyanosphoebacterium matus
  • Amurandroidia clocynitucum
  • Chizoodomicillanelluseubroidobacococas hungbaiwadimuyanensis
  • Xybacterium alkapiensis
  • Amycetoba solitophae
  • Actomonaspobactomonia mer
  • Clobacterium aliens
  • Pla fereenzotidimareequatus
  • Er premanum
  • Xanomonerium licatus
  • Methylososphio inaeens
  • Pedium inans
  • Matoces marum
  • Paea mae
  • Nostreptomanobacter montemophiganginhila
  • Almonaenoclovardium nerrakus
  • Metherococycobium flugongtoriensisciforyntica
  • Seudocococchaenibbellus thilucentoluceus
  • Mydobacter pedia
  • Hangium facransis
  • Strenebsibacromycona lacamingensii
  • Agium stidalis
  • Glum stans
  • Des vibensenigens
  • Cobacteudonacteringium inis
  • Es amitra
  • Alium thili
  • Palgomaduradreptomona wolis
  • Brix vi
  • Phillas marboryzaeongwanenicus
  • Bium callus
  • Zhio lense
  • Thanas wayticolipleolensicalbies
  • Nospirsia ebadiaense
  • Des adaicowskiabilicus
  • Kocococolamsis ans
  • Amyces danivoranse
  • Bruiherillosporreptomyces thillupraecae
  • Satrellas geolisis
  • Des stum
  • Bacillum putiduciticus
  • Stertingobium stenti
  • Spirga odanae
  • Roger oleovoricula
  • Feria asterium
  • Pellacteriodoccus flatus
  • Sphylobacter yogensis
  • Pseudobacter sis
  • Sphomyxillimora latus
  • Strepla puyatlasis
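
Moving from 2 to 3 letters only changes how much the chain remembers. The sketch below generalises the underscore-padded chain described at the top of the post to any k, under the assumption that a k-mer here means the next letter is drawn given the previous k-1 letters. Each name is padded with k-1 underscores at the start so the first letters are still conditioned on the beginning of a name. `train` and `generate` are hypothetical names, and the training list is a toy placeholder for the LPSN genera.

```python
import random
from collections import Counter, defaultdict


def train(words, k=3):
    """Map each (k-1)-letter context to counts of the letter that follows it."""
    model = defaultdict(Counter)
    for word in words:
        # k-1 leading underscores give the first letter a full-length context.
        padded = "_" * (k - 1) + word.lower() + "_"
        for i in range(len(padded) - k + 1):
            model[padded[i:i + k - 1]][padded[i + k - 1]] += 1
    return model


def generate(model, k, rng):
    """Sample letters, sliding a (k-1)-letter window, until '_' is drawn."""
    context, letters = "_" * (k - 1), []
    while True:
        counts = model[context]
        nxt = rng.choices(list(counts), weights=list(counts.values()))[0]
        if nxt == "_":
            return "".join(letters)
        letters.append(nxt)
        context = (context + nxt)[1:]  # drop the oldest letter, keep k-1


if __name__ == "__main__":
    genera = ["Bacillus", "Bacteroides", "Clostridium", "Streptomyces", "Lactobacillus"]
    rng = random.Random(3)
    for k in (2, 3):
        print(k, [generate(train(genera, k), k, rng).capitalize() for _ in range(3)])
```

The usual trade-off applies: a longer context copies longer stretches of the training names, which is why k = 3 yields more real affixes (and occasionally near-real names such as Bacillus agnis) while k = 2 invents more freely.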