Tuesday, 4 September 2012

Homologues and synonyms

All creatures great and small evolve, but evolution is not limited to the biological realm: languages evolve too and in a similar way.
My interest in languages stems from how closely their evolution resembles enzyme evolution (and since I cannot spill the beans on my own research, I seem to post here a lot about words, and this is another such post!).
The technical words in linguistics differ from those in biology and I have yet to find an enlightening paper on the coevolution of synonyms and the parallels between language and genomes.

In enzymology, protein function and structure are not always linked. Proteins with different functions can have similar structures, having evolved from the same ancestral gene (functionally-divergent homologues), and proteins with different structures can perform the same function (functionally-convergent analogues).
In a language, words can have similar meanings, yet have different origins (synonyms) and words from the same origin can have different meanings.

New genes enter an organism via gene duplication, horizontal gene transfer or gene birth. Similarly, new words enter a language by duplication, loanwords (words of foreign origin) or onomatopoeiae.

English is a really great language for two reasons: it is my mother-tongue and it is a hybrid.
Regarding the latter reason, over a third of the words in the English dictionary were acquired from French due to the Norman invasion of 1066. The grammar and basic vocabulary remained that of Old English, whereas the advanced vocabulary was that of Old French.
A parallel to this can be drawn with the weird and wonderful bacterium Thermotoga maritima, 24% of whose genome is of archaeal origin: the replication and expression machinery is bacterial, but several metabolic routes are archaeal.

A loanword is a foreign word that is adopted into a language thanks to its real or perceived utility, whereas a horizontal gene transfer is the acquisition of a foreign gene by a genome thanks to its fitness benefits. The original spellings of loanwords are gradually lost in favour of more compatible spellings, and similarly the codon and amino acid compositions of transferred genes gradually change to match those of the new genome.
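As a rough illustration of what "matching the codon composition of the new genome" could mean in practice, here is a minimal Python sketch. The sequences are toy strings invented for the example (not real genes), and the distance measure is just one simple, assumed choice among many; real analyses use more careful statistics.

```python
from collections import Counter

def codon_frequencies(seq):
    """Relative frequency of each codon in an in-frame coding sequence."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def usage_distance(freq_a, freq_b):
    """Sum of absolute frequency differences (0 = identical usage, 2 = no shared codons)."""
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)

# Toy sequences, all encoding the same peptide (Ala-Ala-Lys-Ala-Lys-Ala).
host = "GCGGCGAAAGCGAAAGCG"         # the host "prefers" GCG and AAA
foreign = "GCTGCTAAGGCTAAGGCT"      # freshly transferred gene: GCT and AAG
ameliorated = "GCGGCTAAAGCGAAAGCG"  # same gene after drifting towards host usage

host_freq = codon_frequencies(host)
d_foreign = usage_distance(host_freq, codon_frequencies(foreign))
d_ameliorated = usage_distance(host_freq, codon_frequencies(ameliorated))
print(d_foreign, d_ameliorated)
```

The protein stays the same throughout, just as a loanword keeps its meaning while its spelling drifts; only the distance from the host's habits shrinks.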

Synonyms of different origins often coexist. In English, the Old-Norse–originating window (vindauga, wind+eye), the Old-English–originating eagþyrl (eye+hole) and the Old-French–originating fenestre coexisted for a while, until one won. When French entered the English language many synonyms died, but many survived with a figurative meaning: for example, enthrall literally means to enslave, but nowadays it is used figuratively, i.e. capturing one's attention.
With genes, the pressure to remove genes with identical functions (be they paralogues, xenologues or analogues) depends on the strength of selection, and in eukaryotes they can coexist for some time.
Some words adopt alternative spellings, which can disappear or gain different functions, such as the pairs yet and get, passed and past, and especial and special. Gene duplication does the same. In the IAD (innovation, amplification, divergence) model of gene divergence, the new function comes before the amplification.

English, French and several other languages use the Latin alphabet to write words down. However, Latin phonetics lacks several sounds present in these languages, so workarounds are needed. The letter h is used as a modifier in English to make the wh, th and ch sounds, and vowel combinations allow extra vowel sounds. Old English used to be written with Anglo-Saxon runes and then, with Christianity, switched over to the Latin alphabet, bringing with it two heavily used runic letters absent from it (þ and ƿ). However, as most typesetting letters were produced in France, these letters died out. Despite the obvious simplicity they would bring, new letters are very rarely added.
Working around the lack of letters results in coding problems: the th groups in Bath, Chatham Islands and Thailand differ, as the first is a fricative, the second is a word fusion (the h is not a modifier) and the last is a plosive (a “violent” t). The only way to know is to learn the exceptions or guess from the etymology, which can be misleading, as in thyme and Thames.
In proteins, several non-standard residues can be found. None of these uses a recoded codon on its own; they require special sequences or post-translational modifications. So the system differs slightly in that new letters (codons) cannot be added, but is similar in its workarounds.

Many differences exist, however.
Languages have grammar and, generally, authoritative bodies that act independently of the evolution of the language (the Académie française, for example).
Many suffixes and prefixes are highly constructive, a modularity that is hard to find in genes.
Enzymes have promiscuous functions, minor non-physiological activities that can lead to new main activities: a property that works differently in words. Words have multiple definitions and some are often confused. The word brothel used to mean prostitute, but it was confused with the word bordel and the meaning of the latter was bestowed on the former. Biannual traditionally means two-yearly, but its continual misuse as semestral means that the word has two discordant definitions, annulling its functionality. Genes have layers of regulation, whereas words don't. However, the metaphor could be stretched to equate grammar to regulation, but that may be wholly in the handwaving realm…