Research vs. glitchy data munging

Thursday, 7 April 2016


I apologise, but I could not resist this rant...
I am a big advocate of big data (it is also my job); however, one trend I find disturbing is how often people attempt to build automated pipelines that predict how a given compound could be made: in a large number of cases, doing some reading is far more efficient. Not to mention bug-free.


I understand organic chemistry, and I pride myself on understanding bacterial metabolic logic (the trends and tricks employed by metabolism): if I am shown a compound, I can make a good guess at how a cell would make it, even if nobody has ever studied it. Given a compound, I would read up as much as I could; generally there is a missing enzyme or something wrong in the literature, so an observant eye is needed. It is a task that needs knowledge and intuition. It really does not take long, and it is mighty fun.
Therefore I am utterly lost when it comes to seeing the utility of tools that munge data from a bunch of databases in an attempt to reach the same conclusion.
For sugars or alcohols with many links to central metabolism, where there is a complex interplay of fluxes, I can see how they could be useful. But they seem to be sold for tackling molecules that sit at the end of a long pathway, with perhaps one or two variants along the way, a pathway that is probably already beautifully described in MetaCyc. In use, these tools (there are many) will either run a reaction backwards or fall prey to an annotation error. I tried one where I asked how one could biosynthesise a cofactor, and the quickest pathway it returned was a reaction that had this cofactor as a product but not as a substrate.
The non-model-organism argument does not really hold up either. A well-curated genome is essential, and regardless of what the automated pipelines promise, an expert hand is needed for it, so it would make no sense for that human expert to forget his/her biochemistry once that milestone is reached.
What baffles me most is that, of all the tasks in planning the engineering of a pathway, reading papers and doodling chemical structures is by far the most pleasant, so why anyone would want to farm it out to an incompetent computer is beyond me...
