Brainstorm: An alternative to the tree of life

One of the greatest insights of modern biology is the Tree of Life metaphor: that all organisms share common ancestors if we go back far enough, and that we can understand a great deal about an organism based on which evolutionary forks it and its ancestors have taken.

This has been and continues to be a profoundly useful tool in nearly all subfields of biology. But it was created before we knew anything about genetics, and it’s starting to show its age, especially in the context of single-celled organisms, whose cellular machinery and evolutionary history allow organisms very far apart in the ‘tree’ to readily swap significant amounts of genetic material.[1] This sort of gene swapping, called Horizontal Gene Transfer (HGT), happens in plants and animals as well: think of mitochondria and chloroplasts, once organisms in their own right, now mere cellular power plants with much of their original genetic code shuffled into their hosts’ genomes.[2] But as a rule, most HGT happens among bacteria and viruses, where it is extremely common.

So we have these concepts of distinct species and this branching tree of life, and they’re incredibly useful when talking about plants and animals, but in the context of bacteria and viruses they become rather strained, because organisms from very distant branches constantly share lots of genetic code. The core organizing assumption which gives the tree metaphor and our current phylogenetic system meaning, that once organisms branch off sufficiently far from each other they can no longer share genetic code, is often false in these contexts.[3] And under many metrics, most of life’s genetic diversity is contained in the bacterial and viral domains, so this is not a trivial problem.

So we can keep trying to extend the current tree metaphor, or we can start looking around for a new model.[4] I think both are worth doing.

So what would an alternative to the Tree of Life look like?

I don’t have an answer to this per se, but it seems to me the way forward is to recognize the core insight of the tree metaphor (to group things that have more shared evolutionary history closer together) but to apply this insight at the level of the gene rather than the organism. Essentially, I think a new system could be built by sequencing everything and having computers crunch the numbers, identify co-evolved gene clouds, highlight the genetic links between organisms, and sort organisms based on these links.

This approach could reduce to, or replicate, most of our current phylogeny in organisms with low HGT (eukaryotic organisms are mostly isolated co-evolved gene clouds which should be grouped together, and grouped near the other eukaryotic organisms they share recent history with) while leaving the door open to a more elegant treatment of edge cases such as bacteria and viruses, which may be amalgamations of distinct co-evolved gene clouds with separate evolutionary histories.
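To make the gene-level idea a bit more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the organism names, the gene labels, and the two proxies used (a ‘gene cloud’ is taken to be the set of genes found in exactly the same organisms, and the ‘link’ between two organisms is the Jaccard overlap of their gene sets). A real system would work from sequence data and far subtler measures of shared history.

```python
from itertools import combinations

# Hypothetical toy data: each organism maps to the set of gene families it carries.
# "r1" and "r2" stand in for a resistance cassette shared between two bacteria.
genomes = {
    "animal_a": {"g1", "g2", "g3", "g4"},
    "animal_b": {"g1", "g2", "g3", "g5"},
    "bacterium_x": {"g6", "g7", "r1", "r2"},
    "bacterium_y": {"g8", "g9", "r1", "r2"},
}


def gene_clouds(genomes):
    """Group genes that occur in exactly the same set of organisms.

    Identical presence/absence profiles are a crude stand-in for
    'co-evolved'; this is an illustrative assumption, not a real method.
    """
    carriers_of = {}
    for organism, genes in genomes.items():
        for gene in genes:
            carriers_of.setdefault(gene, set()).add(organism)

    clouds = {}
    for gene, carriers in carriers_of.items():
        clouds.setdefault(frozenset(carriers), set()).add(gene)
    return clouds


def shared_fraction(a, b):
    """Jaccard similarity: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def organism_links(genomes):
    """Score every pair of organisms by the fraction of genes they share."""
    return {
        (x, y): shared_fraction(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    for carriers, genes in gene_clouds(genomes).items():
        print(sorted(carriers), "->", sorted(genes))
    for pair, score in organism_links(genomes).items():
        print(pair, round(score, 2))
```

In this toy data the two animals share one large cloud and nothing with the bacteria, which is roughly what the tree already tells us, while the two bacteria are linked only through the small r1/r2 cloud, which has a history of its own.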

But the devil will be in the details, and creating a new phylogeny is particularly tricky in that any sorting algorithm includes contingent assumptions about what sort of answer we want when we ask: what is the nature of the relation between organism X and organism Y?

It’s interesting to think about. Realistically speaking, our current phylogeny is much too fundamental to most of modern biology to be replaced anytime soon. But it’ll be interesting to see if and how people attempt to apply the gene-level shared history idea to patch up our current organism-level shared history phylogeny.

Footnotes:

[1] This rampant HGT happens in multiple ways. Bacteria can share plasmids, which are sort of modular pieces of genetic function that can be easily swapped in and out. If one strain of bacteria develops resistance to a drug, it may share that resistance with other strains through a plasmid. Bacterial DNA is also less isolated and protected than eukaryotic DNA, so ‘free floating’ DNA is much more likely to be integrated into the cell.

Viruses, on the other hand, exist by hijacking existing cellular machinery to splice themselves into genomes then copy themselves, and evolve resistance by being extremely sloppy in their duplication methods, both of which can lead to significant HGT. As well, viruses are hardly limited to infecting plants and animals; those which infect bacteria and other viruses (bacteriophages and virophages, respectively) can also be vehicles for HGT.

[2] Our genomes are filled with ancient, defunct viruses that spliced themselves into our genes but then couldn’t get out. Recent surveys of the human genome indicate that these defunct viruses take up more space in our genome (2%) than do actual protein-coding genes (1.4%).

Recent research indicates that this has been a useful source of genetic diversity: the mammalian placenta, for instance, repurposes genes originally from an ancient retrovirus to protect itself from being attacked by the mother’s immune system.

[3] That’s the conceptual argument for a new type of phylogeny. The pragmatic argument is that an infectious bacterium’s or virus’s position on the tree of life does not tell us much about how it spreads, where in the body it can thrive, or how to treat it. It would be nice to have a phylogeny that would naturally indicate such things.

[4] A possible extension of the tree metaphor is put forth by Frederick Cohan of Wesleyan University, who suggests adding an ‘ecovar’ notation (short for “ecological variant”) to bacteria and viruses. As Carl Zimmer so succinctly puts it, “The bacterial strain that caused the first recorded outbreak of Legionnaires’ disease in Philadelphia, for example, should be called Legionella pneumophila ecovar Philadelphia.”

Notes:

It may be neither here nor there, but in writing out a wishlist for the perfect phylogenetic system, I came up with the following things it should deal with:
– Common descent, evolution of major function, and speciation (as the tree metaphor currently does);
– HGT (specific gene chunks that were transferred, and past lineages & other signifying metadata of those genes);
– Phenotype & function: cellular mechanics/architecture and proteomic profile (trying to classify organisms in terms of what goes on ‘under the hood’);
– Current ecological niche (e.g., Cohan’s ‘ecovar’ notation).

Others’ lists may differ.