The Ecoresponsive Genome of Daphnia pulex

Colbourne et al. Science 331:555, 2011

Journal club by Antonio Marco

The paper in a sentence

The crustacean Daphnia pulex has more than 30,000 genes, an expanded set that originated mostly through tandem duplications; the retention of these duplicates is associated with functional diversification of paralogs and with the co-expansion of genes from the same metabolic pathway.

Background:

Our knowledge of genes and genomes comes mostly from species of little relevance in natural ecosystems. Colbourne et al. provide the genome sequence of Daphnia pulex, a keystone species in freshwater ecosystems. Moreover, this is the first crustacean genome to be fully sequenced, so it is key to understanding the origin and evolution of arthropod genes.

What the paper says:

The authors present the genome sequence of a Daphnia pulex strain called TCO (“the chosen one”), which has very low variability (nucleotide heterozygosity of approx. 0.14%). After sequencing and assembly, they covered 80% of the nuclear genome at 8.7-fold coverage. They predicted 30,907 genes, a large number for an arthropod. They had additional evidence (ESTs, proteomic analyses, conservation…) for 26,867 of these genes, suggesting that the initial gene set is reliable. Although Daphnia has a number of introns comparable to other invertebrates (excluding Drosophila), its introns have been reduced in size, as have its intergenic regions. This streamlining of the genome is in sharp contrast with the large increase in the number of protein-coding genes.

The evolutionary analyses of the protein-coding genes reveal that more than one third of them have no homologous counterparts in any other sequenced genome (Fig. 1A in the paper). These Daphnia-specific genes are often members of large multigene families (Fig. 1B), suggesting a Daphnia-specific gene expansion produced by high duplication rates, a hypothesis that Figure 1C confirms. The distribution of synonymous substitutions among all pairs of duplicated genes in Daphnia suggests that no whole-genome duplication was involved in the process, and that the expansion was most likely produced by constant rates of gene birth and death (Fig. 1D). Since they observed Gene Conversion events in the hemoglobin genes (Fig. 2 in the paper), they corrected for them in their analyses, and the results remained unaltered. Gene Conversion (GC) is a non-reciprocal transfer of DNA sequence during recombination. When GC occurs extensively among multiple genes of the same family (non-allelic GC), their sequences become homogenized (that is, they all become very similar). Because homogenized paralogs look younger than they really are, GC is an important source of error in divergence time estimates.
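To make the homogenization point concrete, here is a toy Python sketch of our own (not the authors' method for detecting or correcting GC). The two paralog sequences are made up, divergence is an uncorrected p-distance, and the helper names `p_distance` and `gene_conversion` are hypothetical. Copying a tract from one paralog into the other halves their apparent divergence, which a molecular clock would read as a younger duplication.

```python
# Toy illustration with hypothetical sequences: non-allelic gene conversion
# copies a tract from one paralog into the other, so the pair looks younger.

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gene_conversion(donor: str, recipient: str, start: int, end: int) -> str:
    """Overwrite recipient[start:end] with donor[start:end]; the donor is unchanged."""
    return recipient[:start] + donor[start:end] + recipient[end:]


paralog_a = "ATGCGTACGTTAGCCTAGGA"
paralog_b = "ATGAGTACCTTAGGCTAGCA"

before = p_distance(paralog_a, paralog_b)
converted_b = gene_conversion(paralog_a, paralog_b, 0, 10)
after = p_distance(paralog_a, converted_b)
print(f"divergence before conversion: {before:.2f}")  # 0.20
print(f"divergence after conversion:  {after:.2f}")   # 0.10
```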

Expression analyses reveal that paralogs diversify their expression patterns soon after they emerge (Fig. 3A). In fact, newly duplicated genes already have, on average, an expression level twice that of their paralog (Fig. 3B). The expansion of gene families is also associated with metabolic processes: the authors found 19 pancrustacean (insect and crustacean) gene families whose expansions are overrepresented in certain metabolic subnetworks (Fig. 4).
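The post does not describe how overrepresentation was tested. A common way to ask whether expanded families fall in a subnetwork more often than chance would predict is a one-sided hypergeometric test; the Python sketch below assumes that test, and the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(observed: int, total_families: int,
                       families_in_subnetwork: int, expanded_families: int) -> float:
    """One-sided hypergeometric P(X >= observed).

    X is the number of expanded families that fall in the subnetwork when
    `expanded_families` families are drawn at random, without replacement,
    from `total_families`, of which `families_in_subnetwork` belong to it.
    """
    upper = min(families_in_subnetwork, expanded_families)
    numerator = sum(
        comb(families_in_subnetwork, i)
        * comb(total_families - families_in_subnetwork, expanded_families - i)
        for i in range(observed, upper + 1)
    )
    return numerator / comb(total_families, expanded_families)


# Hypothetical counts, chosen only to show the call.
p = enrichment_p_value(observed=12, total_families=2000,
                       families_in_subnetwork=40, expanded_families=150)
print(f"P(at least 12 expanded families in the subnetwork) = {p:.3g}")
```

With many subnetworks tested at once, the resulting p-values would also need a multiple-testing correction.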

What about the Daphnia-specific genes? These genes appear to be expressed more under ecological conditions (that is, when animals are exposed to biotic and abiotic stressors such as kairomones or cadmium), whereas conserved genes are more highly expressed under standard laboratory conditions (Fig. 5A). However, a closer inspection of the genomic regions expressed under ecological conditions (using tiling arrays) showed that most transcripts come from intergenic regions (Fig. 5B). This suggests that more genes remain to be discovered, and the authors proposed further analyses under ecological conditions to explore the functional part of the genome.

Putting all this together, the authors proposed a model of gene evolution that they call PBE (Preservation by Entrainment), described in Figure 6 of the paper. Under this model, if two paralogs acquire incompatible expression patterns after duplication, one of them is lost. Alternatively, the two duplicates can maintain the same expression pattern and, if the increase in dosage is beneficial, both are retained. A third possibility is that one paralog changes its expression pattern and interacts with a new partner; if this new association is beneficial to the organism, both genes are, again, retained.

What we said about the paper:

This work involves several groups that systematically explored many aspects of the Daphnia genome. It was clearly a tour de force, not limited to obtaining the genome sequence but also exploring transcriptomic changes under different conditions and treatments. We would therefore expect a paper of this kind to be published in a high-profile journal such as Science. That said, most of our comments focused on the interpretation of the results rather than on their validity.

The observation that there are many Daphnia-specific genes is biased by the fact that this is the first crustacean genome to be sequenced. Sequencing other crustaceans (outside the genus Daphnia) will undoubtedly decrease this proportion. It is true, however, that many of these genes have no known homolog in insects, which may still indicate an expansion within the crustacean lineage.

One of the most controversial topics is the effect of Gene Conversion on the analyses. Lynch and Conery (2000) argued that the gene content of genomes is a consequence of high, non-adaptive rates of gene turnover. However, Teshima and Innan (2004) explored the effects of gene conversion on studies that estimate the age of gene duplications. Accordingly, the authors of the present paper corrected for Gene Conversion. However, Gene Conversion is often detected using conservative approaches designed to prevent false positives, so many true conversion events may not have been filtered out of the final datasets. We did not go deeper into the discussion because we did not know exactly how Gene Conversion was detected.

There was some discussion about the use of all-against-all pairwise comparisons to calculate the age distribution of duplicates (Fig. 1D). A large, recently expanded gene family contributes a disproportionate number of young pairs, which can lower the apparent average divergence and bias the results.
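The weighting problem is simple arithmetic: a family of n genes yields n(n-1)/2 pairs but needs only n-1 duplication events to arise. The Python sketch below uses hypothetical family sizes (two small, old families and one recent burst) to show how much more of the pairwise sample the burst occupies than its share of duplication events would justify.

```python
# Hypothetical family sizes: two small, old families and one recent burst.
family_sizes = {"family_A": 2, "family_B": 3, "recent_burst": 20}


def n_pairs(size: int) -> int:
    """Number of gene pairs an all-against-all comparison draws from a family."""
    return size * (size - 1) // 2


def n_duplications(size: int) -> int:
    """Minimum number of duplication events needed to grow 1 gene into `size` genes."""
    return size - 1


total_pairs = sum(n_pairs(s) for s in family_sizes.values())
total_events = sum(n_duplications(s) for s in family_sizes.values())
burst = family_sizes["recent_burst"]
print(f"share of pairs from the burst:  {n_pairs(burst) / total_pairs:.1%}")          # 97.9%
print(f"share of duplications in burst: {n_duplications(burst) / total_events:.1%}")  # 86.4%
```

One way to reduce this weighting, assuming a gene-family tree is available, is to count each duplication node once rather than every pair of descendants.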

A last comment concerns the interpretation that retained genes are enriched in certain metabolic pathways, meaning that genes co-duplicate within the same metabolic subnetwork. However, this pattern is observed for fewer than 20 gene families, involved in 7 subnetworks. Whether these results can be generalized needs further exploration.


Posted in Journal club - phylogenetic methods

Homoeolog-specific retention and use in allotetraploid Arabidopsis suecica depends on parent of origin and network partners


Chang PL, Dilkes BP, McMahon M, Comai L, Nuzhdin SV. Genome Biology 2010, 11:R125

Journal club presented by Simon Williams 29/03/11

Summary

The aim of the study was to investigate retention patterns of homoeologous genes (homologous genes from two parents joined in a hybrid genome) in Arabidopsis suecica (As), a natural hybrid of Arabidopsis thaliana (At) and Arabidopsis arenosa (Aa). When the genomes of these two species combined, the result was redundancy between genes performing the same functions. Over time, some copies of these genes may have accumulated mutations, become expressed differently or been lost altogether. The study investigates this retention and expression and links them to species-specific networks of interacting proteins.

Why is this interesting?

What makes a species is, of course, a question that goes back to Darwin. Recently developed molecular techniques make it possible to investigate this question at the molecular level.

Analysis

Arabidopsis tiling arrays are used to determine the presence of genes in the hybrid species. A lab-generated F1 hybrid, representing the hybrid genome as it was immediately after hybridization, was also analyzed. Comparison of the array data and transcriptome data across species reveals that the natural hybrid has accumulated more deletions of At-derived homoeologs and expresses the remaining At homoeologs at lower levels. This indicates a preference for the retention and expression of Aa gene copies in the hybrid.
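The post does not say which statistic underlies the deletion bias. As a hedged illustration only, an exact two-sided binomial test against a 50:50 null asks whether losses fall on At copies more often than chance; the Python sketch below assumes that test, and the counts (42 At losses out of 60) are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Sums the probabilities of all outcomes no more likely than the observed one.
    With p = 0.5 every outcome has probability comb(n, i) / 2**n, so the
    comparison is done in exact integer arithmetic.
    """
    observed = comb(n, k)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


# Hypothetical counts: of 60 homoeolog pairs that lost one copy, 42 lost the At copy.
print(f"two-sided p = {binomial_two_sided_p(42, 60):.2g}")
```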

When these retained genes are placed in protein interaction networks, interacting proteins are more likely to retain and express copies from the same parent, and mixed-parentage interacting pairs occur less often than expected by chance. This implies that the same parental copies tend to be retained in pairs of interacting proteins, indicating a preference for binding the original partner.
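"Less often than random" can be made precise with a label-permutation test: shuffle which parental copy each gene retains, recount mixed-parentage interactions, and see how often the shuffled networks have as few as the real one. The Python sketch below assumes this test (the paper's exact procedure is not described here); the six-gene network and its labels are hypothetical.

```python
import random


def mixed_edges(edges, origin):
    """Count interacting pairs whose retained copies come from different parents."""
    return sum(origin[a] != origin[b] for a, b in edges)


def permutation_p_value(edges, origin, n_perm=10_000, seed=1):
    """Lower-tail p-value: how often do shuffled parent labels give as few mixed pairs?"""
    rng = random.Random(seed)
    genes = list(origin)
    labels = [origin[g] for g in genes]
    observed = mixed_edges(edges, origin)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the At/Aa totals fixed, reassigns them to genes
        if mixed_edges(edges, dict(zip(genes, labels))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical network: which parental copy (At or Aa) each gene retains.
origin = {"g1": "At", "g2": "At", "g3": "At", "g4": "Aa", "g5": "Aa", "g6": "Aa"}
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"), ("g5", "g6")]
observed, p = permutation_p_value(edges, origin)
print(f"mixed-parentage pairs: {observed}, permutation p = {p:.3f}")
```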

Our discussion

The analysis of retention and expression is thorough and appears to show that there is an evolutionary advantage in favouring the retention of one parental species' copy over its homoeolog. The authors put forward a number of hypotheses for why this occurs, including reduced recombination in the At-derived genes, adaptation of Aa genes to the environment, and the fact that Aa transcriptional machinery is preferentially expressed in the ancestral hybrid. This latter point suggests that Aa genes might have a ‘head start’ in expression and may go on to influence expression patterns across the genome. The viability of hybrid species will often depend on the fortuitous random shuffling of the parental genes; in this case, it has favoured Aa retention. It would be interesting to see whether the preference for Aa genes is simply a chance event in which either parental version could have been favoured, although this may not be testable without repeating the analysis on a number of independently generated hybrid strains.

We also feel that it would have been interesting if more analysis had been done on the potential for Dobzhansky-Muller incompatibilities between diverged homoeologs. This mechanism, although suggested by the authors, could be analyzed further by examining divergent interacting proteins for substitutions at their binding interfaces. This would, of course, depend on the availability of sequence and structural data for the parental species.

Conclusion

We feel that the subject of genome evolution in hybrid species is interesting, and the authors have clearly done a great deal of work in this paper. They have demonstrated a preference for retaining homoeologs from one parent (A. arenosa) and pointed to interaction networks as a determining factor. We feel that additional work could be done to pinpoint the molecular mechanisms involved in what is obviously a complex process with many selective constraints.

Posted in Journal club - phylogenetic methods

Genetic history of an archaic hominin group from Denisova Cave in Siberia

Journal Club, 1st March 2011. Presentation and blog entry by Laura Emery

Reich et al., Nature 2010, 468:1053-1060. doi:10.1038/nature09710

The paper in a sentence: In an impressive technological feat, the study sequenced the genome of a recently discovered archaic human specimen; remarkably, the analysis suggests that this hominin, named ‘Denisovan’, is a sister taxon to Neanderthals and may have contributed genetic material to present-day Melanesian populations.

Background information: The path of human evolution from our chimp-like ancestor has been studied archeologically for many years. Now, with the advent of next-generation sequencing technologies and better methods for extracting and preserving ancient DNA, we finally have the opportunity to study human evolution at the molecular level.

Prior to this report, the same group, led by Svante Pääbo, sequenced a draft Neanderthal genome. In those analyses, they compared human, Neanderthal and chimp sequences to estimate that the most recent common ancestor of humans and Neanderthals lived ~800,000 years ago. Humans and chimps are believed to share a common ancestor around 6.5 million years ago, so Neanderthals are far more closely related to humans than they are to chimps. By examining allele patterns among human populations, they found that Neanderthals shared more derived alleles with Eurasian populations than with African populations; this pattern is expected if Neanderthals and Eurasians interbred at some point after modern humans emerged from Africa (some 100,000 years ago).

The human-like Denisovan specimen (sequenced in this paper) was first discovered in a cave in the Altai Mountains of Siberia. The cave was excavated by Russian scientists in 2008, and early last year analyses of the Denisovan’s mitochondrial DNA (mtDNA) were published. It was not clear whether the Denisovan’s mtDNA sequence would be more closely related to human or Neanderthal mtDNA, or equally related to both. So, a phylogenetic tree was constructed from the mtDNA sequences of the Denisovan, human, Neanderthal and chimp (using a Bayesian approach with a GTR + gamma + I model) to try to uncover this evolutionary history. Based only on these mtDNA sequences, the tree showed that humans are most closely related to Neanderthals, and that the Denisovan is equally distantly related to both, forming an outgroup. However, mitochondrial DNA is notoriously unreliable for examining patterns of relatedness. Because it does not recombine, the whole molecule is inherited as a single unit: its nucleotide positions cannot be considered to evolve independently, violating the ‘independent sites’ assumption implicit in tree-building, and its single genealogy need not match the history of the populations. Therefore the Denisovan nuclear genome was sequenced to provide a more reliable dataset from which to infer the evolutionary history, and it is this sequencing project and its analyses that form the basis of this journal club paper.

Results and Our Discussion:

The sequencing of the Denisovan genome has been an impressive technological success. Comparing human, chimp and Denisovan sequences shows that the Denisovan shares more sequence with humans than with chimps. Given this tree structure, it was possible to infer the ancestral sequences at the common ancestors and identify the substitutions that appear to have occurred along each evolutionary lineage. Remarkably, the number of substitutions inferred along the Denisovan lineage is only 1.7 times that on the lineage leading to present-day humans. This is impressive because studies of ancient DNA often suffer from very poor sequence quality due to damage that accumulates after the specimen dies. If these DNA lesions are not repaired prior to sequencing, the sequenced product can contain many apparent ‘substitutions’ that are in fact post-mortem damage rather than true evolutionary changes. This issue was problematic for the Neanderthal genome, where the inferred number of substitutions on its lineage was more than 10 times the number on the human lineage. Errors resulting from DNA degradation largely consist of G→A and C→T transitions, so a biased substitution spectrum can indicate a degradation problem. The quality of the Denisovan sequence is further supported by the resemblance of its frequencies of each class of substitution to those inferred along the human lineage.
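The damage check amounts to tabulating a substitution spectrum and asking what fraction falls in the C→T and G→A classes. Here is a toy Python sketch of that bookkeeping, our own illustration rather than the paper's pipeline: the inferred ancestor and ancient sequence are hypothetical, and in practice the spectrum would be compared with the one inferred along the human lineage.

```python
from collections import Counter

# Post-mortem deamination reads as C->T (and G->A on the complementary strand).
DAMAGE_CLASSES = {("C", "T"), ("G", "A")}


def substitution_spectrum(ancestral: str, derived: str) -> Counter:
    """Count substitution classes (ancestral base, derived base) along a lineage."""
    return Counter((a, d) for a, d in zip(ancestral, derived) if a != d)


def damage_fraction(spectrum: Counter) -> float:
    """Fraction of inferred substitutions that fall in the damage-like classes."""
    total = sum(spectrum.values())
    damage = sum(n for cls, n in spectrum.items() if cls in DAMAGE_CLASSES)
    return damage / total if total else 0.0


# Hypothetical inferred ancestor and ancient sequence covering the same sites.
ancestor = "ACGTCCGAGTCA"
ancient = "ATGTTCAAGTCG"
spectrum = substitution_spectrum(ancestor, ancient)
print(dict(spectrum))
print(f"damage-like fraction: {damage_fraction(spectrum):.2f}")  # 0.75
```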

Contamination?

Two independent sequencing libraries were constructed, and levels of human contamination were estimated in several ways. First, by identifying human-specific mitochondrial DNA sequences among the Denisovan sequence reads, it was possible to identify 12/7,433 and 6/5,042 contaminant sequences in the two libraries, respectively. This implies low levels of contamination, of the order of 0.2% and 0.1%. Second, the contribution of male contamination was estimated from the proportion of Y-chromosome-specific sequences among the reads, compared with the proportion expected if the individual were male. This produced similarly low contamination estimates of ~0% and ~0.43%, but did not account for any potential contribution from female excavators. Finally, nuclear contamination was estimated by identifying human-specific derived alleles from comparisons of sequences from five humans, one chimp and the first Denisovan sequencing library. Occurrences of these derived alleles in the second Denisovan library indicate human contamination, sequencing error, or heterozygosity of the Denisovan individual at that locus. A maximum likelihood framework was used to estimate the relative contributions of these processes, giving contamination estimates of ~1% and 0.3%, respectively. These estimates of human contamination are impressively low, although they rely on enough genuine Denisovan sequence having been obtained to distinguish it from, and thus detect contamination by, human sequence. Strangely, the authors tell us that the excavation had uncovered several animal bones in the cave, yet the possibility of animal contamination was not investigated.
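The mtDNA-based figures are simple proportions (12/7,433 ≈ 0.16% and 6/5,042 ≈ 0.12%, which round to the values quoted). Since the counts are small, an interval conveys the uncertainty better than a point estimate. The Python sketch below adds a Wilson score interval; that choice of interval is ours and not necessarily what the authors reported.

```python
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wilson score interval for a proportion k/n (z=1.96 ~ 95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half


# mtDNA-based counts quoted above: contaminant reads / diagnostic reads per library.
for label, k, n in [("library 1", 12, 7433), ("library 2", 6, 5042)]:
    p, lo, hi = wilson_interval(k, n)
    print(f"{label}: {p:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```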

Archaic Hominin Evolutionary Relationships

What is the evolutionary history of the Denisovans? To address this question, orthologous sequences from seven hominins (French, Han, Papuan, San, Yoruba, Neanderthal and Denisova) were aligned with chimpanzee sequences. Various data-filtering procedures were implemented to ensure that only the most reliable sites were included. By sampling transversion differences between pairs of sequences, it was possible to score the genetic distance between each pair of species or populations. One question of interest was how the Denisovans are related to Neanderthals and present-day humans. In contrast to the mitochondrial DNA analysis, the Denisovan and Neanderthal were found to be approximately equally distantly related to the human sequences, suggesting that they are sister taxa. Consistent with this, the genetic distance between the Denisovan and Neanderthal sequences was shorter than any of the distances between the archaic hominins and present-day human populations. Errors due to DNA degradation were not explicitly dealt with in these analyses, and whilst transitions (the most common class of error) were excluded, we wonder whether some of the apparent relatedness between Denisovans and Neanderthals is due to shared sequencing errors at hypermutable sites.
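The core of a transversion-only distance fits in a few lines. This Python sketch is our own simplification: it scores two aligned sequences by the fraction of sites differing by a transversion and ignores transitions; the paper's filtering and sampling scheme is not reproduced, and the fragments are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(x: str, y: str) -> bool:
    """True when one base is a purine and the other a pyrimidine."""
    return (x in PURINES and y in PYRIMIDINES) or (x in PYRIMIDINES and y in PURINES)


def transversion_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ by a transversion (transitions ignored).

    Non-ACGT characters never count as transversions but still count as sites.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(is_transversion(x, y) for x, y in zip(a, b)) / len(a)


# Hypothetical aligned fragments: two transitions (A/G, C/T) and two transversions.
print(transversion_distance("AAGGCCTT", "GACGCTTA"))  # 0.25
```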

To estimate the time to the most recent common ancestor of humans, Denisovans and Neanderthals, a molecular clock was applied under the assumption that Denisovan, Neanderthal and human sequences all evolve at the same rate. The high-quality human and chimp sequences were assumed to have no errors, which allowed an error-scaling factor to be estimated for each of the Denisovan and Neanderthal sequences, bringing their branch lengths in line with those of the human sequences. Calibrating the molecular clock on a human-chimp divergence 6.5 million years ago, the common ancestor of Denisovans, Neanderthals and humans is estimated to have lived around 800,000 years ago. Similarly, the common ancestor of Denisovans and Neanderthals is estimated to have lived around 640,000 years ago. These analyses did not account for the shorter branches of the archaic hominins implied by the ages of their fossils; however, these ages are small fractions of the estimated divergence times and so are unlikely to substantially affect the estimates. More rigorous methods are available for jointly estimating sequencing error, phylogeny and divergence times but, nevertheless, these analyses provide a first glimpse of, and ballpark figures for, the timing and pattern of archaic human evolution.
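Under a strict clock, the calibration step is a proportion: a pair's divergence, divided by the human-chimp divergence, times 6.5 million years. The Python sketch below shows only that arithmetic. The divergence values in the example are hypothetical, and it omits the error-scaling factors and fossil ages discussed above.

```python
HUMAN_CHIMP_SPLIT_YEARS = 6.5e6  # calibration used in the post


def divergence_time(pair_divergence: float, human_chimp_divergence: float,
                    calibration_years: float = HUMAN_CHIMP_SPLIT_YEARS) -> float:
    """Strict-clock estimate: time scales linearly with divergence.

    Both divergences must be measured the same way (e.g. per-site differences),
    and every lineage is assumed to evolve at the same rate.
    """
    rate = human_chimp_divergence / calibration_years
    return pair_divergence / rate


# Hypothetical divergences, chosen only to show the arithmetic.
print(f"{divergence_time(0.0012, 0.0124):,.0f} years")
```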

A similar analysis was performed on multiple Neanderthal sequences together with human, Denisovan and chimp sequences, and a phylogenetic tree was estimated from the resulting genetic distances (Figure 1). The Neanderthal sequences appear surprisingly similar to one another; by comparison, the human branches (from San to Yoruba) are much deeper, indicating a more distant common ancestor. This observation was not anticipated and was followed up by examining additional Neanderthal sequences, which showed that diversity is low across the Neanderthal sequences currently available. This has been interpreted as evidence for a population bottleneck in the Neanderthal lineage; however, this claim might be premature given the limited sampling and potential ascertainment bias of current Neanderthal sequences.

Human – hominin admixture?

In a previous paper, the group provided evidence for Neanderthal admixture with Eurasian human populations. Here, they investigate potential admixture with Denisovans. To do so, they construct four-taxon alignments containing the chimp, either the Denisovan or the Neanderthal, and a pair of human populations. In each case, the chimp is the outgroup, and analyses are restricted to sites with two different nucleotides where the chimp shares its nucleotide with one of the two human populations. The derived allele is therefore the one carried by the archaic hominin (Denisovan or Neanderthal), and if it is shared more frequently with one human population than with the other, this may provide evidence for admixture between those populations. The results of these analyses are presented in Table 1 and, consistent with previous findings, show that Neanderthals share more alleles with the French than with the Yoruba (African) population. Denisovans also share more alleles with the French than with the Yoruba, but to a (significantly) lesser extent. This is interpreted as reflecting the shared ancestry of Denisovans and Neanderthals rather than any Eurasian–Denisovan admixture.
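The site counting described above is the familiar ABBA-BABA scheme. The Python sketch below counts ABBA sites (only the second human shares the archaic derived allele) and BABA sites (only the first does) and combines them into the standard D statistic. We assume that form here; the paper's exact statistic is not spelled out in this post, and significance is typically assessed with a block jackknife, which we omit. The six-site alignment is hypothetical.

```python
def abba_baba_counts(h1: str, h2: str, archaic: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA sites in a four-taxon alignment (H1, H2, archaic, outgroup).

    Uses only biallelic sites where the archaic carries a derived allele
    (differs from the outgroup) and exactly one of the two humans shares it.
    """
    abba = baba = 0
    for a1, a2, ax, ao in zip(h1, h2, archaic, outgroup):
        if ax == ao:                 # archaic carries the ancestral allele
            continue
        if {a1, a2} - {ax, ao}:      # a third allele: not biallelic
            continue
        if a1 == ao and a2 == ax:    # only H2 shares the derived allele
            abba += 1
        elif a1 == ax and a2 == ao:  # only H1 shares the derived allele
            baba += 1
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D > 0 means H2 shares more derived alleles with the archaic than H1 does."""
    return (abba - baba) / (abba + baba)


# Hypothetical six-site alignment.
abba, baba = abba_baba_counts(h1="ACATCG", h2="GTGTCC", archaic="GTATCC", outgroup="ACGTAG")
print(abba, baba, d_statistic(abba, baba))  # 3 1 0.5
```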

Finally, the authors investigate the possibility that the Denisovans interbred with human lineages after their emergence from Africa. A principal component analysis was performed on SNPs from the Denisovan, Neanderthals, 53 present-day human populations and chimp. The positions of the human populations on the first two axes were examined (Figure 2) and revealed three distinct clusters: Africans, non-Africans and Melanesians. Curiously, the non-African cluster was shifted towards the Neanderthals, consistent with the theory of Eurasian admixture with Neanderthals. More surprisingly, the Melanesian population was shifted towards the Denisovan, giving rise to the hypothesis of Denisovan–Melanesian admixture. This hypothesis was supported by analyses of allele patterns across four-taxon combinations (shown in Table 1; see above for a description of the method). For instance, the Denisovan was found to share the derived allele with the present-day Papuan (Melanesian) population 1.09 times more frequently than with the Han Chinese population, as expected if there had been Denisovan–Melanesian interbreeding. This finding remains consistent across analyses of different chromosomes and using sequence reads of varying coverage. The authors note that, whilst other plausible demographic models could explain these allele frequency patterns, the admixture model proposed is the simplest and thus the most parsimonious.

This paper is technically impressive and biologically surprising and, with 28 coauthors and 90 pages of supplementary material, forms an extensive body of work. It is very rare for a new archaic hominin to be discovered, and it is remarkable that its genome has been sequenced so successfully. These analyses give us a unique glimpse into the evolutionary history of the Denisovans, and indeed our own. Admixture between Melanesians and an archaic hominin is a surprising and unanticipated finding, and it will be interesting to see whether data generated in future studies corroborate it. The study exemplifies the benefits of ancient DNA as a tool for uncovering our own recent past, and surely paves the way for future investigation.

Posted in Journal club - phylogenetic methods

Biased gene transfer mimics patterns created through shared ancestry

Cheryl P. Andam, David Williams, and J. Peter Gogarten (2010) PNAS 107: 23, 10679-10684.

Journal Club Presented by James Allen  28th July 2010

The paper in a sentence: The authors describe a gene encoding a bacterial enzyme that has been horizontally transferred between species in a biased manner, such that the molecular evidence resembles that of a gene inherited vertically, from parent to offspring.

Background: Until relatively recently, genetic information was thought largely to have been transferred from parent to offspring, analogous to a branching tree structure. The applicability of this analogy for all forms of life is under debate, however, given the discovery of the extent of other mechanisms for gene transfer in bacteria and other single-celled organisms. Horizontal gene transfer (HGT) refers to the process where genetic data from one organism is transferred to another which is not necessarily related, nor even necessarily the same species; the prevalence of HGT calls into question not only the ‘tree of life’ metaphor (suggesting, perhaps, that a network analogy is more appropriate), but also the (already rather labile) concept of species.

The paper in detail: The authors present one key result, supplemented by evidence from three other sources that would not be convincing in isolation but here provide valuable circumstantial support. The results are based on a particular enzyme with a property that is important for this analysis: it comes in two distinct types. The main result is that the tree in figure 1 of the paper, generated by looking solely at this enzyme, has two distinct subtrees, one for each type. Each subtree closely resembles the tree that most likely characterizes the vertical inheritance of genetic data, i.e. the ‘species tree’ in figure 2. It is not easy to quantify whether one tree structure resembles another, particularly with the number of species used here; the authors look at the branch-length distances separating all pairs of species, which discards some information about tree structure but does not prevent them from convincingly demonstrating that the subtrees for each type resemble the species tree. Moreover, in the species tree, species with the same type of enzyme are grouped together within broader groupings at the phylum or class level, i.e. there are patches of red and green branches (representing the two types) in figure 2. This is evidence for biased HGT because it shows that HGT occurs not at random, but more often between more closely related species.
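One simple way to compare the pairwise distances of two trees is to correlate them across all species pairs. The post does not say which statistic the authors used, so this Python sketch assumes a Pearson correlation; because species pairs are not independent, its significance would usually be assessed with a Mantel test. The four-species distances and helper names are hypothetical. As noted above, the comparison discards topology: only path lengths enter.

```python
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def as_distances(table: dict[tuple[str, str], float]) -> dict:
    """Key each pairwise distance by an unordered pair of taxa."""
    return {frozenset(pair): d for pair, d in table.items()}


def distance_correlation(tree_a: dict, tree_b: dict, taxa: list[str]) -> float:
    """Correlate path-length distances of two trees over every pair of shared taxa."""
    pairs = [frozenset(p) for p in combinations(sorted(taxa), 2)]
    return pearson([tree_a[p] for p in pairs], [tree_b[p] for p in pairs])


# Hypothetical path-length distances for four species.
taxa = ["A", "B", "C", "D"]
species_tree = as_distances({("A", "B"): 2.0, ("A", "C"): 4.0, ("A", "D"): 6.0,
                             ("B", "C"): 4.0, ("B", "D"): 6.0, ("C", "D"): 5.0})
enzyme_subtree = as_distances({("A", "B"): 0.9, ("A", "C"): 2.1, ("A", "D"): 3.2,
                               ("B", "C"): 1.9, ("B", "D"): 2.9, ("C", "D"): 2.4})
print(f"r = {distance_correlation(species_tree, enzyme_subtree, taxa):.3f}")
```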

Another line of evidence presented is that a scenario of gene gain and loss alone is far less likely to explain the trees than one involving some degree of HGT; the authors gloss over the fact that this demonstrates only that HGT, not necessarily biased HGT, has most likely occurred. Additionally, the genes surrounding the enzyme’s gene are similar for both types, which would not be the case if the gene were being repeatedly gained and lost; again, this is evidence for HGT, not necessarily biased HGT.

The final piece of supporting evidence comes from simulations of biased and unbiased HGT, which produce data resembling the real data. An extreme bias is modelled using an exponential function, so that transfers are likely to occur only between the most closely related species; this is probably realistic, although the authors do not justify the use of this particular model. In addition, the unbiased and biased transfers are simulated sequentially; this was perhaps done because it is often easier to show that something is changing rather than staying the same, but it is an uncommon approach that makes the results difficult to interpret.
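An exponential bias can be expressed as recipient probabilities proportional to exp(-distance / scale). The Python sketch below is our own guess at such a model. The parameter name `scale`, the distances and the species labels are hypothetical, and the paper's actual parameterization is not given in this post. A small scale concentrates transfers on the closest relatives; a very large one approaches unbiased transfer.

```python
from math import exp
import random


def transfer_probabilities(donor_distances: dict[str, float], scale: float) -> dict[str, float]:
    """Recipient probabilities proportional to exp(-distance / scale).

    A small `scale` gives an extreme bias towards the closest relatives; a very
    large `scale` approaches unbiased (uniform) transfer.
    """
    weights = {sp: exp(-d / scale) for sp, d in donor_distances.items()}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}


# Hypothetical distances from one donor species to candidate recipients.
distances = {"sister": 0.1, "same_class": 0.5, "other_phylum": 2.0}
probs = transfer_probabilities(distances, scale=0.2)
rng = random.Random(0)
recipient = rng.choices(list(probs), weights=list(probs.values()))[0]
print(probs, recipient)
```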

Journal club conclusion: For this particular enzyme, horizontal gene transfer is biased such that transfer is more likely between more similar species, and thus the molecular data provide the same signal as transmission through vertical inheritance. It remains to be shown how widespread this phenomenon is; if HGT generally reinforces, rather than contradicts, the signal of vertical inheritance, then the tree of life analogy may well be useful for practical purposes, even if it does not reflect the true evolutionary history.

Posted in Journal club - microbial evolution

Why genes evolve faster on secondary chromosomes in bacteria

Cooper VS, Vohr SH, Wrocklage SC, Hatcher PJ

2010 Why Genes Evolve Faster on Secondary Chromosomes in Bacteria. PLoS Comput Biol 6(4): e1000732. doi:10.1371/journal.pcbi.1000732

Journal club 30/06/10, Presented by Simon Williams

Background:

The authors tackle the subject of secondary chromosomes in bacterial genomes. Why and how these multi-chromosome genomes have arisen is largely unknown, but one hypothesis holds that secondary chromosomes evolved from plasmids and serve as an ‘evolutionary test bed’. A previous study found that divided bacterial genomes have altered levels of gene expression: genes on chromosome 1 tend to have higher expression levels than genes on chromosome 2, and this is reflected in the location of conserved housekeeping genes on the primary chromosome. The paper aims to demonstrate that genes on chromosome 2 are under less selective constraint than those on chromosome 1 and, as such, are less conserved and evolve faster.

Results:

  • The authors identify ‘panorthologs’, orthologous genes found in every genome of their test sets. They show that these are more numerous on the primary chromosome, demonstrating that the most conserved genes across the different species are located there in preference to the additional chromosome(s).
  • They then measure synonymous (dS) and non-synonymous (dN) substitution rates on each chromosome, showing that the primary chromosome has the lowest rate of evolution.
  • Codon usage bias (the preferential use of certain codons, which may affect translational efficiency, etc.) is also addressed. They find reduced codon usage bias on secondary chromosomes.
  • They then attempt to tease apart whether these differences are due to chromosomal location or to an inherent characteristic of the genes themselves. They compare the dN rates of genes found on the secondary chromosome in one species with those of their orthologs on chromosome 1 in a species with a single-chromosome genome. The dN rates of these particular genes are higher than those of genes found on chromosome 1 in both multi- and single-chromosome species, demonstrating that these genes evolve faster regardless of their chromosomal location.
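For readers less familiar with dN and dS: in counting methods such as Nei-Gojobori, one estimates the proportions of differing non-synonymous sites (pN) and synonymous sites (pS), corrects each for multiple hits with the Jukes-Cantor formula, and takes the ratio dN/dS. The post does not say which estimator the authors used. This Python sketch shows only the final correction step, and the proportions are hypothetical.

```python
from math import log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def omega(p_n: float, p_s: float) -> float:
    """dN/dS from proportions of differing non-synonymous and synonymous sites."""
    return jukes_cantor(p_n) / jukes_cantor(p_s)


# Hypothetical proportions for one ortholog pair on each chromosome.
print(f"chromosome 1: dN/dS = {omega(0.02, 0.30):.3f}")
print(f"chromosome 2: dN/dS = {omega(0.04, 0.35):.3f}")
```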

Our discussion:

We thought that the authors set about to answer an interesting question and did this as thoroughly as they could given the data available to them.  The idea that different selection pressures on different chromosomes should influence expression and therefore evolutionary rates is a simple one but throws up some interesting points regarding secondary chromosomal ‘evolutionary test beds’.

The authors find that genes on chromosome 2 evolve faster regardless of their chromosomal location. This seems to be an important result; it is acknowledged but rather overlooked as the authors push their argument that chromosomal position is the stronger driving force behind the accelerated evolution.

PLoS Computational Biology (2010) Apr 1; 6(4):e1000732

Posted in Journal club - microbial evolution