What makes #TopAnat special relative to classical #GeneOntology enrichment?

In bioinformatics and genomics, we are all familiar with the GO (Gene Ontology) enrichment test. You take a gene list, paste it into a tool such as GOrilla, PANTHER, or others, and obtain a list of terms which are enriched in your gene list.

This works because each gene in your gene list has GO terms associated with it, through experimental or computational evidence. For each term, we can thus count how many genes in your list it is associated with, and compare this to the count expected from a random gene list of the same size (same number of genes).
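To make the counting concrete, here is a minimal sketch of such a test for a single term. It assumes a hypergeometric (one-sided Fisher) test, which is one common choice for enrichment; this post does not say which test any given tool uses, and real tools may add corrections for multiple testing or for the structure of the ontology. The function name and the toy numbers are illustrative only.

```python
from math import comb

def enrichment(observed, list_size, annotated_in_background, background_size):
    """Compare the observed count for one term to the random expectation.

    observed: genes in your list associated with the term
    list_size: number of genes in your list
    annotated_in_background: background genes associated with the term
    background_size: number of genes in the background
    Returns (expected_count, p_value), where p_value is the probability of
    seeing at least `observed` annotated genes in a random list of the same size.
    """
    # Count expected for a random gene list of the same size.
    expected = list_size * annotated_in_background / background_size

    # Upper tail of the hypergeometric distribution (assumed test, see above).
    all_lists = comb(background_size, list_size)
    upper = min(list_size, annotated_in_background)
    tail = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, list_size - i)
        for i in range(observed, upper + 1)
    )
    return expected, tail / all_lists

# Toy numbers: 20 background genes, 5 of them annotated to the term;
# your list has 4 genes, 3 of which carry the term.
expected, p_value = enrichment(3, 4, 5, 20)
print(f"expected {expected:.2f}, observed 3, p = {p_value:.4f}")
```

A term counts as enriched when the observed count is well above the expected one and the p-value is small.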

TopAnat does the same (see also here), but each gene has anatomical terms associated with it. And here is an important difference: all associations are experimental. TopAnat uses Bgee expression calls, which come from an integration of in situ hybridizations, RNA-seq, microarrays, and ESTs. No gene is associated with “brain” because its ortholog is, or because it is paralogous to another gene expressed in the brain, or because it shares a domain which is frequently found in brain genes. A gene is associated with the brain only because we have experimental evidence that it is expressed in the brain or in a sub-structure of the brain (e.g., all genes expressed in the cerebellum are also considered expressed in the brain).
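The cerebellum example can be sketched as a simple propagation of expression calls up an anatomy hierarchy. This is only an illustration: the tiny `PART_OF` table, the gene names, and the function names are hypothetical, and this is not a description of how Bgee implements it.

```python
# Hypothetical mini-hierarchy: each structure lists the structures it is part of.
PART_OF = {
    "cerebellum": ["brain"],
    "hippocampus": ["brain"],
    "brain": [],
}

def with_parents(structure, part_of):
    """Return the structure itself plus every structure it is part of."""
    seen = set()
    stack = [structure]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(part_of.get(current, []))
    return seen

def propagate(calls, part_of):
    """Extend each gene's observed structures with all parent structures."""
    return {
        gene: set().union(*(with_parents(s, part_of) for s in structures))
        for gene, structures in calls.items()
    }

# Made-up experimental calls: geneA was observed in the cerebellum only.
calls = {"geneA": {"cerebellum"}, "geneB": {"brain"}}
print(propagate(calls, PART_OF))
```

Here geneA ends up associated with both cerebellum and brain, while geneB stays associated with brain only: evidence moves up to parent structures, never down to sub-structures.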

Because we have so much expression data (also see this talk), and it keeps increasing, we actually do have such annotations for most genes. Because RNA-seq is applicable to all species, we have such annotations for all species in Bgee (17 at present). Because in situ hybridizations are very precise, we have such annotations for many tissues and cell types in non-model (or emerging model) organisms, from cow to platypus and anole lizard.

This is particularly interesting because expression patterns closely match the type of function covered by the Biological Process aspect of GO, which is the hardest to predict (e.g., CAFA). Here, we do not predict, we report.

Finally, because Bgee only includes manually curated data from healthy wild-type samples, the associations correspond only to the “normal” function of the genes. This is not to say that the implication of genes in diseases, or the effects of genetic modifications, are not interesting, but they should not be confused with the normal function.

When TopAnat provides you with a list of anatomical terms, you know that:

  • they are experimentally supported in the species which you are studying;
  • they integrate all available information, not only one datatype;
  • they only represent healthy wild-type gene functions.

Enjoy TopAnat!


About bgeedb

This is the blog of the database of gene expression evolution Bgee, at University of Lausanne and Swiss Institute of Bioinformatics. http://www.bgee.org/