Update: This was the first blog post on TopAnat. There are now quite a few more, under the tag TopAnat.
We are glad to roll out the first public beta version of our new tool, TopAnat:
TopAnat uses the classical GO-enrichment approach (specifically, that of TopGO): it compares the terms associated with your gene list to those associated with a background list, and reports the terms that are over-represented. The main differences from GO-enrichment are that
- the ontology used describes anatomy rather than gene function;
- the association between genes and ontology terms is obtained from gene expression patterns rather than annotation.
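To make the idea concrete, here is a minimal sketch of an over-representation test on anatomical structures. It is not TopAnat's code: the gene names, the structures, and the `enrichment` helper are hypothetical toy data, and it assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) with no decorrelation of the ontology.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n genes out of N, of which K are expressed in the structure."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enrichment(gene_list, background, expressed_in):
    """Rank structures by over-representation of 'present' calls in gene_list.

    expressed_in maps a structure name to the set of genes with a 'present'
    call in that structure (hypothetical toy input, not the Bgee data model).
    """
    background = set(background)
    study = set(gene_list) & background
    results = []
    for structure, genes in expressed_in.items():
        annotated = genes & background
        k = len(study & annotated)  # study genes expressed in this structure
        p = hypergeom_tail(k, len(background), len(annotated), len(study))
        results.append((structure, k, p))
    return sorted(results, key=lambda r: r[2])


# Toy example: 10 background genes, 3 genes of interest.
background = [f"g{i}" for i in range(10)]
expressed_in = {
    "brain": {"g0", "g1", "g2", "g3"},
    "liver": {"g3", "g4", "g5", "g6", "g7", "g8"},
}
results = enrichment(["g0", "g1", "g2"], background, expressed_in)
for structure, k, p in results:
    print(f"{structure}: {k} genes, p = {p:.4f}")
```

In this toy case all three genes of interest are expressed in "brain" and none in "liver", so "brain" comes out on top.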
As with GO-enrichment, you can use a default background of all genes in your organism with expression data in Bgee, or you can upload your own background set. Because TopAnat is based on TopGO, you can use several algorithms to decorrelate the ontology. This avoids reporting several terms that are annotated to the same genes because of part_of relations, such as “prefrontal cortex” and “frontal cortex”. The options are:
- No decorrelation
- Weight (update: now the default)
For now, genes are only associated to anatomical structures through present / absent calls, i.e. you will obtain structures with more “present” calls than expected by chance. In the future, we will be adding enrichment based on expression being higher in some tissues than others (“over-expression”). The test as it is shows that present calls of expression already contain a lot of biologically relevant information. For example, here are the top anatomical structures for genes associated to the GO term “neurological system process” in mouse (see full results here; loading can be a bit slow):
Because developmental expression patterns can be very different from adult ones, we compute enrichment twice by default: for expression patterns in development (“embryo stage”) and from newborn to adult (“post embryonic stage”). If you feel that a more detailed breakdown is needed, please ask us, although note that in some cases we will probably lack statistical power.
Because Bgee annotates only healthy wild-type expression data, you can rest assured that the results are not driven by expression in tumors, knockouts, or disease states, but are representative of healthy biological processes.
And because Bgee integrates in situ hybridization with microarrays and RNA-seq, you can obtain very detailed anatomical information in mouse, fly, zebrafish or nematode, as shown in the mouse neurological process genes above.
Finally, because Bgee annotates expression data from 17 species, you can immediately perform tests in all of these species, although be warned that tests will lack power in species with less data. Example: tissue enrichment of genes on the X chromosomes of platypus (loading can be a bit slow). Note that while the FDR is not significant (lack of power), the top structures make sense for sex chromosomes.
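To see why the FDR can stay non-significant even when the top raw p-values look small, here is a minimal sketch of a multiple-testing correction. The post does not say which FDR procedure TopAnat applies, so this assumes the widely used Benjamini-Hochberg adjustment; the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four anatomical structures.
raw = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"p = {p:.3f} -> FDR = {q:.4f}")
```

Each raw p-value is scaled by the number of tests divided by its rank, so the more structures are tested, the stronger the raw signal has to be to remain significant after correction.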
We will be testing and improving TopAnat over the next weeks, and we already look forward to playing with it. We are confident that it will be useful, and hope that many of you will be able to use it to gain biological insight into those gene lists you get in this big data biology age.
TopAnat is based on an adaptation by Julien Roux of TopGO, by Adrian Alexa. It has been incorporated into Bgee by the Bgee team (names tagged Bgee in this list), and the graphical user interface has been developed by the SIB WebTeam. All data in Bgee are annotated to the Uberon ontology.