We are proud to be rolling out the new and improved Bgee over the coming weeks! Welcome to release 13!
In the 2 years since our last release, we have transferred all our annotations to the bilaterian ontology Uberon (see this paper for the ontology work) and made many other back-end changes which will make Bgee more powerful and better equipped to deal with the increasing diversity of species with RNA-seq data.
As a result of these changes, we can now provide expression calls for 17 species, up from 5 species in the previous release. The new species include model organisms such as C. elegans and chicken, as well as a diversity of tetrapods. For each species, we have developed a developmental ontology (ontologies on Google Code; developmental stage modeling collaboration on Github). As before, Present/Absent expression calls are a consensus of information from in situ hybridizations, microarrays, RNA-seq and ESTs, depending on the data available for each species.
In addition to the increase in species and data, a major novelty in Bgee 13 is a change in how the data are accessed. Fewer users want to browse a database through a webpage, as we have offered since 2006. More users want to download a file to analyze in R, browse in Excel, or incorporate into their existing analysis. And as our data become larger and more complex, browsing in a webpage becomes counter-productive. From now on, our primary effort to make Bgee data accessible will thus be through customized downloadable files, starting with TSVs of precomputed results.
Because there are different needs for different users, each file will be provided in two versions:
- “simple files” will contain final calls, with minimal additional information; for Present/Absent expression calls, this means one line for the expression of a given gene in a combination of organ and developmental stage, plus the confidence of the call; easy to plug into your favorite analysis, e.g., are my metabolic disease genes expressed in human liver?
- “advanced files” will contain additional columns to provide detailed information for further parsing and analysis; for Present/Absent calls, we add the data type used and additional lines for extra propagated expression data (see below); ideal to start your more in-depth analysis, which you may want to restrict for example to only RNA-seq evidence, or only expression supported by two different data types.
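To illustrate the kind of filtering an "advanced file" makes possible, here is a minimal sketch using only the Python standard library. The column names (`gene_id`, `organ`, `stage`, `call`, `data_types`), the semicolon-separated data type field, and the sample rows are all assumptions made up for this example; the real files may be laid out differently, so check the actual header before adapting it.

```python
import csv
import io

# Hypothetical excerpt of an "advanced" TSV file. Column names and rows
# are invented for illustration and are NOT the actual Bgee layout.
SAMPLE_TSV = (
    "gene_id\torgan\tstage\tcall\tdata_types\n"
    "geneA\tliver\tadult\tpresent\tRNA-seq;microarray\n"
    "geneB\tliver\tadult\tpresent\tEST\n"
    "geneC\tbrain\tadult\tpresent\tRNA-seq\n"
    "geneD\tliver\tadult\tabsent\tin situ;microarray\n"
)


def read_calls(handle):
    """Read a tab-separated file of calls into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def data_types(row):
    """Return the set of data types listed for one call (assumed ';'-separated)."""
    field = row["data_types"]
    return set(field.split(";")) if field else set()


def with_rna_seq(rows):
    """Keep only calls that list RNA-seq among their data types."""
    return [r for r in rows if "RNA-seq" in data_types(r)]


def with_min_data_types(rows, minimum=2):
    """Keep only calls supported by at least `minimum` different data types."""
    return [r for r in rows if len(data_types(r)) >= minimum]


rows = read_calls(io.StringIO(SAMPLE_TSV))
rna_seq_genes = [r["gene_id"] for r in with_rna_seq(rows)]
two_type_genes = [r["gene_id"] for r in with_min_data_types(rows)]
print(rna_seq_genes)
print(two_type_genes)
```

With a real download, the `io.StringIO(SAMPLE_TSV)` handle would be replaced by an open file, and the filters adjusted to whatever columns the file actually provides.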
What is the propagation of expression? When a gene is called expressed in, e.g., the cerebellum, we can infer that it is expressed in the brain, because the cerebellum is fully included in the brain. This is useful information when you need to recover all genes expressed in the brain, and it is thus provided. This propagation is performed based on the relations in the Uberon ontology, and is specific to a species and a developmental stage.
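The idea of propagation can be sketched in a few lines of Python. This is a toy illustration, not the Bgee pipeline: the `PART_OF` mapping is a hand-written stand-in for ontology relations, the gene names are invented, and the sketch ignores the species and developmental stage dimensions mentioned above.

```python
# Toy "part of" relations (child -> parents). A hand-written stand-in for
# ontology relations, NOT an extract of Uberon.
PART_OF = {
    "cerebellum": ["hindbrain"],
    "hindbrain": ["brain"],
    "brain": ["central nervous system"],
}


def ancestors(organ, part_of):
    """Return every structure that contains `organ`, directly or indirectly."""
    seen = set()
    stack = [organ]
    while stack:
        current = stack.pop()
        for parent in part_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(calls, part_of):
    """Add, for each (gene, organ) expression call, a call in every containing structure."""
    propagated = set(calls)
    for gene, organ in calls:
        for parent in ancestors(organ, part_of):
            propagated.add((gene, parent))
    return propagated


# Invented example calls: (gene, organ) pairs where expression was observed.
observed = {("geneA", "cerebellum"), ("geneB", "liver")}
expressed = propagate(observed, PART_OF)
brain_genes = sorted(gene for gene, organ in expressed if organ == "brain")
print(brain_genes)
```

Here `geneA` is recovered as expressed in the brain even though it was only observed in the cerebellum, while `geneB` is not.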
What next? We will provide files of gene over-expression (e.g., this gene has significantly higher expression in adult liver than in other organs) and under-expression (e.g., this gene has significantly lower expression in juvenile muscle than in other juvenile organs) in each organ and developmental stage, based on microarray and RNA-seq data. We will also provide files of orthologous gene expression in homologous organs between species, both for pairs of model organisms (e.g., orthologs expressed in homologous tissues in human and fly) and for taxonomic groups such as Primates or mammals (e.g., orthologs expressed in homologous tissues in all mammals). And we are working on new summary pages, for those who want more than downloads, but that’s for a bit later.