Expressed, or not expressed? That is the question!

Bgee provides information about where and when genes are expressed in different species. But what does it mean “to be expressed”? That’s a fundamental question we had to answer before we could introduce RNA-Seq data into the database.

Expression does not equal transcription. Expression is the process by which information encoded in a DNA sequence is transformed into a functional product. This means that not every DNA sequence that is transcribed is necessarily expressed. Take introns, for example: they are transcribed, but the information they contain is not used to form a protein.

To define where and when a gene is expressed based on RNA-Seq data, we therefore decided to use intergenic regions, fragments of the genome that may be transcribed but are mostly not expressed, as a reference. If we take the presence of at least one uniquely mapped read as the criterion for transcription, then more than 50% of human and more than 60% of mouse intergenic regions have been transcribed in at least one of the RNA-Seq samples we have introduced into Bgee so far. To avoid an excess of false positive calls, we set a cutoff on the transcription level above which we define a gene as expressed. This cutoff is chosen so that any genomic feature whose transcription level is above it has a probability of less than 1 in 20 (5%) of being an intergenic region.
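To make the idea concrete, here is a minimal sketch of how such a cutoff could be picked. It is not the actual Bgee pipeline: the function name, the toy expression levels, and the choice to take the smallest cutoff that satisfies the condition are all assumptions made for illustration. It pools protein-coding genes and intergenic regions, then looks for the lowest level above which fewer than 1 in 20 of the remaining features are intergenic.

```python
def expression_cutoff(genic_levels, intergenic_levels, max_intergenic_fraction=0.05):
    """Return the smallest observed level such that, among all features
    strictly above it, the fraction of intergenic regions is below
    max_intergenic_fraction. Return None if no such level exists.

    Hypothetical helper for illustration only, not Bgee's implementation.
    """
    # Every observed level is a candidate cutoff, tried from lowest to highest.
    candidates = sorted(set(genic_levels) | set(intergenic_levels))
    for candidate in candidates:
        genic_above = sum(1 for level in genic_levels if level > candidate)
        intergenic_above = sum(1 for level in intergenic_levels if level > candidate)
        total_above = genic_above + intergenic_above
        if total_above == 0:
            # Nothing is left above this level, so higher cutoffs are pointless.
            break
        if intergenic_above / total_above < max_intergenic_fraction:
            return candidate
    return None


# Toy data (invented): 40 genes with levels 2.0 to 41.0, and 4 intergenic regions.
toy_genic = [float(i) for i in range(2, 42)]
toy_intergenic = [0.0, 0.5, 1.0, 3.0]

cutoff = expression_cutoff(toy_genic, toy_intergenic)
print("cutoff:", cutoff)
```

With a cutoff chosen this way, a feature called expressed is, on the toy data at least, much more likely to be a gene than an intergenic region, which is the property described above.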

Applying this cutoff to the RNA-Seq data in Bgee results in fewer than 15% of mouse and fewer than 20% of human intergenic regions being called expressed in any of the analyzed tissues. On the other hand, more than 80% of mouse and more than 90% of human protein-coding genes are called expressed at least once. Moreover, more than 50% of human and more than 40% of mouse protein-coding genes are ubiquitously expressed across the RNA-Seq samples now in Bgee.
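The two kinds of summary used here, "expressed at least once" and "ubiquitously expressed", can be sketched as follows. This is an illustration only: the function name, the feature IDs, the levels, and the cutoff value are invented, and "ubiquitous" is assumed to mean "above the cutoff in every sample".

```python
def summarize_calls(levels_by_feature, cutoff):
    """Return (fraction expressed in at least one sample,
               fraction expressed in every sample).

    levels_by_feature maps a feature ID to its list of levels, one per sample.
    Hypothetical helper for illustration only.
    """
    n_features = len(levels_by_feature)
    if n_features == 0:
        return 0.0, 0.0
    # Expressed at least once: above the cutoff in any sample.
    expressed_once = sum(
        1 for levels in levels_by_feature.values()
        if any(level > cutoff for level in levels)
    )
    # Ubiquitous: above the cutoff in all samples (and at least one sample exists).
    ubiquitous = sum(
        1 for levels in levels_by_feature.values()
        if levels and all(level > cutoff for level in levels)
    )
    return expressed_once / n_features, ubiquitous / n_features


# Toy data (invented): four features measured in three samples.
toy_levels = {
    "geneA": [5.0, 6.0, 7.0],
    "geneB": [0.1, 4.0, 0.0],
    "geneC": [0.0, 0.2, 0.1],
    "geneD": [2.0, 3.0, 9.0],
}

at_least_once, everywhere = summarize_calls(toy_levels, cutoff=1.0)
print(at_least_once, everywhere)
```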

As of today, Bgee contains 33 RNA-Seq libraries from the experiment GSE30352, representing gene expression data from 7 human organs (frontal lobe, temporal lobe, cerebellum, heart, kidney, liver and testis) and 6 mouse organs (brain, cerebellum, heart, kidney, liver and testis). But that’s just a start. We’re planning to add more species in the future. So stay tuned!



About bgeedb

This is the blog of the database of gene expression evolution Bgee, at University of Lausanne and Swiss Institute of Bioinformatics.
