The contribution of #RNAseq, #microarrays, in situ hybridization and ESTs to #TopAnat gene enrichment signal

In Bgee, we integrate gene expression data from RNA-seq, Affymetrix microarrays, ESTs and in situ hybridization. It is tempting to think that, with RNA-seq being so powerful, we should no longer bother with other sources of information.

Yet we still have an order of magnitude more microarray data: in Bgee release 13, we have in total:

  • 41 RNA-seq experiments, for 526 libraries.
  • 1170 microarray experiments, for 13070 chips.
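
In numbers, using only the release 13 counts above:

$$\frac{1170}{41} \approx 28.5 \text{ times more microarray experiments}, \qquad \frac{13070}{526} \approx 24.8 \text{ times more microarray chips than RNA-seq libraries.}$$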

In situ hybridization provides an amazing level of anatomical detail, way beyond what other techniques can offer for now. And ESTs? Well, let’s check.

In TopAnat, we can compute the enrichment of gene lists for expression in anatomical structures (organs, tissues, cell types) based on all data types integrated, or on a subset of them only.

So let’s try. We start with the example provided in the TopAnat “Quickstart”: mouse genes annotated to the GO term “spermatogenesis”. We expect these genes to be very tissue-specific (see also our recent analysis in Briefings in Bioinformatics), so this should be an easy case for each data type. We run the analysis with one data type at a time. To give each data type its best chance, we use the lowest stringency: “Data quality” set to “All”, no decorrelation algorithm, and no FDR ≤ 0.2 cutoff for reporting results.
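
For each anatomical structure, an enrichment test asks whether genes from the submitted list are called expressed there more often than expected from the background set of genes. As a minimal sketch, assuming a one-sided hypergeometric (Fisher's exact) test of the kind used in topGO-style tools, the p-value and fold enrichment for a single structure could be computed as below. The helper name `structure_enrichment` and the toy numbers are hypothetical; this is not TopAnat's code.

```python
from math import comb

def structure_enrichment(n_background, n_annotated, n_list, n_in_structure):
    """One-sided hypergeometric test for one anatomical structure (hypothetical helper).

    n_background   -- genes in the background (universe)
    n_annotated    -- background genes called expressed in the structure
    n_list         -- genes of the submitted list present in the background
    n_in_structure -- list genes called expressed in the structure
    Returns (p-value of observing >= n_in_structure, fold enrichment).
    """
    upper = min(n_annotated, n_list)
    # P(X >= k): sum the hypergeometric probabilities of k, k+1, ..., upper
    hits = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_list - k)
        for k in range(n_in_structure, upper + 1)
    )
    p_value = hits / comb(n_background, n_list)
    # Expected count if the list genes were drawn at random from the background
    expected = n_list * n_annotated / n_background
    fold = n_in_structure / expected if expected else float("nan")
    return p_value, fold

# Toy numbers: 20 background genes, 5 of them expressed in the structure,
# and a list of 4 genes of which 3 are expressed there
p, fold = structure_enrichment(20, 5, 4, 3)
print(f"p = {p:.4f}, fold enrichment = {fold:.1f}")  # p = 0.0320, fold enrichment = 3.0
```

Fold enrichment here is observed over expected; that TopAnat's “foldEnrichment” column is defined exactly this way is an assumption.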

Here are the results, as reported on the TopAnat webpage:

[Figure: TopAnat results for the spermatogenesis genes, RNA-seq data only]

With RNA-seq, we obtain relevant organs, but very few of them. Essentially all the signal comes from testis and the structures that contain it (male reproductive system, gonad, etc.). The signal that we do get is very significant, which is reassuring.

[Figure: TopAnat results for the spermatogenesis genes, microarray data only]

With microarrays, more tissues and organs are significant, including some more detailed structures. This is probably because we have many more microarray experiments than RNA-seq experiments, some of them covering more detailed structures. Again, the statistics are good, and the organs reported are relevant. Don’t throw those microarray data away quite yet! (Keep in mind, though, that Bgee only uses curated microarray data from healthy wild-type samples that pass quality control.) On the other hand, notice that of our 457 genes, 442 were called expressed in gonad with RNA-seq, but only 402 with microarrays: lowly expressed genes were probably missed by the microarray experiments.
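
Put as proportions of the 457 genes called expressed in gonad:

$$\frac{442}{457} \approx 96.7\% \text{ (RNA-seq)} \qquad \text{vs.} \qquad \frac{402}{457} \approx 88.0\% \text{ (microarrays)},$$

that is, 40 fewer genes detected with microarrays.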

[Figure: TopAnat results for the spermatogenesis genes, in situ hybridization data only]

In situ hybridization gives us much more detailed structures, with very good statistical significance. Because we didn’t use any decorrelation, however, the results are difficult to read: for example, a germ cell is a eukaryotic cell, so “eukaryotic cell” is also reported, although it is not of great interest.

That’s a point in favor of using decorrelation for most analyses. For example, if we redo that analysis with “Weight”, which removes most of the signal due to the non-independence of these structures (a spermatocyte is a male germ cell, which is a cell, etc.), we obtain the following structures (FDR < 0.2):

male germ cell; male reproductive organ; ooblast; hindgut diverticulum (mouse); gonad; testis sex cord; pharyngeal arch 2; membranous layer; meiotic oocytes (mouse); ventricular zone; entire extraembryonic component; otic pit; 1st arch maxillary component.

We see here the great level of detail obtained with in situ hybridization. On the other hand, only a few genes are called present in each structure (between 8 and 34 for these structures, gonad excepted with 114; see the table below).
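
The idea behind decorrelation can be shown on a toy ontology. The sketch below implements a simplified version of “elim”, a simpler relative of the Weight method (both come from the topGO R package; that TopAnat's “Weight” option is topGO's weight algorithm is an assumption here). Structures are visited from the most specific to the most general; once a structure is significant, its genes no longer count when its ancestors are tested, so “germ cell” is not reported just because its child “male germ cell” is enriched. Structure names, gene IDs, the cutoff and the function names are all hypothetical; this is not TopAnat's code, and this simplified version keeps the gene universe fixed.

```python
from math import comb

def hypergeom_sf(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    if k > upper:
        return 0.0
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def ancestors(term, parents):
    """All structures reachable upwards from `term` (is_a / part_of edges)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def enrichment(annotations, parents, universe, gene_list, cutoff=0.01, decorrelate=True):
    """Return {structure: p-value}; decorrelate=True applies a simplified elim step."""
    # Propagate: a gene expressed in a structure is also expressed in its ancestors
    genes = {t: set(g) for t, g in annotations.items()}
    for t, g in annotations.items():
        for a in ancestors(t, parents):
            genes.setdefault(a, set()).update(g)
    # A structure has strictly more ancestors than any of its ancestors, so sorting
    # by ancestor count (descending) visits children before their parents
    order = sorted(genes, key=lambda t: len(ancestors(t, parents)), reverse=True)
    removed = {t: set() for t in genes}
    pvalues = {}
    for t in order:
        annotated = genes[t] - removed[t]
        k = len(annotated & gene_list)
        p = hypergeom_sf(len(universe), len(annotated), len(gene_list), k)
        pvalues[t] = p
        if decorrelate and p < cutoff:
            # elim: genes of a significant structure no longer count for its ancestors
            for a in ancestors(t, parents):
                removed[a] |= annotated
    return pvalues

# Toy ontology: male germ cell -> germ cell -> cell, and neuron -> cell
parents = {"male germ cell": ["germ cell"], "germ cell": ["cell"], "neuron": ["cell"]}
universe = {f"g{i}" for i in range(20)}
gene_list = {f"g{i}" for i in range(5)}
annotations = {
    "male germ cell": {f"g{i}" for i in range(5)},
    "germ cell": {"g5", "g6"},
    "neuron": {f"g{i}" for i in range(7, 15)},
    "cell": {f"g{i}" for i in range(15, 20)},
}
naive = enrichment(annotations, parents, universe, gene_list, decorrelate=False)
decorrelated = enrichment(annotations, parents, universe, gene_list)
print(sorted(t for t, p in naive.items() if p < 0.01))         # ['germ cell', 'male germ cell']
print(sorted(t for t, p in decorrelated.items() if p < 0.01))  # ['male germ cell']
```

Without decorrelation, “germ cell” is significant only because it inherits the genes of “male germ cell”; the elim step removes that redundant parent, which is the same kind of clean-up seen in the in situ results above.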

[Figure: TopAnat results for the spermatogenesis genes, EST data only]

Finally, with ESTs we obtain results similar to those from RNA-seq, although with fewer genes called present. It is noteworthy that there is so much biological signal in ESTs, even though this type of data is largely neglected nowadays.

To wrap up this comparison, here is first the table of all structures called significant (FDR < 0.2) by at least one data type alone, using the Weight algorithm (links to the analyses: RNA-seq, microarrays, in situ, ESTs):

| Data type | Anatomical entity | Significant genes | Fold enrichment | p-value | FDR |
|---|---|---|---|---|---|
| RNA-seq | testis | 433 | 1.259 | 2.16E-30 | 4.08E-28 |
| microarray | seminiferous tubule of testis | 351 | 1.411 | 7.91E-30 | 7.37E-27 |
| in situ | male germ cell | 17 | 7.556 | 2.88E-11 | 2.15E-08 |
| in situ | male reproductive organ | 19 | 3.506 | 3.01E-08 | 0.0000112 |
| in situ | ooblast | 9 | 7.965 | 0.00000142 | 0.00267 |
| in situ | hindgut diverticulum (mouse) | 10 | 5.495 | 0.0000122 | 0.01073 |
| in situ | gonad | 114 | 1.555 | 0.0000171 | 0.01073 |
| in situ | testis sex cord | 21 | 2.668 | 0.0000407 | 0.01914 |
| in situ | pharyngeal arch 2 | 23 | 2.312 | 0.000237 | 0.08915 |
| in situ | membranous layer | 21 | 2.253 | 0.000431 | 0.13511 |
| in situ | meiotic oocytes (mouse) | 8 | 4.324 | 0.000433 | 0.108 |
| in situ | ventricular zone | 28 | 1.931 | 0.000623 | 0.16758 |
| in situ | entire extraembryonic component | 34 | 2.457 | 0.000738 | 0.17353 |
| in situ | otic pit | 10 | 3.311 | 0.000888 | 0.18575 |
| in situ | 1st arch maxillary component | 21 | 2.111 | 0.00102 | 0.19113 |
| EST | male reproductive system | 312 | 1.495 | 1.36E-35 | 6.70E-33 |

Second, the analysis with all data types integrated, using the Weight algorithm (FDR < 0.2):

[Figure: TopAnat results for the spermatogenesis genes, all data types integrated, Weight algorithm]

Note how the integration of data types allows us to obtain both statistical power and anatomical specificity.
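
The FDR values reported in these analyses come from correcting p-values for the many anatomical structures tested. As a sketch, assuming the standard Benjamini-Hochberg procedure (the post does not say which correction TopAnat applies, nor how many structures were tested, so this will not reproduce the table above), adjusted p-values can be computed as follows. The function name and input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values for four structures
qvals = benjamini_hochberg(pvals)
print([round(q, 4) for q in qvals])  # [0.04, 0.0533, 0.0533, 0.2]
```

A structure is then reported when its adjusted value falls below the chosen threshold, here FDR < 0.2.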

Take-home messages:

  • if you have good-quality older data, don’t throw it away: it contains good biology;
  • when you use TopAnat, integrate all data and use decorrelation.

About bgeedb

This is the blog of the database of gene expression evolution Bgee, at University of Lausanne and Swiss Institute of Bioinformatics. http://www.bgee.org/
This entry was posted in RNA-Seq, topanat.
