In a recent post on this blog we saw how to analyze results from a breast cancer GWAS. In that case, we did not have very strong expectations of tissue-specificity for the genes; it was more of an exploratory analysis.
This time, let’s do the same, but searching the GWAS Catalog for the term “autism”. Here we have a clear expectation of finding genes expressed in the brain.
Using the same methodology as for breast cancer, we find 87 genes with a significant SNP inside the gene, and we obtain the following in TopAnat:
We notice that the top terms, ranked by FDR (the default), are not necessarily brain related. On the other hand, looking down the list we notice some higher fold enrichments. Ranking by “Fold enrichment” (just click on the column header), we get:
Now all the top terms are parts of the brain! In case you are wondering what the paraflocculus is, you can click on the Uberon ID and get to the definition: it’s a cerebellar
What this illustrates, and is well known in other contexts, is that the p-value (and its close friend the FDR) only takes you so far, since it is so dependent on sample size. The top terms by FDR have very large numbers of genes called expressed in them (≈20k!), whereas the more specific brain parts have “only” a few thousand genes expressed.
Thus we have a strong biological signal in the patterns of gene expression, but we have to be wary of relying on p-values. As always in statistics. 😉
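To make the sample-size effect concrete, here is a minimal Python sketch. It uses a plain one-sided hypergeometric test, which is a common choice for enrichment analysis and not necessarily exactly what TopAnat computes. Only the 87 study genes come from this post; the background size, the term sizes and the overlap counts are made-up numbers for illustration, and the function names are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes expressed in the term,
    n: genes in the study set, k: study genes expressed in the term.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def fold_enrichment(k, n, K, N):
    """Observed fraction of study genes in the term over the background fraction."""
    return (k / n) / (K / N)


N = 20000  # hypothetical background size
n = 87     # study genes, as in the post

# Hypothetical terms: (study genes expressed in the term, background genes in the term)
scenarios = {
    "broad": (85, 18000),           # almost every gene is expressed here
    "specific_large": (20, 2000),   # specific structure, many annotated genes
    "specific_small": (2, 200),     # same fold enrichment, ten times fewer genes
}

results = {
    name: (fold_enrichment(k, n, K, N), enrichment_p_value(k, n, K, N))
    for name, (k, K) in scenarios.items()
}

for name, (fold, p) in results.items():
    print(f"{name:>15}: fold enrichment = {fold:.2f}, p = {p:.2g}")
```

With these made-up numbers, the two specific terms have exactly the same fold enrichment but very different p-values, simply because one of them involves ten times more genes. The broad term, meanwhile, reaches a small p-value with a fold enrichment barely above 1.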
Update: see also this more recent post on autism and epilepsy genes.