This week, we searched for genes related to schizophrenia and to cell motility in order to build the gold standard of genes on which our program will be based.
My job was to find collections of genes related to schizophrenia (SZ), while Miriam searched for cell motility genes. Our program is planned to weight each piece of evidence by the strength of the data behind it, so even genes with uncertain associations will be helpful.
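The weighting hasn't been designed yet, but one simple way it could work is sketched below under our own assumptions. Each piece of evidence is a (gene, source, weight) record with a weight between 0 and 1, and a gene's score is the sum of its weights, so several weak sources can add up to a meaningful signal. The `Evidence` class, `score_genes` function, gene names, source labels, and weights are all hypothetical placeholders, not real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    gene: str      # gene symbol (placeholder names below)
    source: str    # which dataset the evidence came from
    weight: float  # confidence in [0, 1]; 1.0 = strongest evidence


def score_genes(evidence: list[Evidence]) -> dict[str, float]:
    """Sum the evidence weights for each gene across all sources."""
    scores: dict[str, float] = defaultdict(float)
    for item in evidence:
        if not 0.0 <= item.weight <= 1.0:
            raise ValueError(f"weight out of range: {item}")
        scores[item.gene] += item.weight
    return dict(scores)


# Hypothetical evidence records; genes and weights are placeholders.
records = [
    Evidence("GENE_A", "gwas", 1.0),
    Evidence("GENE_B", "szgene", 0.25),
    Evidence("GENE_B", "postmortem_expression", 0.5),
]
print(score_genes(records))  # {'GENE_A': 1.0, 'GENE_B': 0.75}
```

A plain sum is the simplest choice; it lets weak evidence accumulate, but it also rewards genes that happen to be studied often, so a cap or a per-source maximum might be worth considering.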
The first set of associated genetic information comes from the Schizophrenia Working Group of the Psychiatric Genomics Consortium. In a genome-wide association study (GWAS) of 36,989 cases, they identified 108 loci containing SNPs significantly associated with SZ. The significance threshold is p < 5×10^-8, which makes this study one of the strongest sources of SZ gene data. However, although most of these SNPs lie near protein-coding genes, all but 10 are located in non-coding regions. A regulatory region sitting near a protein-coding gene does not necessarily regulate that gene; it might instead regulate some other gene elsewhere in the genome. Still, proximity to a regulatory region does make a gene more likely to be one of its targets, so this locational information could be useful at the weaker tiers of our evidence weighting.
http://www.nature.com/nature/journal/v511/n7510/abs/nature13595.html?lang=en
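To make the coding versus nearby distinction concrete, here is a minimal sketch that keeps only SNPs passing the genome-wide threshold of 5×10^-8 and assigns each one to its nearest gene. A SNP inside a gene's annotated span gets a higher weight than one that is merely nearby, since proximity alone is weaker evidence. This is a simplification: a SNP inside a gene span can still be non-coding (for example, intronic). The SNP IDs, positions, p values, gene coordinates, function names, and the 1.0/0.5 weights are all hypothetical.

```python
from dataclasses import dataclass

GWAS_THRESHOLD = 5e-8  # genome-wide significance threshold used in the GWAS


@dataclass(frozen=True)
class Snp:
    snp_id: str
    chrom: str
    pos: int
    p_value: float


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def distance(snp: Snp, gene: Gene) -> int:
    """Base pairs from the SNP to the gene span (0 if the SNP is inside it)."""
    if gene.start <= snp.pos <= gene.end:
        return 0
    return min(abs(snp.pos - gene.start), abs(snp.pos - gene.end))


def nearest_gene_evidence(snps, genes, in_gene_weight=1.0, nearby_weight=0.5):
    """Map each significant SNP to its nearest gene on the same chromosome.

    SNPs inside a gene span get in_gene_weight; SNPs outside every gene span
    get the lower nearby_weight, because a nearby gene is not necessarily
    the one being regulated.
    """
    evidence = {}
    for snp in snps:
        if snp.p_value >= GWAS_THRESHOLD:
            continue  # not genome-wide significant, skip
        same_chrom = [g for g in genes if g.chrom == snp.chrom]
        if not same_chrom:
            continue  # no annotated gene on this chromosome
        best = min(same_chrom, key=lambda g: distance(snp, g))
        weight = in_gene_weight if distance(snp, best) == 0 else nearby_weight
        evidence[snp.snp_id] = (best.name, weight)
    return evidence


genes = [Gene("GENE_A", "chr1", 1_000, 5_000), Gene("GENE_B", "chr1", 20_000, 30_000)]
snps = [
    Snp("snp1", "chr1", 3_000, 1e-9),   # inside GENE_A, significant
    Snp("snp2", "chr1", 18_000, 2e-8),  # 2 kb from GENE_B, significant
    Snp("snp3", "chr1", 4_000, 1e-6),   # not significant, dropped
]
print(nearest_gene_evidence(snps, genes))
# {'snp1': ('GENE_A', 1.0), 'snp2': ('GENE_B', 0.5)}
```

The linear scan over genes is fine for a few hundred loci; a genome-wide version would want an interval index instead.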
The next set of data is from the Schizophrenia Research Forum on SZGene.org. It collects every available genetic association study of SZ genes. It's a nice collection, but none of the genes reached the GWAS significance threshold of 5×10^-8, so this evidence will have to be weighted in proportion to the strength of the studies themselves.
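One hypothetical way to weight a study "in proportion to its strength" is to scale its −log10(p) against the GWAS threshold, so a result right at 5×10^-8 gets a weight of 1.0 and weaker results get proportionally less. The `study_weight` function and the scaling rule are our own assumptions for illustration, not something SZGene provides.

```python
import math

GWAS_THRESHOLD = 5e-8  # genome-wide significance threshold


def study_weight(p_value: float, threshold: float = GWAS_THRESHOLD) -> float:
    """Scale -log10(p) so that p == threshold maps to 1.0; clamp to [0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p value must be in (0, 1]")
    raw = (-math.log10(p_value)) / (-math.log10(threshold))
    return max(0.0, min(1.0, raw))


# Hypothetical p values from association studies of varying strength.
for p in (1e-9, 5e-8, 1e-4, 0.05):
    print(p, round(study_weight(p), 2))
```

Under this rule a study reporting p = 1e-4 would get a weight of about 0.55. A p value alone ignores sample size and replication, so in practice it would likely be one factor among several.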
Another set of data comes from an expression study on human induced pluripotent stem cells. The cells were differentiated into neurons, and gene expression was measured by qPCR. These data, however, come from cells grown in vitro, whereas the developmental aspect of SZ depends on a degree of regional specificity and cell-to-cell communication that a culture may not reproduce. https://www.nature.com/nature/journal/v473/n7346/full/nature09915.html
There is another gene expression dataset, but this time it comes from postmortem brain tissue. While the brain region is much more specific (the prefrontal cortex), the expression data were collected long after development. However, the authors did identify one gene, inhibited it in pluripotent stem cells, and observed abnormal cell motility. That link to motility makes this dataset potentially useful for our study. http://www.nature.com/neuro/journal/v19/n11/full/nn.4399.html
The last genetic data set covers the three-dimensional conformation of chromatin in the nucleus. Brain-specific chromatin contacts can bring a regulatory region into physical contact with a gene's promoter and upregulate that protein-coding gene. Using 3 neonatal brain slices, each from a separate subject, the authors established a strong correlation between gene expression and chromatin contacts. They were then able to locate the points of contact for the remaining 98 (non-coding) loci from the initial GWAS, giving a broader picture of genetic interaction. http://www.nature.com/nature/journal/v538/n7626/full/nature19847.html
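Contact data like this could let the program assign a non-coding locus to the gene it physically touches rather than simply to the nearest gene. The sketch below prefers contacted genes and falls back to the nearest gene only when no contact is known, with a lower weight for the fallback. The locus and gene names, the `assign_targets` function, the dictionary layouts, and the 0.75/0.5 weights are hypothetical.

```python
def assign_targets(
    loci: list[str],
    contacts: dict[str, set[str]],
    nearest: dict[str, str],
    contact_weight: float = 0.75,
    nearest_weight: float = 0.5,
) -> list[tuple[str, str, float]]:
    """Return (locus, gene, weight) records for each locus.

    Genes a locus physically contacts take priority; if no contact is known,
    fall back to the nearest gene with a lower weight. Loci with neither
    piece of information are skipped.
    """
    evidence = []
    for locus in loci:
        targets = contacts.get(locus)
        if targets:
            # sorted() keeps the output order deterministic
            evidence.extend((locus, gene, contact_weight) for gene in sorted(targets))
        elif locus in nearest:
            evidence.append((locus, nearest[locus], nearest_weight))
    return evidence


# Hypothetical data: locus1 contacts two genes that are not its nearest gene.
loci = ["locus1", "locus2", "locus3"]
contacts = {"locus1": {"GENE_C", "GENE_D"}}
nearest = {"locus1": "GENE_A", "locus2": "GENE_B"}
for record in assign_targets(loci, contacts, nearest):
    print(record)
```

Here locus1 is credited to GENE_C and GENE_D rather than its nearest gene GENE_A, which mirrors the earlier point that a regulatory region may act on a gene other than its neighbor.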
We’ll probably use all of these data sets. The next question is how.