Week 1: A Little Context

Week 1! We’re starting off the year, and our research, with some literature review to provide ourselves with a little context. Our proposed research plan mainly draws on two papers that laid out the methodological groundwork for us.

The first paper provides pertinent results and insights into the connection between cell motility and schizophrenia. The researchers concluded that cells from patients with schizophrenia are less adhesive and more motile than cells from healthy control subjects, and that this altered behavior improved when the focal adhesion kinase (FAK) protein was inhibited. The results showed a correlation between the altered motility of patient cells and dysregulated gene expression in the FAK signaling pathway within these cells. This paper gave us the basis for choosing which genes to focus on: schizophrenia- and migration-associated genes.

The second paper essentially laid out the framework for our methodology. In this paper, the researchers used a machine-learning algorithm to predict genes associated with autism spectrum disorder (ASD). They then validated these predicted genes experimentally using an independent-case sequencing study and were further able to demonstrate that this large set of ASD genes played roles in key pathways and brain development.

Pulling from both these papers, we plan to develop a network diffusion algorithm to identify candidate schizophrenia- and migration-associated genes. We are going to computationally build a list of predicted genes and (hopefully) validate them experimentally. However, before we do that, we need to delve into the nitty-gritty of how the autism paper’s machine-learning algorithm was created.
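To make "network diffusion" a little more concrete, here is a minimal sketch of one common flavor, a random walk with restart. This is only an illustration and not the algorithm we will end up using: the toy graph, the node names, the `diffuse` function, and the `restart` value are all assumptions made up for this example.

```python
def diffuse(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on a graph given as {node: set(neighbors)}.

    Assumes every seed is a node in the graph. Returns a score per node;
    higher scores mean the node is "closer" to the seed set.
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # A fraction of the score always jumps back to the seeds.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbors = adjacency[n]
            if not neighbors:
                # An isolated node keeps whatever score it has.
                updated[n] += (1 - restart) * scores[n]
                continue
            # The rest is split evenly among the node's neighbors.
            share = (1 - restart) * scores[n] / len(neighbors)
            for m in neighbors:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical toy network: A is a "known" gene, D is disconnected.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}
scores = diffuse(toy_network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Genes that sit near the known genes in the network pick up more score than distant ones, which is the intuition we would rely on to rank candidates.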

The paper outlines how the approach was based upon a human brain-specific gene functional-interaction network nicknamed GIANT (Genome-scale Integrated Analysis of gene Networks in Tissues). We began with a few initial questions: How is a functional-interaction standard set up? How was GIANT constructed? These questions will no doubt open up a whole new corridor of doors to explore; over the next couple of weeks, our goal is to investigate these questions through literature review and learn more about how we can utilize these tools in the coming year.

 

9/8/17

We are starting off the year neck deep in our literature review! First, we’re looking at how some researchers built the GIANT network, a functional interaction network built around tissue specificity [1]. This type of network would be very useful to us: we’re looking at neuronal motility genes, so our predictions need to be specific to neurons, since a ton of other cells move around too.

How to Build a Tissue Specific Functional Gene Interaction Network

  1. Don’t build a tissue specific functional gene interaction network

    First, they built a “gold standard” of gene pairs that is deliberately not tissue specific by selecting specific biological process gene collections from the Gene Ontology Consortium (GO). Gene pairs that were co-annotated as functionally related were used as the positive examples. Gene pairs that were not co-annotated were placed in the not-functionally-related bin, unless they were (A) in two different GO groups that shared a significant number of genes, or (B) in GO groups that had nothing to do with each other; gene pairs meeting either exception were ignored. They found 604,038 functionally related gene pairs and 12,425,713 unrelated pairs.

  2. Ignore the 13 million pair list you just made

    Next, they matched gene-to-tissue annotations from the Human Protein Reference Database to the BRENDA Tissue Ontology. Tissues with fewer than ten genes were dropped. They then took a separate list of ubiquitously expressed genes, removed those genes from the newly categorized tissue sets, and placed them into a “ubiquitous” bucket. The result was a collection of gene sets expressed exclusively in specific tissues (T) and a set of genes expressed ubiquitously (U).

  3. Integrate the Two Lists

    Greene et al, 2015

Going back to the tissue naive list of gene pairs, every gene was labeled depending on its tissue specificity. The gene pairs were then categorized according to each gene’s tissue specificity. The pairs with genes “specifically co-expressed in the tissue [T-T and T-U]” were marked as tissue specific, whereas the remaining gene pairs were marked as negative. The integrations were limited to 144 tissues with at least 10 positive tissue specific gene pairs.

  4. Train a Bayesian Classifier for Each Tissue

They also trained a tissue-naive classifier using the original naive pair list. Bayes classifiers assume independence; since there was little of it here, the dependency was calculated and accounted for. Each classifier was then able to make genome-wide predictions for its specific tissue.
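To check my own understanding of the steps above, here is a toy sketch of the labeling and training logic. It is not the authors' code. Every gene, term, and tissue name is invented, the exclusion rules from step 1 and the dependency correction from step 4 are left out, the single binned "evidence" feature is made up, and reading "tissue specific" as "positive in the naive standard and T-T or T-U" is my assumption.

```python
import math
from itertools import combinations

# Toy inputs: every gene, term, and tissue name below is invented.
go_terms = {"term_a": {"g1", "g2", "g3"}, "term_b": {"g4", "g5"}}
brain_specific = {"g1", "g2"}   # one tissue-specific gene set (T)
ubiquitous = {"g3"}             # ubiquitous gene set (U)


def gold_standard(terms):
    """Step 1 (simplified): co-annotated pairs are positives, the rest negatives."""
    positives = set()
    for genes in terms.values():
        positives.update(frozenset(p) for p in combinations(sorted(genes), 2))
    annotated = sorted(set().union(*terms.values()))
    negatives = {frozenset(p) for p in combinations(annotated, 2)} - positives
    return positives, negatives


def tissue_labels(positives, negatives, specific, ubiq):
    """Step 3: a positive pair counts as tissue specific if it is T-T or T-U."""
    labels = {}
    for pair in positives | negatives:
        a, b = tuple(pair)
        t_t = a in specific and b in specific
        t_u = (a in specific and b in ubiq) or (a in ubiq and b in specific)
        labels[pair] = pair in positives and (t_t or t_u)
    return labels


def train_naive_bayes(rows, n_bins):
    """Step 4 (simplified): naive Bayes over binned features, Laplace smoothed.

    rows is a list of (bins, label); bins is a tuple of ints in range(n_bins).
    Returns a function giving P(label is True | bins).
    """
    n_features = len(rows[0][0])
    class_count = {True: 0, False: 0}
    # counts[label][feature][bin], starting at 1 for smoothing
    counts = {c: [[1] * n_bins for _ in range(n_features)] for c in (True, False)}
    for bins, label in rows:
        class_count[label] += 1
        for f, b in enumerate(bins):
            counts[label][f][b] += 1

    def posterior(bins):
        log_score = {}
        for c in (True, False):
            total = math.log((class_count[c] + 1) / (len(rows) + 2))
            for f, b in enumerate(bins):
                total += math.log(counts[c][f][b] / (class_count[c] + n_bins))
            log_score[c] = total
        top = max(log_score.values())
        weight = {c: math.exp(v - top) for c, v in log_score.items()}
        return weight[True] / (weight[True] + weight[False])

    return posterior


positives, negatives = gold_standard(go_terms)
labels = tissue_labels(positives, negatives, brain_specific, ubiquitous)
# Made-up evidence: one dataset, bin 1 for co-annotated pairs and bin 0 otherwise.
evidence = {pair: ((1,) if pair in positives else (0,)) for pair in labels}
posterior = train_naive_bayes([(evidence[p], labels[p]) for p in labels], n_bins=2)
print(sum(labels.values()), "tissue-specific pairs out of", len(labels))
```

In the real pipeline there would be one such classifier per tissue, trained on many genomic datasets rather than a single toy feature.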

Hopefully I can learn more about the nuts and bolts of GIANT. These tools seem incredibly powerful, and might be used to find targets we might have never thought of. I will continue researching throughout the week.

  1. Greene CS, Krishnan A, Wong AK, Ricciotti E, Zelaya RA, Himmelstein DS, Zhang R, Hartmann BM, Zaslavsky E, Sealfon SC, Chasman DI, FitzGerald GA, Dolinski K, Grosser T, Troyanskaya OG. (2015). Understanding multicellular function and disease with human tissue-specific networks. Nature Genetics. doi:10.1038/ng.3259

Kicking Off the Collaborative REU

As the summer winds down and classes begin at Reed College, we are excited to begin a new project that sits at the intersection of computer science and biology.  With mentoring expertise on both sides of the aisle (Anna is a computer scientist, and Derek is a cell biologist), our interdisciplinary team will apply computer science techniques to predict potential players in disease.

The Biological Question: How is cell migration regulated in patients with schizophrenia?

Schizophrenia is a psychiatric disorder that affects how a person thinks, feels, and behaves, with potentially severe symptoms.  While we know that susceptibility to this disease runs in families, there are many mysteries about which genes, or “instructions” encoded in DNA, drive schizophrenia.  A paper recently demonstrated that cell migration patterns are altered in patients with schizophrenia: the cells become more motile and less “attached” compared to the same type of cells from healthy individuals.  Since genes associated with cell migration have also been implicated in other diseases, we want to identify genes that may be involved in both altered cell migration and schizophrenia.

The Computational Approach: Machine learning to predict disease genes

While experiments can test whether a particular gene is associated with cell migration, we can’t simply test all 20,000 possible genes: it would take way too long, be way too expensive, and the vast majority of the experiments would be uninformative.  Instead, we will develop computational approaches to predict a small subset of candidate genes for further experimental testing.  These in silico experiments (which is just a fancy term for computer-simulated experiments) may not be incredibly accurate, but they will sure be fast!

How do we go about developing a computational method to predict candidate cell migration and schizophrenia-associated genes? As we’ll detail in future blog posts, we will search for these genes within large, publicly-available datasets.  We will build a list of the genes that are known to be associated with cell migration or schizophrenia, and then look for other genes that have similar properties to the known genes.  This general technique is called machine learning, where we design instructions for a computer to make predictions.  In our case, we wish to predict whether an unknown gene could be associated with cell migration, schizophrenia, or both.

Experimental Validation: Testing the computational predictions

An important aspect of computational biology research is to experimentally test the predictions to see if we discovered new players involved in schizophrenia and cell migration. In Derek’s lab, the team will test the top candidates in two ways.  First, we will see whether each candidate gene affects cell migration in fly cells by “knocking down” the gene product in the cells and observing the change in cell movement.  Next, we will take the top candidates from the first step and observe migration patterns in fly neuroblasts (cells that are destined to become neurons). Candidate genes that alter migration patterns in fly neuroblasts in these experiments may also affect neuron cell migration in humans.

There is lots to learn and lots to do!  It will be a fun year – stay tuned.