Week 3: Processing Data from FireBrowse and PathLinker

On Monday we all met with Anna to agree on some goals for the week. Kathy and I set out to acheive several things: A. to figure out how to download data from FireBrowse on colon adenocarcinomas, to perform several statistical analyses on that data and to visualize it in some form using pylab, B. to write a code that produced informative graphs of the different signaling pathways from text files containing information about the edges and nodes and C. to implement PathLinker and LocPL and to create graphs of the top k paths on GraphSpace.

Originally to acheive our first goal, we tried to use Nicholas Egan’s pepper pathway code to download and format the data. However, while we were able to decipher how it worked, it ended up not being all that useful to us, so Kathy downloaded all the gene files related to colon adenocarcinoma onto her computer . Unfortunately, we’re still not really sure how to use them. We wrote a code that took text files containing information on edges and nodes and spit out a graph with color-coded nodes that were labeled with the node name and type and edges that were labeled with the type of interaction that was occuring between the two nodes for example “physical” or “phosphorylation”. I have attached to examples of the graphs we produced below, one very complex and the other very simple.

Left: Alpha-6-Beta-4 Integrin Graph. Right: Interleukin-7 Graph

We also ran PathLinker on the Wnt pathway and made a graph of the top 200 best paths produced.

Top 200 Wnt Signaling Pathways

Moving forward we have several goals. Kathy is trying to figure out how to implement LocPL. I am still trying to figure out how to process and use the FireBrowse data. Because I will need to integrate methylation data from FireBrowse in my project, it is essential I figure out how to use it.

Overall, this week I have learned a lot about data processing, writing code to read text files, and using NetPath, PathLinker, FireBrowse, and GraphSpace. These skills should come in handy as I move forward in my project.

Week 2: Understanding Wnt/β-catenin Signaling and the Ryk-CFTR-Dab2 Path

This week, Kathy, Usman, and I split up to learn more about different aspects of PathLinker. I researched the Wnt/β-catenin signaling pathway, precision-recall curves, and the role of CFTR and Dab2. On Wednesday, we all presented the topics we had researched to each other and Anna and Ibrahim.

To briefly summarize what I learned, the Wnt/β-catenin pathway regulates the transcription of certain genes related to cell proliferation, cell attachment, and growth. When the Wnt/β-catenin signaling pathway is dysregulated, a number of pathologies can develop including cancer and heart disease. In fact, in a study on colon adenocarcinoma by the Cancer Genome Atlas, 93% of tested tumors had a mutation that affected the Wnt/β-catenin signaling pathway (TCGA, 2012). At the most basic level theWnt/β-catenin signaling pathway is turned on when Wnt proteins bind to “frizzled” a 7-pass transmembrane receptor which halts the destruction of β-catenin in the cell. Usually, when the pathway is off, β-catenin is constantly being produced and destroyed. When β-catenin destruction is interrupted, β-catenin will build up in the cytoplasm and move into the nucleus where it will bind to LEF and the TCF promoter to promote the transcription of specific genes that were previously being inhibited by a transcription factor that was bound to TCF. The Wnt/β-catenin signaling pathway involves many proteins and interactions, some of which we still do not understand. The PathLinker algorithm succesfully identified a path in the Wnt/β-catenin signaling pathway that was not in the KEGG or NetPath database: the Ryk-CFTR-Dab2 path (Ritz et al, 2016).  The authors of the paper hypothesized that Ryk would interact with CFTR (Cystic Fibrosis Transmembrane-conductance Regulator), a chlorine ion channel previously studied for its role in cystic fibrosis, which would activate Dab2 to inhibit β-catenin activity. This hypothesis was experimentally confirmed by silencing Ryk, CFTR and Dab2 individually using RNA interference and then testing transcription levels of β-catenin controlled genes using a TCF/LEF luciferase activity and levels of β-catenin in the cell using a Western Blot assay.

We also attended a data management workshop taught by David Isaak, Reed’s data science librarian to learn how to use Git and GitHub. Because I’ve never really used terminal before (I usually use repl.it), I worked through a command line tutorial to learn more about it.

On Thursday, Kathy, Usman, and I went through our Dijkstra code with Anna, and finally got it to work the way we wanted it to. Anna added us to the cancer-linker repository on GitHub so we can all begin to actually work on PathLinker now.

My next project is to figure out how to take data from FireBrowse, a database containing data from TCGA on 38 different cancer types, put it into a format we can use, and apply several statistical analyses to the information. I’m hoping to base it off of PepperPathway, a program written by Nicholas Egan, a student in Anna’s lab last summer, that uses data from FireBrowse, GeneCards and NetPath and to create a visual representation on GraphSpace. I’m struggling to make sense of the PepperPathway program because there are a lot of files on the GitHub repository and I’m not really sure where to begin. I’m going to try to make sense of it this weekend.

References

  1. TCGA Research Network.  Comprehensive Molecular Characterization of         Human Colon and Rectal Tumors.  July 19, 2012. Nature. DOI:           10.1038/nature11252.
  2. Ritz, Anna et al. “Pathways on Demand: Automated Reconstruction of  Human Signaling Networks.” Npj Systems Biology And Applications 2 (2016): 16002. Web.

Week 1

This is Kathy, Usman, and my first week working on our summer research projects which all involve modifying the PathLinker algorithm. For my project, I am interested in integrating data about protein methylations related to colon adenocarcinoma from the FireBrowse database into PathLinker. Protein methylation, which is only one of many ways that cell signaling pathways can be altered in cancerous cells, can either inhibit or enhance protein-protein interactions. I intend to devise a way to use this information to change the weights of edges connected to methylated proteins to more accurately model cancerous cell signaling pathways. I am hoping this method could eventually be generalized so that the algorithm could be modified for multiple types of cancerous mutations. I believe that this work could be important in identifying important proteins for further research.

So far, we have mostly been trying to learn about pathway reconstruction methods before we dive in. To do this, we have been re-reading the original PathLinker paper as well as a more recent paper about an alteration to PathLinker that allows it to integrate protein localization information.

We have also been working on understanding and implementing two algorithms that calculate the shortest path from a source to a target in a graph: Breadth First Search and Dijkstra’s algorithm. Breadth-first search is an algorithm that finds the path with the fewest number of edges from a source to a target. Dijkstra is more advanced and relative to our work because it takes the weight of the edges between nodes into account. Dijkstra will find the path with the lowest sum of edge weights from the source to any node in the graph.

This weekend and this coming week I have several goals. Primarily, Usman, Kathy and I need to figure out how to get the separate pieces of code that we wrote for the Dijkstra algorithm (which all work individually) to work together to first read a text file, then run Dijkstra’s algorithm, and finally produce a visual graph that shows the best paths. I need to do more research about the original PathLinker to gain a better understanding of precision-recall curves which were used to evaluate its efficacy, Ryk-CFTR-Dab2, which is a new path that was discovered in the Wnt/β-catenin signaling pathway by PathLinker, and the experimental methods that were used to verify the importance of CFTR.