Week 4: Exploring Data

I spent this week developing an understanding of the command line and version control, as well as exploring the data gathered from the TCGA cancer genome data.

First, I discovered that an application I had downloaded to make my life easier (Anaconda) had deleted python 2.7 from my machine and wouldn’t allow me to run things appropriately. This was a huge problem because currently, I need to run programs using python 2.7 and networkx 1.9.1 for dependencies to work out. So, I deleted the installation files and made sure that all of the caches and file folders they resided in were deleted, and then used Homebrew to reinstall python 2 and python 3.  The moral of this story is to always understand what you’re installing.

Secondly, I developed the framework for the bar graph that we will eventually use to analyze which genes are present in each of the datasets we are looking at. We are measuring the number of genes with different levels of gene expression(high, low, or none) in each of the datasets that we have. Each dataset will then have 3 different bars expressing this data.

We’ll be using  the TCGA Genes database,  the number of genes from the PathLinker-2015 interactome, PathLinker-2018 interactome, the Wnt Pathway from NetPath, the PathLinker-2015 interactome’s top 1000 paths, the PathLinker-2018 interactome’s top 1000 paths, and finally the Localized PathLinker-2015 interactome’s top 1000 paths, as well as the Loc_PL 2018 interactome’s top 1000 paths.