Week 2: Sinksource+ and AUCs

We found an algorithm that solves the edge case problem. Sinksource+ removes the negatives and adds one negative to every single node. The weight of the edge is also a constant, which means that it’s the same for a node of any degree. This significantly devalues nodes that aren’t very well connected and would not be suitable for polygenic analysis.

However, it works horrendously in a single layer method with no prime nodes. The AUC sticks at around 0.5, which is the worst possible AUC you can have. However, it works great in a two layer network, giving our SZ network a score of 0.65. If you add negatives back in with the addition of sinksource+ and reduce the sinksource+ constant, we get an SZ AUC of 0.675 and a cell motility AUC of 0.801, which is the best AUC we have seen for an SZ gene identifier. I’m also running the program on the autism positives right now, and it appears that the AUC is running at about similar levels to where they were with E1 and E2.

Figure 2

Krishnan, A. et al.Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nature Neuroscience19,1454 (2016).
I think we are at the point where we will begin writing the paper. These results are a satisfying in that it’s probably just the nature of SZ that genes implicated are going to be difficult and somewhat disparately related. That’s not necessarily a bad thing. Polygenic positives are by definition only contributing slightly towards a disease. The strongest p-value for a SNP in an SZ GWAS has a frequency of 0.765 in SZ participants and a frequency of 0.75 in control participants. A low AUC in an SZ gene labeler is almost expected.
Goals for next week: Write the paper