Hypergraph Algorithms

Hey everyone! I am Alex Richter. I am a rising junior in math and computer science working with Anna Ritz this summer on hypergraph algorithms. The algorithms that I implemented this summer were from a paper co-authored by Nick Franzese, Anna Ritz, and Adam Groce [1]. The implementation of the algorithms already existed in Python; however, I learned Java and implemented them in Java to be incorporated into a plugin written in Java. 

We are working on incorporating the algorithm into the application Cytoscape via a plugin called ReactomeFIViz. In doing so, we hope that scientists will begin to examine hypergraph topology when checking to see if two molecules are connected within a cell. In viewing a different topology as opposed to simply directed or bipartite graphs, different pathways can be discovered as well as some nodes in a graph topology might be connected within a cell but not in a hypergraph topology. The potential for hypergraph connectivity in better understanding connectivity within a cell is a field that deserves to be further explored. 

The algorithm that I implemented relaxes the definition of B-connectivity and determines if two nodes are connected in a hypergraph topology. Here are some definitions from the paper: 

  • Directed hypergraphs represent reactions with many-to-many relationships, where each hyperedge e = (Te, He) has a set of entities in the tail Te and a set of entities in the head He (here, the tail denotes the hyperedge inputs and the head denotes the hyperedge outputs). [1]
  • B-connectivity: requires all the nodes in the tail of a hyperedge to be visited before it can be traversed. This definition has a natural biological meaning in reaction networks: B-connectivity requires that all reactants of a reaction must be present in order for any product of that reaction is reachable. Unlike the compound graph rules, B-connectivity describes the strictest version of connectivity, and it is the only rule used to traverse hyperedges. [1]
    • Given a directed hypergraph and a source set SV, a node uV is B-connected to S if either (a) uS or (b) there exists a hyperedge e = (Te, He) where uHe and each element in Te is B-connected to S. We use to denote the set of nodes that are B-connected to S in . [1]

To be B-connected implies that in order to traverse a hyperedge all of the nodes in the tail set (input set) must be present. B-connectivity can be a little bit difficult to understand at first glance. It is a harsh restriction on hypergraph connectivity. Two nodes are B-connected if one of two things is true: either both nodes are in the source set so obviously they are B-connected or the node we are seeing if is B-connected is located in the head (output) set of a hyperedge that where each element in the tail (input) set is connected to original node. 

b_visit Algorithm 

The algorithm that I implemented is two parts. The first algorithm is called b_visit and takes in a hypergraph and a set of source nodes, s. The b_visit algorithm will then return three sets: the set of traversed hyperedges from s, the set of B-connected nodes to s, and the set of restricted hyperedges. Restricted hyperedges are edges that can be reached but not traversed because it is missing an input in the tail to make the hyperedge traversable. For example, given a source set of A, B for this hypergraph, the set of restrictive hyperedges would be: 

Tail Set Head Set 
G, DL
D, H M
I, N O
Set of Restrictive Hyperedges

The set of traversable hyperedges would be: 

Tail Set Head Set 
AC
A, BD
BE
BF
E, FI
Set of Traversable Hyperedges

The set of B-Connected nodes to A, B would then be: C, D, E, F, I. 

Image taken from paper [1].

b_relaxation Algorithm 

The b_visit algorithm helps when trying to find out which nodes are B-connected to the source set, or the set of nodes the algorithm is called on. This is helpful to find those immediate nodes; however, since the requirements for B-connectivity are so strict, this does not include all of the nodes that can be reached from the source set in the hypergraph. Through calling b_visit multiple times and replacing the requirements for B-connectivity, more nodes can be viewed as connected in the hypergraph. 

The second algorithm from the paper is called b_relaxation which makes calls to the b_visit algorithm. The input for b_relaxation is once again the hypergraph and a set of source nodes, s, on which the algorithm will be called. Through multiple calls to b_visit, b_relaxation relaxes the strict conditions of B-connectivity. The algorithm runs on the entire hypergraph to determine which hypernodes are B-connected to the source set. The return value of the algorithm is a dictionary of k-distances where the key is the node and the value is the distance. K-distance is easiest understood as the number of times the algorithm was run until it could reach the node in question. So if it was in the source set for instance, the k-distance for the node would be 0. In relaxing the harsh conditions of B-connectivity, the algorithm is essentially traversing restrictive hyperedges to find all of the nodes that are connected in the hypergraph. 

Using the same toy example, at the end of b_relaxation, it would return a dictionary of distances. 

KeyValue
A0 (source set) 
B0 (source set) 
C0
D0
E0
F0
GINFINITY
HINFINITY
I0
JINFINITY
KINFINITY
L1
M1
NINFINITY
O1
P2
Q2
R3
Dictionary of distances for toy hypergraph after running b_relaxation algorithm
Image take from paper [1].

Anna and I are working with our collaborators in incorporating this into ReactomeFIViz before adding it to Cytoscape. The next steps include cleaning up the code, making sure it works properly in the plugin, and considering implementing further hypergraph algorithms. 

References: 

[1] Franzese N, Groce A, Murali TM, Ritz A (2019) Hypergraph-based connectivity measures for signaling pathway topologies. PLOS Computational Biology 15(10): e1007384. https://doi.org/10.1371/journal.pcbi.1007384