Trial and Error and Error and Error: Modelling bacterial protein interaction networks in Graphspace

Hi, my name is Henry Jacques (he/him). I’m a class of 2024 biochemistry/molecular biology major here at Reed, and I spent this summer working with Max Bennett to (try to) develop a method of visualizing protein-protein interactions related to a target protein in bacteria with a tool called Graphspace (

Though one of our stated goals was to develop a program that worked across many species of bacteria and with many target proteins, we focused on Mn- and Zn-transporting molecules called ATP-binding cassette transporters (ABC transporters) in B. subtilis, E. coli, S. aureus, and S. lugdunensis. These proteins are understood to play a vital role in metal ion homeostasis across many bacterial species, many of which are pathogenic. We chose this focus at the direction of both Anna Ritz and Dr. Shivani Ahuja of the Chemistry department, who has taken a keen interest in the function and structure of these proteins.

We developed an algorithm to interpret data from individual bacterial species’ datasets on STRING, a database that compiles protein-protein interaction data from research- and homology-based sources. We read the data as pairs of STRING identifiers, each pair signifying a link between two proteins, and translated the identifiers using a combined list mapping STRING identifiers to ‘common’ names, so that the final graph would be easier to interpret.
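A minimal sketch of this translation step, in Python, might look like the following. The identifiers, the names in the mapping, and the helper functions `translate_pairs` and `edges_touching` are all made up for illustration; they are not taken from the project's code or from a real STRING dataset.

```python
def translate_pairs(pairs, name_map):
    """Swap STRING identifiers for common names where a mapping exists.

    Identifiers with no entry in name_map are kept unchanged.
    """
    return [(name_map.get(a, a), name_map.get(b, b)) for a, b in pairs]


def edges_touching(edges, target):
    """Keep only the interactions that involve the target protein."""
    return [edge for edge in edges if target in edge]


# Hypothetical identifiers and names, for illustration only.
name_map = {"0000.P1": "mntA", "0000.P2": "mntB"}
pairs = [("0000.P1", "0000.P2"), ("0000.P2", "0000.P3"), ("0000.P3", "0000.P4")]

edges = translate_pairs(pairs, name_map)
target_edges = edges_touching(edges, "mntB")
```

Falling back to the raw identifier keeps unmapped proteins in the graph instead of silently dropping them.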

A secondary feature of these graphs was a comparison of homologous interactions across species, which we made using homology data embedded in STRING as well as data from Clusters of Orthologous Groups (COG) mappings. These linkages were challenging to filter by confidence, as STRING’s indication of homology leaves a lot to be desired in terms of ease of use. Nevertheless, we believe that the homologous interactions present could provide a useful starting point for future biochemical investigations using known homology data.
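One way to picture the cross-species comparison is to reduce each interaction to the pair of COG groups its two proteins belong to, then intersect those pairs between species. The sketch below assumes each protein maps to at most one COG group. The protein names, group labels, and the functions `cog_pairs` and `shared_interactions` are hypothetical, and the sketch does not attempt any confidence filtering.

```python
def cog_pairs(edges, cog_of):
    """Map each interaction to an unordered pair of COG groups.

    Interactions where either protein has no COG mapping are skipped.
    """
    pairs = set()
    for a, b in edges:
        if a in cog_of and b in cog_of:
            pairs.add(frozenset((cog_of[a], cog_of[b])))
    return pairs


def shared_interactions(edges_a, cog_a, edges_b, cog_b):
    """Return the COG-group pairs that interact in both species."""
    return cog_pairs(edges_a, cog_a) & cog_pairs(edges_b, cog_b)


# Hypothetical proteins and COG labels, for illustration only.
species_a_edges = [("a1", "a2"), ("a2", "a3")]
species_a_cogs = {"a1": "G1", "a2": "G2", "a3": "G3"}
species_b_edges = [("b2", "b1"), ("b1", "b4")]
species_b_cogs = {"b1": "G1", "b2": "G2", "b4": "G4"}

shared = shared_interactions(
    species_a_edges, species_a_cogs, species_b_edges, species_b_cogs
)
```

Using unordered pairs means an interaction listed as (b2, b1) in one species still matches (a1, a2) in the other.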

Challenges and next steps
Perhaps the most cumbersome part of this project was reconciling the different naming conventions used for proteins across interaction databases. We used STRING identifiers internally to recognize protein-protein interactions, but many of them had no mapping to a more ‘common’ protein identifier, which made the data difficult to use for any biochemical investigation without first translating the STRING identifiers.

Another challenge we encountered was that STRING’s dataset was mostly composed of non-experimental evidence, meaning that interactions inferred from the database were not necessarily supported by in vitro analysis under laboratory conditions. In some instances, interactions known to exist biochemically were not present as discrete data within the relevant STRING dataset and had to be added manually when creating the final graph. However, this lack of data also points to one way in which even incomplete data can yield useful biochemical results: with further configuration of the graph generation, when an interaction is known in one species, the same interaction can be hypothesized between the homologous proteins in a similar species where it has not yet been observed, possibly providing a starting point for future biochemical research.
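The idea of hypothesizing undiscovered interactions could be sketched roughly as follows: take the COG-group pairs that interact in a reference species, then list the protein pairs in a second species that belong to those same groups but have no recorded interaction. This is only an illustration under the assumption that each protein maps to a single COG group. The function `candidate_interactions` and all names in the sample data are hypothetical, and any pair it returns is a guess to be tested, not a result.

```python
from itertools import product


def candidate_interactions(ref_edges, ref_cog, target_edges, target_cog):
    """Propose unrecorded protein pairs in the target species whose COG
    groups are known to interact in the reference species."""
    # COG-group pairs observed to interact in the reference species.
    ref_pairs = {
        frozenset((ref_cog[a], ref_cog[b]))
        for a, b in ref_edges
        if a in ref_cog and b in ref_cog
    }
    # Interactions already recorded in the target species.
    existing = {frozenset(edge) for edge in target_edges}

    # Index the target species' proteins by COG group.
    by_cog = {}
    for protein, cog in target_cog.items():
        by_cog.setdefault(cog, []).append(protein)

    candidates = set()
    for pair in ref_pairs:
        groups = sorted(pair)
        first, second = groups[0], groups[-1]  # equal for same-group pairs
        for p, q in product(by_cog.get(first, []), by_cog.get(second, [])):
            if p != q and frozenset((p, q)) not in existing:
                candidates.add(frozenset((p, q)))
    return candidates


# Hypothetical data, for illustration only.
ref_edges = [("a1", "a2"), ("a2", "a3")]
ref_cog = {"a1": "G1", "a2": "G2", "a3": "G3"}
target_edges = [("b1", "b2")]
target_cog = {"b1": "G1", "b2": "G2", "b3": "G3"}

candidates = candidate_interactions(ref_edges, ref_cog, target_edges, target_cog)
```

Here the only candidate is the pair (b2, b3): its groups interact in the reference species, but the pair is not yet recorded in the target species.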

There’s certainly a lot more to be done with this project. Whether it’s future me or someone else who tackles it, I hope it goes well 🙂