Figure 1. Comparison of the neurotrophin signaling pathway between the Alzheimer's disease BEL model and the canonical KEGG pathway. Red edges represent the pathological state and green edges the normal state. Amyloid beta associates with NGFR to inhibit neuron survival signals, leading to neuron death in Alzheimer's disease (Kodamullil et al., 2015; Kanehisa and Goto, 2000).
A short introduction to BEL can be found here (link). BEL is a language specifically designed to represent scientific knowledge in a computable form by capturing causal and correlative relationships in context. In the neurodegenerative disease field, BEL can store additional information such as the type of relationship that exists between the interacting biological entities, the evidence supporting this relationship in the literature, and other specific annotations such as experimental conditions. In addition, BEL facilitates the integration of multiple data types through its flexible, human-readable syntax. We therefore found BEL ideal for building the models that form the core of NeuroMMSig.
SET Citation = {"PubMed", "J Mol Med", "12682725"}
SET Evidence = "high-dose steroid treatment decreases vascular inflammation and ischemic tissue damage after myocardial infarction and stroke through direct vascular effects involving the nontranscriptional activation of eNOS"
SET Species = "9606" # Taxonomy ID: Homo sapiens
SET Tissue = "Vascular System"
SET Disease = "Stroke"
SET Subgraph = "Inflammatory response subgraph"
a(CHEBI:corticosteroid) -| bp(MESHD:"Inflammation")
After data submission, the NeuroMMSig enrichment ranking algorithm calculates an enrichment score for each data-mapped subgraph. It returns a list of subgraphs with their corresponding scores and metadata. The algorithm is described in more detail below.
The enrichment algorithm computes the final score as a weighted sum of three component scores (equation 1). Each weight ranges from zero to one and can be set in the submission form of the NeuroMMSig server.
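As a minimal sketch of this weighted combination, the Python function below multiplies each component score by its weight and sums the results. The function name `final_enrichment_score`, the tuple-based interface and the range check are assumptions made for illustration; they are not taken from the NeuroMMSig implementation.

```python
def final_enrichment_score(scores, weights):
    """Weighted sum of the three component scores (hypothetical helper)."""
    if len(scores) != 3 or len(weights) != 3:
        raise ValueError("expected three scores and three weights")
    # Weights are defined on a zero-to-one range.
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie between 0 and 1")
    return sum(w * s for w, s in zip(weights, scores))


# Illustrative values only: arbitrary component scores and weights.
example = final_enrichment_score((1.0, 1.0, 0.5), (0.5, 0.25, 0.25))
```

With these illustrative inputs the result is 0.5 + 0.25 + 0.125 = 0.875.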
$$s=w_{1}s_{1}+w_{2}s_{2}+w_{3}s_{3}$$
As in pathway analysis methods, over-representation analysis (ORA) in particular, the first score measures the percentage of mapped nodes, a common approach to calculating the enrichment of a network (Khatri et al., 2012). The more data that map to nodes, the higher the score. Note that not all nodes in a subgraph can be mapped. For instance, entity types such as biological processes, chemicals or pathologies cannot easily be associated with mechanisms or pathways. Because they are not associated with subgraphs, only entities such as gene sets and their associated imaging features and SNPs are taken into consideration when calculating the proportion of nodes that can be mapped.
$$s_{1} = \frac{\#Matches}{\#PossibleMatches}$$
Score 1 example
A user submits two genes as input (DDR2, CRYAB) that map to this imaginary subgraph (graph below). In this case, score 1 would be 1, since the subgraph has only two mappable nodes (DDR2, CRYAB) and both are mapped. The other two nodes (inflammatory response and innate immune response) are biological processes and cannot be mapped to data. If only one of the two genes were submitted, the score would be 0.5 (1/2).
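The score 1 calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NeuroMMSig code: the function name `mapping_score`, the node-type labels (`"gene"`, `"bp"`) and the choice to treat only genes as mappable are hypothetical simplifications of the rule described above.

```python
# Assumption: only gene nodes count as mappable in this simplified sketch.
MAPPABLE_TYPES = {"gene"}


def mapping_score(node_types, submitted):
    """Proportion of mappable subgraph nodes that appear in the submitted data.

    node_types: dict mapping node name -> entity type label (hypothetical labels).
    submitted:  iterable of entity names provided by the user.
    """
    mappable = {name for name, kind in node_types.items() if kind in MAPPABLE_TYPES}
    if not mappable:
        return 0.0  # assumption: a subgraph without mappable nodes scores zero
    return len(mappable & set(submitted)) / len(mappable)


# The imaginary subgraph from the example: two genes, two biological processes.
subgraph = {
    "DDR2": "gene",
    "CRYAB": "gene",
    "inflammatory response": "bp",
    "innate immune response": "bp",
}
both_genes = mapping_score(subgraph, ["DDR2", "CRYAB"])  # 2 of 2 mappable nodes
one_gene = mapping_score(subgraph, ["DDR2"])             # 1 of 2 mappable nodes
```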
$$s_{1} = \frac{2\ \text{(mapped)}}{2\ \text{(possible mappings)}} = 1$$
The second score assigns weights to nodes according to their contribution to graph topology, following a strategy similar to that of pathway topology (PT)-based methods. We compared different methods described in the literature and found betweenness centrality to be a good indicator of node importance, since it reflects the amount of control that a node exerts over the interactions of other nodes in the network. This approach to measuring node importance in biological networks is consistent with other studies in the literature (Gu et al., 2012; Joy et al., 2005).
NeuroMMSig then considers the top 5% of nodes with the highest betweenness centrality in each subgraph to be hubs. The calculations were performed after removing all pathology nodes: because pathology nodes are highly connected in the networks, their presence distorts the shortest-path calculations on which betweenness centrality is based. When a network has fewer than 20 nodes, no hub-based node weighting is applied. Consequently, when data are mapped to any of these hub nodes, the enrichment score of the corresponding subgraph increases by the proportion of mapped hubs. For example, if a network has two hubs and one of them is mapped, the value of the score accounting for hubs is 0.5.
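The hub-based score described above can be sketched as follows. This is a self-contained illustration rather than the NeuroMMSig implementation: it computes unnormalized betweenness centrality with Brandes' algorithm on an unweighted directed graph (a library routine such as NetworkX's `betweenness_centrality` could be used instead). The function names, the directed treatment of edges, rounding the 5% cut-off down, applying the 20-node threshold after pathology nodes are removed, and returning `None` for small networks are all assumptions.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) for a directed, unweighted graph.

    adj: dict mapping every node to the set of its successors.
    """
    centrality = dict.fromkeys(adj, 0.0)
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)   # -1 marks unvisited nodes
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                    # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    return centrality


def hub_score(edges, mapped, pathology=frozenset()):
    """Proportion of hub nodes (top 5% by betweenness) present in `mapped`."""
    adj = {}
    for u, v in edges:
        if u in pathology or v in pathology:
            continue                    # pathology nodes are removed first
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    if len(adj) < 20:
        return None                     # assumption: no hub weighting for small networks
    n_hubs = len(adj) * 5 // 100        # assumption: 5% rounded down (at least 1 here)
    scores = betweenness(adj)
    hubs = sorted(scores, key=scores.get, reverse=True)[:n_hubs]
    return sum(h in mapped for h in hubs) / len(hubs)
```

Ties in betweenness at the cut-off are broken arbitrarily by the sort in this sketch; how the actual server resolves them is not stated here.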
Score 2 example
A user submits two genes as input (DDR2, CRYAB) that map to this imaginary subgraph (graph below). Suppose that the betweenness centrality analysis of this subgraph ranks CRYAB as the only hub. Score 2 would then be 1, since there is only one hub and it has been mapped (1/n, where n is the total number of hubs in that subgraph). Please note that this simple illustration would not apply in practice, since NeuroMMSig does not calculate this score for graphs with fewer than 20 nodes (the top 5% of nodes are considered hubs).
The last component of the final enrichment score reflects whether the mapped nodes are close to each other, that is, their connectivity. Since mechanisms are chains of causes and effects (collections of nodes and edges in subgraphs), one would expect a better enrichment score for a series of mapped nodes that lie close together than for one whose nodes are dispersed. This score therefore assigns higher values to neighbouring nodes than to scattered ones. For this we use equation 2, where N_G[i] is the number of mapped nodes that are also neighbours of node i and n is the total number of mapped nodes. Because BEL implies directionality, networks are treated as directed graphs. Consequently, for an edge from node 1 to node 2 (1->2), node 1 is a neighbour of node 2 but not the other way around, because one cannot traverse from node 2 to node 1 in a directed graph. The denominator n(n-1) normalizes the score to a maximum of 1, since n(n-1) is the total number of possible connections between n nodes in a directed graph.
Note that we do not consider the multi-edge property of BEL graphs. The score is therefore the same whether one edge or several edges go from node 1 to node 2. In summary, the score considers only the presence or absence of an edge between nodes, not the type or number of edges.
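A minimal Python sketch of this connectivity score is given below. It counts the distinct directed edges that connect two mapped nodes and divides by n(n-1). The function name `connectivity_score`, the edge-list input, the exclusion of self-loops and the value returned when fewer than two nodes are mapped are assumptions made for illustration.

```python
def connectivity_score(edges, mapped):
    """Fraction of possible directed connections realised among the mapped nodes.

    edges:  iterable of (source, target) pairs; repeated pairs model multi-edges.
    mapped: iterable of node names to which data were mapped.
    """
    mapped = set(mapped)
    n = len(mapped)
    if n < 2:
        return 0.0  # assumption: the score is undefined for n < 2, so return zero
    # A set collapses multi-edges: only the presence of an edge matters.
    connected = {
        (u, v) for u, v in edges
        if u in mapped and v in mapped and u != v  # assumption: ignore self-loops
    }
    return len(connected) / (n * (n - 1))


# One directed edge between two mapped nodes: 1 of 2 possible connections.
single = connectivity_score([("DDR2", "CRYAB")], ["DDR2", "CRYAB"])
# A duplicated edge (for example two relation types) leaves the score unchanged.
duplicated = connectivity_score(
    [("DDR2", "CRYAB"), ("DDR2", "CRYAB")], ["DDR2", "CRYAB"]
)
```

Counting distinct edges among mapped nodes is equivalent to summing, over every mapped node, the number of its mapped neighbours, because each directed edge contributes exactly one neighbour relation.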
$$s_{3} = \frac{\sum_{i=1}^{n} N_{G}[i]}{n(n-1)}$$
Score 3 example
A user submits two genes as input (DDR2, CRYAB) that map to this imaginary subgraph (graph below). In this subgraph, the two genes are connected by one edge (DDR2 decreases CRYAB). In this case, score 3 would be 0.5, since there is only one edge between the mapped nodes and the total number of possible edges between them is 2 (n=2).
$$s_{3} = \frac{1}{2(2-1)} = 0.5$$
Taking the examples above into consideration, the final score when submitting these two genes (DDR2, CRYAB) would be:
$$s=w_{1}\cdot 1+w_{2}\cdot 1+w_{3}\cdot 0.5$$
"A chain of causes and effects forms a pathophysiological context, where minor dysregulation of molecular events may aggregate at a network level and lead to a pathological deviation from the normal state" (Hofmann-Apitius et al., 2015).
Once data are mapped to the subgraphs, we can identify the different ways in which the data-mapped nodes disrupt a particular node of interest, such as a biological process. For more detail on how NeuroMMSig might identify possible dysregulated paths in the networks, please visit the "How to use NeuroMMSig" section.
Gu, Z. et al. (2012) Centrality-based pathway enrichment: a systematic approach for finding significant pathways dominated by key genes. BMC systems biology 6.1: 56.
Joy, M. P. et al. (2005) High-betweenness proteins in the yeast protein interaction network. BioMed Research International 2: 96-103.
Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic acids research 28.1: 27-30.
Khatri, P. et al. (2012) Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol 8.2: e1002375.
Kodamullil, A. et al. (2015) Computable cause-and-effect models of healthy and Alzheimer's disease states and their mechanistic differential analysis. Alzheimer's & Dementia 11.11: 1329-1339.
Hofmann-Apitius, M. et al. (2015) Bioinformatics Mining and Modeling Methods for the Identification of Disease Mechanisms in Neurodegenerative Disorders. Int J Mol Sci 16.12: 29179-29206. doi: 10.3390/ijms161226148. url: http://dx.doi.org/10.3390/ijms161226148