MOTIVATION

Multi-omics studies routinely deliver lists of genes in different functional contexts: differentially expressed, hypo- or hypermethylated, mutated, and amplified or deleted genes. Although biologically dependent (being related to the same phenomena), gene lists obtained by different experimental approaches or technologies may not overlap significantly. The novel BioProfiling framework provides statistical evidence that non-overlapping gene lists can, in fact, be significantly related once available gene network information is taken into account.

This novel functionality of BioProfiling can be useful for the statistical analysis of biological data in various experimental setups:

  • gene knock-out (gene silencing) studies - Gene vs. Gene List
  • multi-omics studies - Gene List vs. Gene List

METHODS

    Gene vs. Gene List

    The core of our approach consists of a statistical model that relates a gene (referred to as gene "a") to a gene list (referred to as list b), given a reference gene network and a reference set of genes from which list b was selected (referred to as list B). We pose a simple question: are the genes from list b located closer to gene "a" on the network than would be expected for a gene list of the same size as list b sampled randomly from list B?

    • First, we compute the distances from gene "a" to all genes from list b using the reference gene network. Distance is defined as the minimal number of steps required to get from one gene to another along the edges of the network.
    • Second, we define the connectivity score Sab (between gene "a" and list b) based on the number of genes from list b at distance 1, 2, 3, ..., n from gene "a".
    • Third, to assess the statistical significance of the connectivity score we implement a Monte Carlo procedure. We randomly sample a gene list "r" (from the reference set) equal in size to list b and compute the connectivity score Sr (between gene "a" and list "r"). We repeat the procedure N times (up to N = 10 000 if required) to obtain the distribution Srj (j = 1, 2, ..., N) of the connectivity score between gene "a" and a random gene list (equivalent to the input list b). The significance (p-value) of the score Sab is computed as p = k/N, where k is the number of times the score Sab was less than or equal to a score from the Srj distribution.
    Figure 1. Statistical model to relate a gene "a" to a gene list (list b)
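
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BioProfiling implementation: the text does not give the exact formula for the connectivity score, so the sketch assumes a score equal to the sum of 1/d over the genes of the list at distance d from gene "a" (unreachable genes contribute nothing). The toy network, the gene names and all function names are hypothetical.

```python
import random
from collections import deque


def distances_from(network, source):
    """Breadth-first search: minimal number of edges from source to each reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[gene] + 1
                queue.append(neighbour)
    return dist


def connectivity_score(dist, gene_list):
    """ASSUMED score (not from the source): sum of 1/d over genes at distance d >= 1."""
    return sum(1.0 / dist[g] for g in gene_list if dist.get(g, 0) > 0)


def monte_carlo_p_value(network, gene_a, list_b, reference, n_samples=10000, seed=0):
    """Return (Sab, p) with p = k/N, k = number of random lists scoring >= Sab."""
    rng = random.Random(seed)
    dist = distances_from(network, gene_a)  # distances are computed once and reused
    s_ab = connectivity_score(dist, list_b)
    pool = sorted(reference)  # sorted so that a fixed seed gives reproducible samples
    k = 0
    for _ in range(n_samples):
        list_r = rng.sample(pool, len(list_b))  # random list "r", same size as list b
        if s_ab <= connectivity_score(dist, list_r):
            k += 1
    return s_ab, k / n_samples


# Hypothetical undirected toy network given as an adjacency dictionary.
toy_network = {
    "A": {"B1", "B2"},
    "B1": {"A", "X"},
    "B2": {"A"},
    "X": {"B1", "Y"},
    "Y": {"X", "Z"},
    "Z": {"Y"},
}
toy_list_b = ["B1", "B2"]                        # list b
toy_reference = ["B1", "B2", "X", "Y", "Z"]      # list B
s_ab, p = monte_carlo_p_value(toy_network, "A", toy_list_b, toy_reference, n_samples=2000)
print(f"Sab = {s_ab}, p = {p}")
```

In the toy example both genes of list b are direct neighbours of gene "A", so only a random list that happens to equal list b can match its score, and the p-value is small. Any score that rewards short distances could be substituted for the assumed one without changing the Monte Carlo part.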

      Gene List vs. Gene List

      When we need to link two gene lists (list a and list b), the "Gene vs. Gene List" procedure is repeated for each gene "a" from list a. In this case we test a number of hypotheses equal to the number of genes in list a, so a standard FDR procedure has to be applied to adjust the p-values computed by the Monte Carlo procedure for multiple testing.
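
The multiple-testing adjustment can be illustrated as follows. The text only says "standard FDR procedure"; this sketch assumes the Benjamini-Hochberg procedure, which is a common choice but may not be the one BioProfiling uses. The gene names and p-values are hypothetical placeholders for the per-gene Monte Carlo results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical Monte Carlo p-values, one per gene "a" from list a.
p_by_gene = {"gene1": 0.001, "gene2": 0.02, "gene3": 0.03, "gene4": 0.5}
genes = list(p_by_gene)
q_values = benjamini_hochberg([p_by_gene[g] for g in genes])

# Rank the genes of list a by adjusted significance.
ranked = sorted(zip(genes, q_values), key=lambda pair: pair[1])
for gene, q in ranked:
    print(gene, round(q, 4))
```

Note that with N Monte Carlo samples the smallest non-zero p-value is 1/N, which limits how small the adjusted values can become for long lists.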

      BioProfiling

      To start computations you need to provide:

      • gene list a
      • gene list b
      • Optional: list B (a reference set of genes used to select list b)
      If list B is not provided we assume that list B is all known genes.

      Reference gene networks

      BioProfiling uses two different reference gene networks: the Reactome pathway database and the IntAct database of protein interactions. A reference gene network (external knowledge) is assumed to model the spreading of a signal in the cell from one gene to another.

      BioProfiling OUTPUT

      As output, the genes from list a are ranked by the significance (p-value) of their connectivity score in relation to list b (see more details). For each gene "a" from list a with a significant p-value, a visualization of the network model is provided. An example is presented in Figure 2.

      Figure 2. BioProfiling output: IntAct (protein interactions) network model of a TAp73 knock-out (rectangle: gene list "a" (TAp73); circles: gene list "b" (upregulated genes); triangles: intermediate genes). The p-value ~ 0.02 indicates significant connectivity between TAp73 and the upregulated genes. For more details see examples.

      Please note that you can produce high quality figures of the network models. Please read the section "How to produce high quality Network Figures".