NSCCN/bioconductor-metapone：用于代谢通路富集分析，识别生物样本中显著富集的代谢通路

Metapone

The metapone package conducts pathway tests for untargeted metabolomics data. It has three main characteristics: (1) expanded database combining SMPDB and Mummichog databases, with manual cleaning to remove redundancies; (2) A new weighted testing scheme to address the issue of metabolite-feature matching uncertainties; (3) Can consider positive mode and negative mode data in a single analysis. The two databases are at:

https://smpdb.ca/

https://shuzhao-li.github.io/mummichog.org/software.html

Metapone can be install by calling:

devtools::install_github("tianwei-yu/metapone").

To use the package, you need to have testing results on untargetted metabolomics data ready. The test result should contain at least three clumns - m/z, retention time, and feature p-value. An example input data can be seen here:

> library(metapone)
> data(pos)
> head(pos)


       m.z retention.time    p.value statistic
1 85.04027       55.66454 0.22109229 -1.231240
2 85.07662       56.93586 0.52181695 -0.642790
3 85.57425      125.97117 0.13483680 -1.507372
4 86.06064      194.81306 0.26069118  1.131101
5 86.08001       54.74512 0.17206535  1.375352
6 86.09704      177.73650 0.07541608  1.796427

The package can use data from a single ion mode. But it can also handle the situation where data are collected on the same samples using both modes. If both positive mode and negative mode data are present, each is input into the algorithm as a separate matrix.

> data(neg)
> head(neg)
               mz       chr      pval t-statistic
result.1 85.00448 268.83027 0.2423777   1.1690645
result.2 87.00881  48.84882 0.2222984   1.2204394
result.3 87.92531 161.99560 0.1341622   1.4978887
result.4 88.00399 129.88520 0.2941855  -1.0489839
result.5 88.01216  35.81698 0.8510984  -0.1877171
result.6 88.98808 127.47973 0.1748255  -1.3568608

The test is based on HMDB identification. The common adduct ions are pre-processed and stored in:

> data(hmdbCompMZ)
> head(hmdbCompMZ)
      HMDB_ID      m.z ion.type
1 HMDB0059597 1.343218     M+3H
2 HMDB0001362 1.679159     M+3H
3 HMDB0037238 2.341477     M+3H
4 HMDB0005949 3.345944     M+3H
5 HMDB0002387 4.011337     M+3H
6 HMDB0002386 4.677044     M+3H

Pathway information is built-in. It combines SMPDB and Mummichog databases. The two databases can be found here:

https://smpdb.ca/

https://shuzhao-li.github.io/mummichog.org/software.html

Here is the combined database after manual pruning of some highly-overlaping pathways:

> data(pa)
> head(pa)
      database         pathway.name     HMDB.ID KEGG.ID  category
1 Metapone 191 Pterine Biosynthesis HMDB0006822  C05922 Metabolic
2 Metapone 191 Pterine Biosynthesis HMDB0002111  C00001 Metabolic
3 Metapone 191 Pterine Biosynthesis HMDB0006821  C05923 Metabolic
4 Metapone 191 Pterine Biosynthesis HMDB0000142  C00058 Metabolic
5 Metapone 191 Pterine Biosynthesis HMDB0015532         Metabolic
6 Metapone 191 Pterine Biosynthesis HMDB0001273  C00044 Metabolic

The user can specify which adduct ions are allowed by setting the allowed adducts. For example:

> pos.adductlist = c("M+H", "M+NH4", "M+Na", "M+ACN+H", "M+ACN+Na", "M+2ACN+H", "2M+H", "2M+Na", "2M+ACN+H")
> neg.adductlist = c("M-H", "M-2H", "M-2H+Na", "M-2H+K", "M-2H+NH4", "M-H2O-H", "M-H+Cl", "M+Cl", "M+2Cl")

It is common for a feature to be matched to multiple metabolites. Assume a feature is matched to n metabolites, metapone weighs the feature by (1/n)^p, where p is a power term to tune the penalty. n can also be capped at a certain level such that the penalty is limited. These are controlled by parameters:

Setting p: fractional.count.power = 0.5

Setting the cap of n: max.match.count = 10 (this is to cap the level of penalty on a multiple-matched feature.

Other parameters include p.threshold, which controls which metabolic feature is considered significant. The testing is done by permutation. Overall, the analysis is conducted this way:

> dat <- list(pos, neg)
> type <- list("pos", "neg")
> r<-metapone(dat, type, pa, hmdbCompMZ=hmdbCompMZ, pos.adductlist=pos.adductlist, neg.adductlist=neg.adductlist, 
     p.threshold=0.05,n.permu=100,fractional.count.power=0.5, max.match.count=10)
> hist(ptable(r)[,1])

We can subset the pathways that are significant, and with a good number of metabolites matched:

> selection<-which(ptable(r)[,1]<0.025)
> ptable(r)[selection,]

                                                p_value n_significant metabolites n_mapped_metabolites n_metabolites
tRNA Charging: Isoleucine                          0.01                 0.3960824            0.6633436             4
tRNA Charging: Leucine                             0.01                 0.3960824            0.6633436             4
Drug metabolism - cytochrome P450                  0.01                 1.5796691            6.1259660            50
Vitamin A (retinol) metabolism                     0.01                 1.3333333            3.2871677            19
Fatty acid oxidation, peroxisome                   0.02                 1.2886751            4.1875417            26
Glycosphingolipid metabolism                       0.02                 2.8395164           13.9627029            45
Leukotriene metabolism                             0.01                 1.5344457            6.3645097            14
C21-steroid hormone biosynthesis and metabolism    0.00                 4.8363062           28.3710206            80
Lysine metabolism                                  0.02                 1.8848575           11.3149960            33
Androgen and Estrogen Metabolism                   0.00                 2.7672612           13.2744270            63
Glycine and Serine Metabolism                      0.01                 3.8434006           28.9817190            56
beta-Alanine Metabolism                            0.02                 2.8133149           16.5311037            42
Porphyrin Metabolism                               0.01                 3.1213203           12.8947329            47
Purine Metabolism                                  0.02                 4.6733053           31.7643456            91

(some columns omitted)

> ftable(r)[which(ptable(r)[,1]<0.025 & ptable(r)[,2]>=2)]

NSCCN/bioconductor-metapone：用于代谢通路富集分析，识别生物样本中显著富集的代谢通路Glycosphingolipid metabolism`
     m.z      retention.time p.value    statistic HMDB_ID       m.z      ion.type  counts   
[1,] 380.2564 517.4015       0.02576156 2.263223  "HMDB0000277" 380.256  "M+H"     0.2672612
[2,] 282.2793 522.1412       0.03861876 2.095591  "HMDB0001551" 282.2791 "M+ACN+H" 0.5      
[3,] 371.3268 546.4621       0.02881342 -2.217735 "HMDB0006752" 371.3268 "M+ACN+H" 0.7071068
[4,] 179.0563 104.8357       0.02843861 -2.191182 "HMDB0000122" 179.0561 "M-H"     0.1825742
[5,] 179.0563 104.8357       0.02843861 -2.191182 "HMDB0003449" 179.0561 "M-H"     0.1825742
[6,] 380.2575 194.7167       0.02327888 -2.268827 "HMDB0001383" 380.2571 "M-H"     1        

NSCCN/bioconductor-metapone：用于代谢通路富集分析，识别生物样本中显著富集的代谢通路C21-steroid hormone biosynthesis and metabolism`
      m.z      retention.time p.value    statistic HMDB_ID       m.z      ion.type   counts   
 [1,] 380.2564 517.4015       0.02576156 2.263223  "HMDB0000253" 380.256  "M+ACN+Na" 0.2672612
 [2,] 380.2564 517.4015       0.02576156 2.263223  "HMDB0003069" 380.256  "M+ACN+Na" 0.2672612
 [3,] 482.3603 568.9371       0.04738957 -2.007298 "HMDB0006280" 482.3605 "M+ACN+Na" 0.2672612
 [4,] 482.3603 568.9371       0.04738957 -2.007298 "HMDB0006281" 482.3605 "M+ACN+Na" 0.2672612
 [5,] 482.3603 568.9371       0.04738957 -2.007298 "HMDB0006763" 482.3605 "M+ACN+Na" 0.2672612
 [6,] 465.2501 180.1967       0.01296878 2.484626  "HMDB0002829" 465.2494 "M-H"      0.5      
 [7,] 465.2501 180.1967       0.01296878 2.484626  "HMDB0004484" 465.2494 "M-H"      0.5      
 [8,] 465.2501 180.1967       0.01296878 2.484626  "HMDB0006203" 465.2494 "M-H"      0.5      
 [9,] 465.3047 178.2345       0.03315628 -2.130186 "HMDB0000653" 465.3044 "M-H"      1        
[10,] 349.1476 235.4682       0.02916411 2.181261  "HMDB0001032" 349.1474 "M-H2O-H"  0.5      
[11,] 349.1476 235.4682       0.02916411 2.181261  "HMDB0002833" 349.1474 "M-H2O-H"  0.5

(some rows omitted)