
Processed, high-quality, harmonized TCGA data for five cancer types

If you use the data from this package in published research, please cite:

Tianle Ma, Aidong Zhang, Integrate Multi-omic Data Using Affinity Network Fusion (ANF) for Cancer Patient Clustering, https://arxiv.org/abs/1708.07136

This package contains three R objects: Wall, project_ids and surv.plot:

Wall is a nested list. At the top level it is a list over the five cancer types. Each element is a list over six feature normalization types (raw.all, raw.sel, log.all, log.sel, vst.sel, normalized). Each of those is in turn a list over three feature spaces, or views (fpkm, mirna, and methy450), whose elements are matrices. The row names of each matrix are submitter_ids (which can be treated as patient IDs), and the column names are aliquot IDs, each of which contains the corresponding submitter_id as a prefix. Using these aliquot IDs, users can download the original data from the GDC Data Portal repository: https://portal.gdc.cancer.gov/repository
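As an illustration only, the nesting of Wall and the link from aliquot IDs back to patients can be mirrored in Python, with dicts standing in for R lists. This is a hypothetical sketch. The cancer-type keys (cancer_1 to cancer_5) and the barcodes are made up, matrices are replaced by None placeholders, and `build_toy_wall`, `iter_matrices`, and `aliquot_to_submitter` are illustrative helpers, not part of this package. The prefix match relies only on the statement above that each aliquot ID begins with its submitter_id.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Normalization and view names taken from the description of Wall.
NORMALIZATIONS = ["raw.all", "raw.sel", "log.all", "log.sel", "vst.sel", "normalized"]
VIEWS = ["fpkm", "mirna", "methy450"]


def build_toy_wall() -> Dict[str, Dict[str, Dict[str, Optional[list]]]]:
    """Hypothetical stand-in for Wall: cancer type -> normalization -> view -> matrix."""
    cancers = [f"cancer_{i}" for i in range(1, 6)]  # made-up keys, not the real names
    return {c: {n: {v: None for v in VIEWS} for n in NORMALIZATIONS} for c in cancers}


def iter_matrices(wall) -> Iterator[Tuple[Tuple[str, str, str], Optional[list]]]:
    """Yield ((cancer, normalization, view), matrix) for every leaf of the nested structure."""
    for cancer, by_norm in wall.items():
        for norm, by_view in by_norm.items():
            for view, matrix in by_view.items():
                yield (cancer, norm, view), matrix


def aliquot_to_submitter(aliquot_ids: List[str], submitter_ids: List[str]) -> Dict[str, str]:
    """Map each aliquot ID to the submitter_id it starts with (longest match wins)."""
    mapping = {}
    for aliquot in aliquot_ids:
        matches = [s for s in submitter_ids if aliquot.startswith(s)]
        if not matches:
            raise ValueError(f"no submitter_id is a prefix of {aliquot!r}")
        mapping[aliquot] = max(matches, key=len)
    return mapping


if __name__ == "__main__":
    wall = build_toy_wall()
    print(sum(1 for _ in iter_matrices(wall)))  # 5 cancer types * 6 normalizations * 3 views
    print(aliquot_to_submitter(["TCGA-AB-0001-01A-11R-A000-07"], ["TCGA-AB-0001", "TCGA-AB-0002"]))
```

Taking the longest matching prefix keeps a submitter_id that happens to be a prefix of another submitter_id from claiming that patient's aliquots.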

project_ids is a named character vector that maps each submitter_id (representing a patient) to a project_id (which corresponds one-to-one to a disease type). It is used to evaluate clustering results, for example by computing normalized mutual information (NMI) and the Adjusted Rand Index (ARI).
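Given cluster assignments for the same patients, NMI and ARI against project_ids can be computed as below. This is a minimal pure-Python sketch, assuming project_ids and the clustering have been exported to dicts keyed by submitter_id; the function names are hypothetical. NMI here is normalized by the arithmetic mean of the two entropies, which is only one of several conventions, so values may differ from those in the paper if a different normalization was used there.

```python
from collections import Counter
from math import comb, log
from typing import Dict, Hashable, Sequence, Tuple


def _contingency(truth: Sequence[Hashable], pred: Sequence[Hashable]):
    """Joint counts n_ij plus the marginal counts of each labeling."""
    if len(truth) != len(pred) or len(truth) < 2:
        raise ValueError("need two labelings of equal length >= 2")
    return Counter(zip(truth, pred)), Counter(truth), Counter(pred)


def adjusted_rand_index(truth, pred) -> float:
    pairs, a, b = _contingency(truth, pred)
    index = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in a.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    expected = sum_a * sum_b / comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings are a single cluster
        return 1.0
    return (index - expected) / (max_index - expected)


def _entropy(counts: Counter, n: int) -> float:
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_mutual_info(truth, pred) -> float:
    pairs, a, b = _contingency(truth, pred)
    n = len(truth)
    mi = sum((nij / n) * log(n * nij / (a[u] * b[v])) for (u, v), nij in pairs.items())
    denom = (_entropy(a, n) + _entropy(b, n)) / 2  # arithmetic-mean normalization
    return 1.0 if denom == 0 else mi / denom


def score_clusters(project_ids: Dict[str, str], clusters: Dict[str, int]) -> Tuple[float, float]:
    """Return (NMI, ARI) over the patients present in both dicts."""
    ids = sorted(project_ids.keys() & clusters.keys())
    truth = [project_ids[i] for i in ids]
    pred = [clusters[i] for i in ids]
    return normalized_mutual_info(truth, pred), adjusted_rand_index(truth, pred)
```

Both scores ignore how cluster labels are named, so a clustering that matches the disease types up to relabeling scores 1.0 on each.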

surv.plot is a data.frame of patient survival data for survival analysis. It provides an “indirect” way to evaluate clustering results, for example by testing whether survival differs between the resulting clusters.
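One common way to use survival data for this purpose is a log-rank test of whether survival differs between clusters. The sketch below is a two-group log-rank test in pure Python, offered as an illustration and not necessarily the test used in the paper. The column names of surv.plot are not documented here, so extracting follow-up times and event indicators from it is left as an assumption: `times`, `events`, and `in_group1` are plain lists, and `logrank_two_groups` is a hypothetical helper. With more than two clusters, a k-group version of the test or pairwise comparisons would be needed.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def logrank_two_groups(
    times: Sequence[float], events: Sequence[int], in_group1: Sequence[bool]
) -> Tuple[float, float]:
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    times: follow-up time per patient; events: 1 if death observed, 0 if censored;
    in_group1: True if the patient belongs to the first cluster.
    """
    if not (len(times) == len(events) == len(in_group1)):
        raise ValueError("inputs must have equal length")
    observed1 = expected1 = variance = 0.0
    # Visit each distinct time at which at least one death was observed.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group1[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group1[i])
        observed1 += d1
        expected1 += d * n1 / n  # expected deaths in group 1 under equal hazards
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed1 - expected1) ** 2 / variance
    # Survival function of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))
```

A small p-value suggests the two clusters differ in survival, which is the kind of indirect evidence of a meaningful clustering referred to above.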

See the paper at https://arxiv.org/abs/1708.07136 for further explanation.

About

Provides standardized, integrated versions of TCGA data to facilitate data analysis in biomedical research.
