Processed high-quality harmonized TCGA data of five cancer types
If you use the data from this package in published research, please cite:
Tianle Ma, Aidong Zhang,
Integrate Multi-omic Data Using Affinity Network Fusion (ANF) for Cancer Patient Clustering,
https://arxiv.org/abs/1708.07136
This package contains three R objects: Wall, project_ids and surv.plot:
Wall is a nested list: a list (five cancer types) of lists (six feature normalization types: raw.all, raw.sel, log.all, log.sel, vst.sel, normalized) of lists (three feature spaces or views: fpkm, mirna, and methy450) of matrices. The row names of each matrix are submitter_ids (which can be treated as patient IDs), and the column names are aliquot IDs (each of which contains the submitter_id as a prefix). Using these aliquot IDs, users can download the original data from https://portal.gdc.cancer.gov/repository .
project_ids is a named character vector that maps each submitter_id (representing a patient) to a project_id (which corresponds one-to-one to a disease type). It is used for evaluating clustering results, for example by calculating normalized mutual information (NMI) and the Adjusted Rand Index (ARI).
surv.plot is a data.frame containing patient survival data for survival analysis, which provides an “indirect” way to evaluate clustering results.
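The layout described above can be illustrated with a small sketch that uses toy stand-ins rather than the real objects. Everything prefixed with toy_ is hypothetical, as are the cancer-type name "cancer_a", the patient and aliquot IDs, the project names, and the helper adjusted_rand_index (a plain base-R version of the standard ARI formula, not a function from this package). It shows how a matrix is reached by indexing cancer type, then normalization type, then view, and how a named vector like project_ids might be compared against cluster labels.

```r
# Toy stand-ins only: the real Wall and project_ids come from the package.
toy_ids <- c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004")
toy_aliquots <- paste0(toy_ids, "-01A-11R-0000-07")  # made-up aliquot suffix

toy_matrix <- matrix(runif(16), nrow = 4,
                     dimnames = list(toy_ids, toy_aliquots))

# cancer type -> normalization type -> view -> matrix
toy_Wall <- list(
  cancer_a = list(
    log.sel = list(fpkm = toy_matrix, mirna = toy_matrix, methy450 = toy_matrix)
  )
)
m <- toy_Wall[["cancer_a"]][["log.sel"]][["fpkm"]]

# Each aliquot ID (column name) starts with the submitter_id (row name).
has_prefix <- startsWith(colnames(m), rownames(m))

# Named character vector: submitter_id -> project_id (toy project names).
toy_project_ids <- setNames(c("PROJ-A", "PROJ-A", "PROJ-B", "PROJ-B"), toy_ids)

# Hypothetical cluster labels produced by some clustering method.
toy_clusters <- setNames(c(1, 1, 2, 2), toy_ids)

# Adjusted Rand Index from the contingency table of two labelings.
# Undefined (division by zero) when both labelings put all items in one group.
adjusted_rand_index <- function(truth, pred) {
  tab <- table(truth, pred)
  sum_ij <- sum(choose(tab, 2))
  sum_a <- sum(choose(rowSums(tab), 2))
  sum_b <- sum(choose(colSums(tab), 2))
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Align the true labels to the clustered patients by name before comparing.
ari <- adjusted_rand_index(toy_project_ids[names(toy_clusters)], toy_clusters)
```

Aligning by name before scoring matters because the cluster labels and project_ids are not guaranteed to be in the same order.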
See paper https://arxiv.org/abs/1708.07136 for more explanation.