HuBMAPR
The HuBMAP Data Portal (https://portal.hubmapconsortium.org/) provides an open, global bio-molecular atlas of the human body at the cellular level. The HuBMAPR package provides an alternative interface to explore the data via R.
The HuBMAP Consortium offers several APIs. To achieve its main objectives, the HuBMAPR package integrates three of them:
Search API: The Search API is primarily used to search for relevant data and is based on the Elasticsearch API.
Entity API: The Entity API is used in the bulk_data_transfer() function for Globus URL retrieval.
Ontology API: The Ontology API is used in the organ() function to provide the abbreviation and corresponding full name of each organ.
Each API serves a distinct purpose with its own query capabilities. Using the httr2 and rjsoncons packages, HuBMAPR manages, modifies, and executes requests to these APIs and presents the responses as a tibble or a character vector. These outputs are further tidied for clarity in the final results of the HuBMAPR functions, so that the functions reflect the information shown on the HuBMAP Data Portal as closely as possible.
HuBMAP data uses three different identifiers:
HuBMAP ID, e.g. HBM399.VCTL.353
Universally Unique Identifier (UUID), e.g. 7036a70229eff1a51af965454dddbe7d
Digital Object Identifiers (DOI), e.g. 10.35079/HBM399.VCTL.353.
The HuBMAPR package uses the UUID, a 32-digit hexadecimal number, and the more human-readable HuBMAP ID as the two common identifiers in the retrieved results. For precision and compatibility with software implementation and data storage, the UUID serves as the primary identifier for retrieving data across functions, and each UUID maps uniquely to its corresponding HuBMAP ID.
Function names follow a systematic nomenclature: the entity category is used as a prefix, followed by a concise description of the specific functionality. Most functions are grouped by entity category, which simplifies choosing the right function to retrieve the desired information for a given UUID within that category. The structure of these functions is largely consistent across all entity categories, with some exceptions for collections and publications.
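To make the three identifier formats concrete, the following base R sketch classifies a string as a UUID, a HuBMAP ID, or a DOI. The helper name `classify_identifier()` is hypothetical and not part of HuBMAPR. The HuBMAP ID pattern is inferred only from the single example shown above and may not cover every valid ID.

```r
# Hypothetical helper, not part of HuBMAPR: guess the identifier type.
# Assumption: HuBMAP IDs look like "HBM" + 3 digits + "." + 4 capital
# letters + "." + 3 digits, as in the example HBM399.VCTL.353.
classify_identifier <- function(x) {
  is_uuid <- grepl("^[0-9a-fA-F]{32}$", x)   # 32 hexadecimal digits
  is_hubmap_id <- grepl("^HBM[0-9]{3}\\.[A-Z]{4}\\.[0-9]{3}$", x)
  is_doi <- grepl("^10\\.[0-9]+/", x)        # DOI prefix such as 10.35079/
  ifelse(is_uuid, "uuid",
         ifelse(is_hubmap_id, "hubmap_id",
                ifelse(is_doi, "doi", "unknown")))
}

classify_identifier(c(
  "7036a70229eff1a51af965454dddbe7d",
  "HBM399.VCTL.353",
  "10.35079/HBM399.VCTL.353"
))
# [1] "uuid"      "hubmap_id" "doi"
```

A check like this can be useful before passing a value to a function that expects a UUID, since the package treats the UUID as the primary identifier.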
Installation
HuBMAPR is an R package available in Bioconductor version ≥ 3.20 and requires R version ≥ 4.4.0. You can install HuBMAPR from Bioconductor in an R session. Additionally, you can install the development version from GitHub.
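The installation commands themselves are not reproduced in this text. A typical Bioconductor installation, assuming the standard BiocManager workflow, might look like the sketch below. The GitHub repository path is not given here, so the development install is not shown.

```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the release version of HuBMAPR from Bioconductor
BiocManager::install("HuBMAPR")
```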
Use
Entity Category:
Dataset
Sample
Donor
Collection
Publication
Available records for [Entity Category]:
datasets()
samples()
donors()
collections()
publications()
The default columns from [Entity Category]():
datasets_default_columns(as = c("tibble", "character"))
samples_default_columns(as = c("tibble", "character"))
donors_default_columns(as = c("tibble", "character"))
collections_default_columns(as = c("tibble", "character"))
publications_default_columns(as = c("tibble", "character"))
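As an illustration of the listing functions and default-column helpers above, here is a minimal sketch for the dataset category. It assumes HuBMAPR is installed and the HuBMAP APIs are reachable. The guard variable `hubmapr_available` is introduced only for this example.

```r
# Assumption: HuBMAPR is installed and network access is available.
# The guard skips the calls when the package is missing.
hubmapr_available <- requireNamespace("HuBMAPR", quietly = TRUE)

if (hubmapr_available) {
  # Retrieve the available dataset records
  datasets_df <- HuBMAPR::datasets()
  print(head(datasets_df))

  # Default columns, first as a character vector, then as a tibble
  print(HuBMAPR::datasets_default_columns(as = "character"))
  print(HuBMAPR::datasets_default_columns(as = "tibble"))
}
```

The same pattern should apply to samples(), donors(), collections(), and publications() with their matching *_default_columns() helpers.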
Single record information for [Entity Category]:
Dataset
dataset_detail(dataset_uuid)
dataset_derived(dataset_uuid)
dataset_metadata(dataset_uuid)
dataset_contributors(dataset_uuid)
Sample
sample_detail(sample_uuid)
sample_derived(sample_uuid, entity_type = c("Dataset", "Sample"))
sample_metadata(sample_uuid)
Donor
donor_detail(donor_uuid)
donor_derived(donor_uuid, entity_type = c("Dataset", "Sample"))
donor_metadata(donor_uuid)
Collection
collection_detail(collection_uuid)
collection_data(collection_uuid)
collection_contributors(collection_uuid)
collection_contacts(collection_uuid)
collection_information(collection_uuid)
Publication
publication_detail(publication_uuid)
publication_data(publication_uuid)
publication_authors(publication_uuid)
publication_information(publication_uuid)
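Because the single-record functions listed above follow an entity-prefix naming scheme, a function name can be built from an entity category and a task. The sketch below shows one way to do that. Both helpers, `entity_function_name()` and `entity_call()`, are hypothetical and not part of HuBMAPR. Not every combination exists (for example, the lists above show no metadata function for collections), in which case the lookup fails with an error.

```r
# Hypothetical helper: build a name such as "dataset_detail"
entity_function_name <- function(entity = c("dataset", "sample", "donor",
                                            "collection", "publication"),
                                 task) {
  entity <- match.arg(entity)  # rejects unknown entity categories
  paste(entity, task, sep = "_")
}

# Hypothetical helper: look up the exported HuBMAPR function and call it.
# Assumes HuBMAPR is installed and the HuBMAP APIs are reachable.
entity_call <- function(entity, task, uuid) {
  fname <- entity_function_name(entity, task)
  f <- getExportedValue("HuBMAPR", fname)  # errors if no such function
  f(uuid)
}

entity_function_name("dataset", "detail")
# [1] "dataset_detail"
entity_function_name("publication", "authors")
# [1] "publication_authors"

# Usage (requires HuBMAPR and network access):
# entity_call("dataset", "detail", "7036a70229eff1a51af965454dddbe7d")
```

In everyday use, calling the functions directly, for example dataset_detail(dataset_uuid), is simpler. The helper only illustrates how regular the naming is.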
Provenance of a dataset/sample/donor:
Additional information about organ abbreviation and its full name:
organ()
Retrieve data files from a single dataset record:
bulk_data_transfer()
View the article "Explore Human BioMolecular Atlas Program Data Portal" to read detailed examples.
Preprint
Pre-print: https://www.biorxiv.org/content/10.1101/2024.09.26.615227v1