Easily retrieve UniProt data and map protein identifiers using this Python package for UniProt’s Retrieve & ID Mapping RESTful APIs. Read the full documentation.
Programmatically retrieving any of the supported return and cross-reference fields from both the UniProt-SwissProt (reviewed) and UniProt-TrEMBL (unreviewed) databases. For a full table of the supported resources, refer to the supported fields in the docs;
Querying UniProtKB entries using complex field-based queries with boolean operators ~ (NOT), | (OR), & (AND).
Further, for the cross-referenced fields that have has_full_version set to yes, returning the same field with extra information is supported by passing <field_name>_full, such as xref_pdb_full.
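The `_full` naming convention can be sketched with plain Python data. This is only an illustration: the helper `with_full_versions` and the sample rows are hypothetical, and only the column names `return_field` and `has_full_version` are taken from the description above.

```python
from typing import Dict, Iterable, List


def with_full_versions(rows: Iterable[Dict[str, str]], wanted: Iterable[str]) -> List[str]:
    """Swap each wanted field for its `_full` variant when one is available."""
    # Collect the fields whose has_full_version column is set to "yes".
    has_full = {row["return_field"] for row in rows if row.get("has_full_version") == "yes"}
    return [f"{name}_full" if name in has_full else name for name in wanted]


# Illustrative rows only; the real table comes from ProtMapper.fields_table.
rows = [
    {"return_field": "xref_pdb", "has_full_version": "yes"},
    {"return_field": "accession", "has_full_version": "no"},
]
print(with_full_versions(rows, ["accession", "xref_pdb"]))
# ['accession', 'xref_pdb_full']
```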
All available return fields are also accessible through the attribute ProtMapper.supported_return_fields:
UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the uniprotkb_fields module. This allows you to create sophisticated searches combining multiple criteria. For example:
from UniProtMapper import ProtKB
from UniProtMapper.uniprotkb_fields import (
    organism_name,
    length,
    reviewed,
    date_modified,
)

# Find reviewed human proteins with length between 100-200 amino acids
# that were modified after January 1st, 2024
query = (
    organism_name("human") &
    reviewed(True) &
    length(100, 200) &
    date_modified("2024-01-01", "*")
)

protkb = ProtKB()
result = protkb.get(query)
For a list of all fields and their descriptions, see the API reference for the uniprotkb_fields module.
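To show how `&`, `|`, and `~` can compose a query, here is a minimal, self-contained sketch based on operator overloading. It is not the package's implementation: the classes `Query`, `Field`, `BoolOp`, and `Not` are hypothetical, and the string form only loosely follows UniProt's query syntax.

```python
from dataclasses import dataclass


class Query:
    """Base class giving every query node the &, | and ~ operators."""

    def __and__(self, other: "Query") -> "Query":
        return BoolOp("AND", self, other)

    def __or__(self, other: "Query") -> "Query":
        return BoolOp("OR", self, other)

    def __invert__(self) -> "Query":
        return Not(self)


@dataclass(frozen=True)
class Field(Query):
    """A single field:value criterion (hypothetical stand-in for a field class)."""
    name: str
    value: str

    def __str__(self) -> str:
        return f"({self.name}:{self.value})"


@dataclass(frozen=True)
class BoolOp(Query):
    """Two sub-queries joined by AND or OR."""
    op: str
    left: Query
    right: Query

    def __str__(self) -> str:
        return f"({self.left} {self.op} {self.right})"


@dataclass(frozen=True)
class Not(Query):
    """Negation of a sub-query."""
    operand: Query

    def __str__(self) -> str:
        return f"(NOT {self.operand})"


# ~ binds tighter than &, and & groups left to right.
query = Field("organism_name", "human") & Field("reviewed", "true") & ~Field("length", "[100 TO 200]")
print(query)
```

Because each operator returns a new node, arbitrarily nested expressions stay composable, and parentheses in Python control grouping in the generated string.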
UniProtMapper provides a CLI for the ID Mapping class, ProtMapper, for easy access to lookups and data retrieval. Here is a list of the available arguments, shown by protmap -h:
usage: UniProtMapper [-h] -i [IDS ...] [-r [RETURN_FIELDS ...]] [--default-fields] [-o OUTPUT]
                     [-from FROM_DB] [-to TO_DB] [-over] [-pf]

Retrieve data from UniProt using UniProt's RESTful API. For a list of all available fields, see: https://www.uniprot.org/help/return_fields

Alternatively, use the --print-fields argument to print the available fields and exit the program.

optional arguments:
  -h, --help            show this help message and exit
  -i [IDS ...], --ids [IDS ...]
                        List of UniProt IDs to retrieve information from. Values must be
                        separated by spaces.
  -r [RETURN_FIELDS ...], --return-fields [RETURN_FIELDS ...]
                        If not defined, will pass `None`, returning all available fields.
                        Else, values should be fields to be returned separated by spaces. See
                        --print-fields for available options.
  --default-fields, -def
                        This option will override the --return-fields option. Returns only the
                        default fields stored in: <pkg_path>/resources/cli_return_fields.txt
  -o OUTPUT, --output OUTPUT
                        Path to the output file to write the returned fields. If not provided,
                        will write to stdout.
  -from FROM_DB, --from-db FROM_DB
                        The database from which the IDs are. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -to TO_DB, --to-db TO_DB
                        The database to which the IDs will be mapped. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -over, --overwrite    If desired to overwrite an existing file when using -o/--output
  -pf, --print-fields   Prints the available return fields and exits the program.
Usage example, retrieving default fields from <pkg_path>/resources/cli_return_fields.txt:
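As a rough sketch of how the flags from the help text combine, the snippet below assembles a `protmap` command line in Python without running it. The helper `build_protmap_command` is hypothetical and not part of the package; only the flag names come from the help output, and the ID `Q02880` is just an example accession.

```python
import shlex
from typing import List, Optional, Sequence


def build_protmap_command(
    ids: Sequence[str],
    from_db: Optional[str] = None,
    to_db: Optional[str] = None,
    default_fields: bool = False,
    output: Optional[str] = None,
    overwrite: bool = False,
) -> List[str]:
    """Assemble a protmap argument list from the flags in the help text."""
    if not ids:
        raise ValueError("at least one ID is required (-i/--ids)")
    cmd = ["protmap", "--ids", *ids]
    if default_fields:
        cmd.append("--default-fields")
    if from_db:
        cmd += ["--from-db", from_db]
    if to_db:
        cmd += ["--to-db", to_db]
    if output:
        cmd += ["--output", output]
        # --overwrite only matters when an output file is given.
        if overwrite:
            cmd.append("--overwrite")
    return cmd


cmd = build_protmap_command(["Q02880"], default_fields=True)
print(shlex.join(cmd))  # protmap --ids Q02880 --default-fields
```

With the package installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`, or the printed command typed directly in a shell.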
👏🏼 Credits
UniProt for providing the API and the amazing database;
UniProtMapper
⛏️ Features
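UniProtMapper is a tool for bioinformatics and proteomics research that supports:
Mapping between different protein identifiers;
Programmatically retrieving any of the supported return and cross-reference fields from both the UniProt-SwissProt and UniProt-TrEMBL (unreviewed) databases;
Querying UniProtKB entries using complex field-based queries with the boolean operators ~ (NOT), | (OR), and & (AND).
For the first two functionalities, see the Mapping IDs and Retrieving Information examples below. For the third, see Field-based Querying.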
The ID mapping API can also be accessed through the CLI. For more information, check CLI.
📦 Installation
From PyPI (recommended):
Directly from GitHub:
From source:
🛠️ Usage
Mapping IDs
Use UniProtMapper to easily map between different protein identifiers:
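A minimal sketch of such a mapping, assuming `ProtMapper.get` accepts `ids`, `from_db`, and `to_db` keyword arguments and returns a `(result, failed)` pair. The wrapper `map_ids` and its optional `mapper` parameter are hypothetical conveniences so the logic can be exercised without network access.

```python
from typing import Any, List, Optional, Sequence, Tuple


def map_ids(
    ids: Sequence[str],
    from_db: str,
    to_db: str,
    mapper: Optional[Any] = None,
) -> Tuple[Any, List[str]]:
    """Map identifiers from one database to another, returning (result, failed)."""
    if mapper is None:
        # Imported lazily: needs the package installed and network access.
        from UniProtMapper import ProtMapper

        mapper = ProtMapper()
    # Assumed call shape: keyword arguments ids, from_db, to_db.
    result, failed = mapper.get(ids=list(ids), from_db=from_db, to_db=to_db)
    if failed:
        print(f"{len(failed)} identifier(s) could not be mapped: {failed}")
    return result, failed
```

With the package installed, a call such as `result, failed = map_ids(["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl")` would request a mapping from UniProtKB accessions to Ensembl identifiers.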
The result is a pandas DataFrame containing the mapped IDs, while failed is a list of identifiers that couldn’t be mapped.
Retrieving Information
A DataFrame with the supported return fields is accessible through the attribute ProtMapper.fields_table:
From the DataFrame, all return_field entries can be used to access UniProt data programmatically:
Further, for the cross-referenced fields that have has_full_version set to yes, returning the same field with extra information is supported by passing <field_name>_full, such as xref_pdb_full.
All available return fields are also accessible through the attribute ProtMapper.supported_return_fields:
Field-based Querying
UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the uniprotkb_fields module. This allows you to create sophisticated searches combining multiple criteria. For example:
For a list of all fields and their descriptions, see the API reference for the uniprotkb_fields module.
📖 Documentation
💻 Command Line Interface (CLI)
UniProtMapper provides a CLI for the ID Mapping class, ProtMapper, for easy access to lookups and data retrieval. Here is a list of the available arguments, shown by protmap -h:
Usage example, retrieving default fields from <pkg_path>/resources/cli_return_fields.txt:
👏🏼 Credits
For issues, feature requests, or questions, please open an issue on the GitHub repository.