! NCBI is deprecating .SRA file links. This may result in an empty list with `--ncbi`.
+ Have a cool use case for ffq? Submit a PR to the `Use cases` section and we'll feature it!
Fetch metadata information from the following databases:
ffq receives an accession and returns the metadata for that accession as well as the metadata for all downstream accessions, following the connections between GEO, SRA, EMBL-EBI, DDBJ, and Biosample. If you use ffq in a publication, please cite:
Gálvez-Merchán, Á., et al. (2022). Metadata retrieval from sequence databases with ffq. bioRxiv 2022.05.18.492548.
The manuscript is available here: https://doi.org/10.1101/2022.05.18.492548.
By default, ffq returns all downstream metadata down to the level of the SRR record. However, the desired level of resolution can be specified.
ffq can also skip returning the metadata, and instead return the raw data download links from any available host (FTP, AWS, GCP or NCBI) for GEO and SRA ids.
Installation
The latest release can be installed with
pip install ffq
The development version can be installed with
pip install git+https://github.com/pachterlab/ffq
Usage
Fetch information of an accession and display it in the terminal
ffq [accession]
where [accession] is either:
an SRA/EBI/DDBJ accession
(SRR, SRX, SRS or SRP)
(ERR, ERX, ERS or ERP)
(DRR, DRS, DRX or DRP)
a GEO accession (GSE or GSM)
an ENCODE accession (ENCSR, ENCSB or ENCSD)
a Bioproject accession (CXR)
a Biosample accession (SAMN)
a DOI
Examples:
$ ffq SRR9990627
#=> Returns metadata for the SRR9990627 run.
$ ffq SRX7347523
#=> Returns metadata for the experiment SRX7347523 and for its associated SRR run.
$ ffq GSE129845
#=> Returns metadata for GSE129845 and for its 5 associated GSM, SRS, SRX and SRR ids.
$ ffq DRP004583
#=> Returns metadata for the study DRP004583 and its 104 associated DRS, DRX and DRR ids.
$ ffq ENCSR998WNE
#=> Returns metadata for the ENCODE experiment ENCSR998WNE.
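The accession families listed above can be told apart by prefix alone. The following is a minimal sketch of that idea, not ffq's own parsing logic: the function name `classify_accession`, the prefix table, and the DOI check (a string starting with "10." and containing a slash) are assumptions made for illustration.

```python
# Prefix table transcribed from the accession list above.
ACCESSION_FAMILIES = {
    "SRA": ("SRR", "SRX", "SRS", "SRP"),
    "EBI": ("ERR", "ERX", "ERS", "ERP"),
    "DDBJ": ("DRR", "DRS", "DRX", "DRP"),
    "GEO": ("GSE", "GSM"),
    "ENCODE": ("ENCSR", "ENCSB", "ENCSD"),
    "Bioproject": ("CXR",),
    "Biosample": ("SAMN",),
}


def classify_accession(accession):
    """Return (family, prefix) for a recognised prefix, ("DOI", None) for a
    DOI-like string, or None when nothing matches (hypothetical helper)."""
    text = accession.strip().upper()
    for family, prefixes in ACCESSION_FAMILIES.items():
        for prefix in prefixes:
            if text.startswith(prefix):
                return family, prefix
    # Assumption: treat anything shaped like "10.<registrant>/<suffix>" as a DOI.
    if text.startswith("10.") and "/" in text:
        return "DOI", None
    return None


print(classify_accession("SRR9990627"))
print(classify_accession("ENCSR998WNE"))
```

A check like this can be used to validate a list of identifiers before passing them to ffq.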
Fetch information of multiple accessions and display it in the terminal
ffq [accession 1] [accession 2] ...
where [accession 1] and [accession 2] are accessions of any of the types listed above.
Examples:
$ ffq SRR11181954 SRR11181955 SRR11181956
#=> Returns metadata for the three SRR runs.
$ ffq GSM4339769 GSM4339770 GSM4339771
#=> Returns metadata for the three GSM accessions, as well as for their corresponding downstream SRS, SRX and SRR accessions.
Fetch information of an accession only down to specified level
ffq -l [level] [accession]
where [level] is the number of accession levels to return, counting the queried accession itself as level 1
Examples:
$ ffq -l 1 GSM4339769
#=> Returns metadata only for GSM4339769, and not for any downstream accession.
$ ffq -l 3 GSE115469
#=> Returns metadata for GSE115469 and its downstream GSM and SRS accessions.
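The effect of -l in the two examples above can be modelled as truncating a chain of accession types. This is only a sketch: the ordering GSE, GSM, SRS, SRX, SRR is inferred from the examples in this README, and `levels_returned` is a hypothetical helper, not part of ffq.

```python
# Downstream order inferred from the examples above (an assumption).
HIERARCHY = ["GSE", "GSM", "SRS", "SRX", "SRR"]


def levels_returned(accession, level=None):
    """List the accession types covered when querying `accession` with -l `level`.

    With level=None the whole downstream chain is returned, which mirrors the
    default behaviour of going down to the SRR record.
    """
    prefix = accession[:3].upper()
    start = HIERARCHY.index(prefix)  # raises ValueError for other prefixes
    chain = HIERARCHY[start:]
    return chain if level is None else chain[:level]


print(levels_returned("GSM4339769", 1))
print(levels_returned("GSE115469", 3))
```

Under this model the queried accession counts as the first level, which is why -l 1 returns nothing downstream.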
Fetch only raw data links from the host of your choice and display them in the terminal
FTP host
ffq --ftp [accession(s)]
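where [accession(s)] is either a single accession or a space-delimited list of accessions.
AWS host
ffq --aws [accession(s)]
GCP host
ffq --gcp [accession(s)]
NCBI host
ffq --ncbi [accession(s)]
Write accession information to a single JSON file
ffq -o [JSON_PATH] [accession(s)]
where [JSON_PATH] is the path to the JSON file that will contain the information and [accession(s)] is either a single accession or a space-delimited list of accessions.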
Write accession information to multiple JSON files, one file per accession
ffq -o [OUT_DIR] --split [accessions]
where [OUT_DIR] is the path to the directory where the JSON files are written and [accessions] is a space-delimited list of accessions.
Information about each accession will be written to its own separate JSON file named [accession].json.
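Because each file is named [accession].json, the output directory can be read back into a single mapping keyed by accession. The sketch below relies only on that naming rule; `load_split_output` is a hypothetical helper name, and nothing is assumed about the fields inside each file.

```python
import json
from pathlib import Path


def load_split_output(out_dir):
    """Map each accession to the parsed content of OUT_DIR/[accession].json."""
    records = {}
    for path in sorted(Path(out_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # path.stem is the file name without ".json", i.e. the accession.
            records[path.stem] = json.load(handle)
    return records
```

For example, `load_split_output("out")` after running ffq with `-o out --split` would yield one entry per accession that was written.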
Fetch information of all studies (and all of their runs) in one or more papers
ffq [DOIS]
where [DOIS] is a space-delimited list of one or more DOIs. The output is a JSON-formatted string (or a JSON file if -o is provided) with SRA study accessions as keys. When --split is also provided, each study is written to its own separate JSON.
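When ffq is driven from a script, the invocation forms described in this section can be assembled as an argument list. This is a rough sketch under stated assumptions: `build_ffq_command` is a hypothetical helper, the host flag is formed as `--` plus the host name (following the `--ftp` and `--ncbi` spellings used in this README), and requiring `out` whenever `split` is set simply mirrors the documented `ffq -o [OUT_DIR] --split` form.

```python
# Host names taken from the list of supported hosts in this README.
HOSTS = ("ftp", "aws", "gcp", "ncbi")


def build_ffq_command(accessions, level=None, host=None, out=None, split=False):
    """Assemble an ffq argument list from the options described above."""
    if not accessions:
        raise ValueError("at least one accession or DOI is required")
    if host is not None and host not in HOSTS:
        raise ValueError(f"unknown host: {host}")
    if split and out is None:
        raise ValueError("split output needs an output directory")
    command = ["ffq"]
    if level is not None:
        command += ["-l", str(level)]
    if host is not None:
        command.append(f"--{host}")  # assumption: flag is "--" + host name
    if out is not None:
        command += ["-o", str(out)]
    if split:
        command.append("--split")
    return command + list(accessions)


print(build_ffq_command(["GSM4339769"], level=1))
```

The resulting list can be handed to `subprocess.run(command, capture_output=True, text=True, check=True)`; when `-o` is not given, the captured standard output should be the JSON-formatted string described above and can be parsed with `json.loads`.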
Complete output examples
Examples of complete outputs are available in the examples directory.
Downloading data
ffq is specifically designed to download metadata and to facilitate obtaining links to sequence files. To download raw data from the links obtained with ffq you can use one of the following:
cURL and wget for FTP links,
aws for AWS links,
gsutil for GCP links,
fasterq-dump for converting SRA files to FASTQ files.
FTP
By default, cURL is installed on most computers and can be used to download files with FTP links. Alternatively, wget can be used.
Alternatively, the URLs can be extracted from the JSON output with jq and then piped into cURL.
If you don't have jq installed, you can use the default program grep.
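The same URL extraction can also be done in Python when neither jq nor grep is convenient. The sketch below assumes that each link entry in the JSON output stores its address under a key named "url"; that key name, the helper name `extract_urls`, and the example.org addresses in the sample are assumptions for illustration, so check them against real ffq output before relying on this.

```python
import json


def extract_urls(ffq_json_text):
    """Collect every string stored under a "url" key, at any nesting depth."""
    urls = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "url" and isinstance(value, str):
                    urls.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(ffq_json_text))
    return urls


# Hypothetical sample shaped like a list of link entries.
sample = '[{"url": "ftp://example.org/a_1.fastq.gz"}, {"url": "ftp://example.org/a_2.fastq.gz"}]'
for url in extract_urls(sample):
    print(url)
```

Walking the whole structure rather than indexing a fixed path keeps the sketch usable for both link-only output and full metadata output, at the cost of picking up any other field that happens to be called "url".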