srahunter is a tool designed to facilitate the downloading and processing of data and metadata from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). The package includes three modules: one for the automated download of FASTQ files from SRA (srahunter download), one for retrieving the main metadata associated with SRA runs (srahunter metadata), and one for retrieving the full metadata associated with an accession number (srahunter fullmetadata).
Installation
srahunter is distributed through the Bioconda repository and can be installed with a single command.
Using mamba is recommended, as it speeds up the installation process:
mamba install bioconda::srahunter
or, as an alternative, with conda:
conda install bioconda::srahunter
Scripts
srahunter download:
Given an SRA accession list downloaded by the user from SRA, this module downloads the SRA files and then converts them to single or paired FASTQ files.
The module has been tested on the main sequencing platforms, so it can be used to download data produced with Illumina, PacBio and ONT platforms.
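Usage Example (a hedged sketch rather than an official recipe): the flags -i, -t and -o are the ones documented under Options for this module, while the list name SRR_Acc_List.txt, the output directory fastq and the thread count 8 are placeholders chosen for illustration. The command is executed only if srahunter is on the PATH and the list exists; otherwise it is just printed.

```shell
#!/bin/sh
# Placeholder names (assumptions, not defaults of srahunter).
ACC_LIST="SRR_Acc_List.txt"   # one Run accession per line
OUTDIR="fastq"                # where the FASTQ files should go
THREADS=8                     # overrides the documented default of 6

# Flags -i, -t and -o are the ones listed under Options.
CMD="srahunter download -i $ACC_LIST -t $THREADS -o $OUTDIR"

if command -v srahunter >/dev/null 2>&1 && [ -f "$ACC_LIST" ]; then
    # Unquoted on purpose so the string splits into arguments
    # (safe here because the placeholder names contain no spaces).
    $CMD
else
    echo "dry run: $CMD"
fi
```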
Main functionality:
Automatic removal of .sra files after successful dumping, so the user does not need to do it manually
Disk space check at the beginning of every sample download (at least 20G of free disk space required); if the disk is almost full, the script stops with an error message
Tracking of already successfully processed data, so that after an interruption the script resumes where it left off
Logging of failed downloads to a file (failed_list.csv)
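The disk space check described in the list above can be reproduced by hand before starting a large download. The following is a minimal sketch of the idea using df, not the code srahunter runs internally; the function name has_free_space is hypothetical, and the 20 GiB threshold simply mirrors the requirement stated above.

```shell
#!/bin/sh
# Return 0 if the filesystem holding directory $1 has at least $2 GiB free.
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the 4th column of the 2nd line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Check the current directory against the documented 20G requirement.
if has_free_space . 20; then
    echo "enough space for the next sample"
else
    echo "less than 20G free, stopping" >&2
fi
```

Pointing the first argument at the directory used for the .sra files or for the FASTQ output checks the filesystem that will actually be written to.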
Options:
-h
Show help message and exit
--list, -i
Accession list from SRA (relative or full file path)
-t
Number of threads (default: 6)
--path, -sra-path, -p
Path to where to download .sra files (default: tmp_srahunter in the current directory)
--maxsize, -ms
Max size of each sra file (default: 50G)
--outdir, -o
Path to where to download .fastq files (default: current directory)
Attention! For the moment only Run accession numbers are supported (e.g. SRR8487013), and they must be provided in an accession list.
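Because only Run accessions are accepted, it can help to screen the list before starting. The sketch below prints every line that does not look like a Run accession. It assumes one accession per line and that Run accessions start with SRR, ERR or DRR followed by digits; the function name invalid_accessions and all demo accessions other than SRR8487013 are made up for illustration.

```shell
#!/bin/sh
# Print the lines of an accession list that do not look like Run accessions.
# Assumed pattern: SRR, ERR or DRR followed by digits, one per line.
invalid_accessions() {
    # tr strips carriage returns left by Windows line endings;
    # the first grep drops empty lines, the second keeps non-matching ones.
    tr -d '\r' < "$1" | grep -v '^$' | grep -Ev '^[SED]RR[0-9]+$'
}

# Demo with a throwaway list (made-up accessions except SRR8487013).
demo_list=$(mktemp)
printf 'SRR8487013\nSRX1234567\nPRJNA000001\nERR1234567\n' > "$demo_list"
bad=$(invalid_accessions "$demo_list")
echo "$bad"
rm -f "$demo_list"
```

In the demo, the experiment accession (SRX...) and the BioProject accession (PRJNA...) are reported, while the two Run accessions pass.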
srahunter metadata:
This module retrieves metadata from the NCBI SRA database, splits large input files into manageable chunks, and collects the fetched data in a final table, ‘SRA_info.csv’. It also produces an interactive table in the folder SRA_html. This module downloads only the most important metadata associated with a Run accession number.
Usage Example:
srahunter metadata -i <accession_list.txt>
Main functionalities:
Fast data retrieval with Entrez-direct
Metadata collection in a clean CSV format
HTML interactive table with links to SRA, a chart summarising the data, and the possibility to apply filters
Options:
-h
Show help message and exit
-i
Accession list from SRA (relative or full file path)
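The chunking step mentioned in the module description can be pictured with the standard split utility. This is only an illustration of the technique under assumed values: the chunk size of 100, the chunk_ prefix and the generated accessions are hypothetical, and the source does not state how srahunter sizes or names its chunks.

```shell
#!/bin/sh
# Split an accession list into chunks of at most CHUNK_SIZE lines.
# CHUNK_SIZE and the chunk_ prefix are illustrative choices.
CHUNK_SIZE=100
workdir=$(mktemp -d)

# Build a demo list of 250 made-up accessions.
i=1
while [ "$i" -le 250 ]; do
    echo "SRR$((1000000 + i))"
    i=$((i + 1))
done > "$workdir/accessions.txt"

# split names the pieces chunk_aa, chunk_ab, ...
split -l "$CHUNK_SIZE" "$workdir/accessions.txt" "$workdir/chunk_"

n_chunks=$(ls "$workdir"/chunk_* | wc -l | tr -d ' ')
echo "$n_chunks chunks"
for f in "$workdir"/chunk_*; do
    # A real run would query the metadata for the accessions in "$f" here.
    echo "$(basename "$f"): $(wc -l < "$f" | tr -d ' ') accessions"
done

rm -rf "$workdir"
```

Querying in bounded batches keeps each request small, and the per-chunk results can then be concatenated into one final table.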
srahunter fullmetadata BETA:
BETA VERSION! Not working for miRNA-seq data. This module retrieves metadata from the NCBI SRA database, splits large input files into manageable chunks, and collects the fetched data in a final full table, ‘Full_SRA_info.csv’. Unlike srahunter metadata, this module downloads all the metadata associated with a Run accession number.
Usage Example:
srahunter fullmetadata -i <accession_list.txt>
Main functionalities:
Fast data retrieval with Entrez-direct
Metadata collection in a clean CSV format
Options:
-h
Show help message and exit
-i
Accession list from SRA (relative or full file path)
Error Handling and Troubleshooting
If you encounter any issues or errors while using srahunter, please check the following common problems:
Ensure that your Conda or Mamba environment is correctly set up.
Verify that the format of your SRA accession list is correct.
Check available disk space if you encounter download issues.
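A quick way to test the first point is to confirm that the expected commands resolve in the active environment. The sketch below is a generic PATH check; the function name check_tool is hypothetical, and the choice of esearch and efetch assumes that the Entrez-direct retrieval mentioned for the metadata modules relies on those two commands.

```shell
#!/bin/sh
# Report whether a command is available on the PATH of the active environment.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok:      $1 -> $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# srahunter itself, plus esearch and efetch from Entrez-direct
# (assumed to be what the metadata modules call).
for tool in srahunter esearch efetch; do
    check_tool "$tool" || echo "  (is the right conda/mamba environment activated?)"
done
```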
For more help, please open an issue on the GitHub repository.
Contributing
Contributions to srahunter are welcome! Please read our contributing guidelines on the GitHub repository for instructions on how to contribute.
License
srahunter is released under the MIT License.
Acknowledgments
Special thanks to @IlFog and Dr Gaetan Thilliez for beta testing the tool.