HTSeqQC is an automated quality control analysis tool for single and
paired-end high-throughput sequencing (HTS) data generated on Illumina
sequencing platforms.
Features
Simultaneously filter and/or trim reads for adapter or primer
contamination, uncalled bases (N), and low-quality reads
Supports single and paired-end reads
Analyze multiple samples simultaneously
Parallel computation for accelerating the speed of analysis
Visualization and statistics
Docker image is available
Available on CyVerse Discovery Environment (DE)
No dependency on an external open-source tool
Getting Started
Prerequisites
You need Python 3 (tested on 3.6 and 3.7) to install and run HTSeqQC. The following Python 3
packages must be installed before running HTSeqQC. If you have not
installed these packages, HTSeqQC will guide you to install them.
numpy
pysam
matplotlib
termcolor
datetime
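To check in advance which of the packages listed above are available, a small script along these lines may help. It is only a sketch and not part of HTSeqQC: the function name missing_packages is hypothetical. Note that datetime ships with the Python standard library, so it should never be reported as missing.

```python
import importlib.util

# Package names taken from the prerequisites list above.
REQUIRED = ["numpy", "pysam", "matplotlib", "termcolor", "datetime"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any package reported as missing can then be installed with the usual Python package tooling before running setup.py.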
Installing
Clone or download HTSeqQC using the following command,
To install HTSeqQC, run the following command in the root folder,
python setup.py install
How to use
Print the help message to see all required and optional parameters,
filter.py -h
usage: filter.py [-h] [-a INPUT_FILES_1] [-b INPUT_FILES_2] [-c QUAL_FMT]
[-e N_CONT] [-f ADPT_SEQS] [-d MIN_SIZE] [-g ADPT_MATCH]
[-i QUAL_THRESH] [-n TRIM_OPT] [-p WIND_SIZE]
[-r MIN_LEN_FILT] [-q CPU] [-m OUT_FMT] [-v VIS_OPT]
[--version]
Quality control analysis of single and paired-end sequence data
optional arguments:
-h, --help show this help message and exit
-a INPUT_FILES_1, --p1 INPUT_FILES_1
Single end input files or left files for paired-end
data (.fastq, .fq). Multiple sample files must be
separated by comma or space
-b INPUT_FILES_2, --p2 INPUT_FILES_2
Right files for paired-end data (.fastq, .fq).
Multiple files must be separated by comma or space
-c QUAL_FMT, --qfmt QUAL_FMT
Quality value format [1= Illumina 1.8, 2= Illumina
1.3,3= Sanger]. If quality format not provided, it
will automatically detect based on sequence data
-e N_CONT, --nb N_CONT
Filter the reads containing given % of uncalled bases
(N)
-f ADPT_SEQS, --adp ADPT_SEQS
Trim the adapter and truncate the read sequence
(multiple adapter sequences must be separated by
comma)
-d MIN_SIZE, --msz MIN_SIZE
Filter the reads which are lesser than minimum size
-g ADPT_MATCH, --per ADPT_MATCH
Truncate the read sequence if it matches to adapter
sequence equal or more than given percent (0.0-1.0)
[default=0.9]
-i QUAL_THRESH, --qthr QUAL_THRESH
Filter the read sequence if average quality of bases
in reads is lower than threshold (1-40) [default:20]
-n TRIM_OPT, --trim TRIM_OPT
If trim option set to True, the reads with low quality
(as defined by option --qthr) will be trimmed instead
of discarding [True|False] [default: False]
-p WIND_SIZE, --wsz WIND_SIZE
The window size for trimming (5->3) the reads. This
option should always set when -trim option is defined
[default: 5]
-r MIN_LEN_FILT, --mlk MIN_LEN_FILT
Minimum length of the reads to retain after trimming
-q CPU, --cpu CPU Number of CPU [default:2]
-m OUT_FMT, --ofmt OUT_FMT
Output file format (fastq/fasta) [default:fastq]
-v VIS_OPT, --no-vis VIS_OPT
No figures will be produced [True|False]
[default:False]
--version show program's version number and exit
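To make the filtering options above more concrete, the sketch below illustrates what the --nb, --qthr, and --trim/--wsz criteria describe. It is an illustration under stated assumptions, not HTSeqQC's actual implementation: the function names are hypothetical, the comparison directions (reads above the N percentage or below the mean quality are dropped) are assumed, and the trimming rule (cut at the start of the first 5'->3' window whose mean quality falls below the threshold) is one common sliding-window approach that may differ from what filter.py does. Quality strings are decoded with offset 33 (Sanger and Illumina 1.8+); Illumina 1.3 data would use offset 64.

```python
def percent_n(seq):
    """Percentage of uncalled bases (N) in a read sequence."""
    if not seq:
        return 0.0
    return 100.0 * seq.upper().count("N") / len(seq)


def mean_quality(qual, offset=33):
    """Mean Phred score of a FASTQ quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def keep_read(seq, qual, max_n_percent=5.0, min_mean_qual=20.0, offset=33):
    """Assumed rule: keep a read if it has few N bases and good mean quality."""
    return (percent_n(seq) <= max_n_percent
            and mean_quality(qual, offset) >= min_mean_qual)


def trim_by_window(seq, qual, window=5, min_mean_qual=20.0, offset=33):
    """Assumed rule: scan 5'->3' and cut at the first low-quality window."""
    scores = [ord(c) - offset for c in qual]
    last_start = max(len(scores) - window + 1, 1)
    for start in range(last_start):
        win = scores[start:start + window]
        if win and sum(win) / len(win) < min_mean_qual:
            # Truncate the read where the failing window begins.
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    print(keep_read("ACGTACGT", "IIIIIIII"))
    print(trim_by_window("ACGTACGTAC", "IIIII!!!!!"))
```

After trimming, a read shorter than the --mlk value would presumably be discarded; that step is omitted here.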
Filter single-end reads
# for single sample
filter.py OPTIONS -a fastq_file
# for multiple samples
filter.py OPTIONS -a fastq_file_1,fastq_file_2
Filter paired-end reads
# for single sample
filter.py OPTIONS -a fastq_file_left -b fastq_file_right
# for multiple samples
filter.py OPTIONS -a fastq_file_left_1,fastq_file_left_2 -b fastq_file_right_1,fastq_file_right_2
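When many samples are processed, the comma-separated lists for -a and -b can be assembled programmatically. The sketch below assumes that left and right files must appear in the same sample order in both lists, which is what the example above suggests; the helper name build_paired_command is hypothetical and not part of HTSeqQC.

```python
def build_paired_command(pairs, options=()):
    """Build a filter.py argument list from (left, right) FASTQ path pairs.

    Left and right files are joined in the same order so that the n-th
    entry of -a corresponds to the n-th entry of -b (an assumption).
    """
    if not pairs:
        raise ValueError("at least one (left, right) pair is required")
    lefts = ",".join(left for left, _ in pairs)
    rights = ",".join(right for _, right in pairs)
    return ["filter.py", *options, "-a", lefts, "-b", rights]


if __name__ == "__main__":
    cmd = build_paired_command(
        [("s1_left.fastq", "s1_right.fastq"), ("s2_left.fastq", "s2_right.fastq")],
        options=["--cpu", "4"],
    )
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once filter.py is installed and on the PATH.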
Output
HTSeqQC writes the filtered, cleaned HTS data as FASTQ or FASTA files,
along with statistics and visualizations of the filtered datasets. The
output is saved in a folder whose name ends with filtering_out.
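Because only the folder name suffix is documented, result folders can be located by that suffix. The sketch below assumes the folders are created directly under a directory you choose to search (where HTSeqQC actually places them is not stated here); the function name find_output_dirs is hypothetical.

```python
from pathlib import Path


def find_output_dirs(root="."):
    """Return directories directly under root whose name ends with 'filtering_out'."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and p.name.endswith("filtering_out")
    )


if __name__ == "__main__":
    for out_dir in find_output_dirs("."):
        print(out_dir)
```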
License
This project is available under the MIT License. See the LICENSE file for complete details.
HTSeqQC Analysis commands used for test datasets
Download the paired-end and single-end test data using the NCBI SRA Toolkit
Run HTSeqQC as a command line tool (Linux and Mac)
filter.py --cpu 18 --p1 SRR2165176_1.fastq --p2 SRR2165176_2.fastq
filter.py --cpu 18 --qthr 25 --nb 5 --adp AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT --p1 SRR2165176_1.fastq --p2 SRR2165176_2.fastq
filter.py --cpu 18 --p1 SRR2165176_1.fastq,SRR2165177_1.fastq,SRR2165178_1.fastq --p2 SRR2165176_2.fastq,SRR2165177_2.fastq,SRR2165178_2.fastq
filter.py --cpu 18 --p1 SRR1805340.fastq