Introduction
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or
FASTQ format. It seamlessly parses both FASTA and FASTQ files, which can
optionally be compressed by gzip. To install seqtk:
git clone https://github.com/lh3/seqtk.git
cd seqtk; make
The only library dependency is zlib.
Seqtk Examples
Convert FASTQ to FASTA:
seqtk seq -a in.fq.gz > out.fa
Convert ILLUMINA 1.3+ FASTQ to FASTA and mask bases with quality lower than 20 to lowercase (the 1st command line) or to N (the 2nd):
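A sketch of the two command lines this item describes, assuming the `-Q` (quality offset), `-q` (masking threshold) and `-n` (mask character) options that `seqtk seq` prints when run without arguments. The one-read `in.fq` and the output names `out_lower.fa` and `out_n.fa` are hypothetical, chosen so both results can be kept side by side.

```shell
# Hypothetical toy read in Illumina 1.3+ encoding (ASCII offset 64):
# 'h' encodes Q40 and 'B' encodes Q2, so only the last two bases fall below 20.
printf '@r1\nACGTAC\n+\nhhhhBB\n' > in.fq

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # 1st command line: bases below Q20 are written in lowercase
    seqtk seq -aQ64 -q20 in.fq > out_lower.fa
    # 2nd command line: bases below Q20 are replaced with N
    seqtk seq -aQ64 -q20 -n N in.fq > out_n.fa
fi
```

For real Illumina 1.3+ data, replace the toy file with the actual FASTQ; Sanger-encoded files (offset 33) should not need `-Q64`.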
Fold long FASTA/Q lines and remove FASTA/Q comments:
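One way this might be run, assuming the `-l` (residues per line) and `-C` (drop header comments) options of `seqtk seq`. The line width of 60 and the toy `in.fa` are hypothetical choices for illustration.

```shell
# Hypothetical toy FASTA: one 160-bp record on a single line, with a comment
# after the sequence name.
{
    printf '>chr1 toy comment\n'
    printf 'ACGT%.0s' $(seq 1 40)
    printf '\n'
} > in.fa

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # -l60 wraps sequence lines at 60 bases; -C drops the header comment
    seqtk seq -Cl60 in.fa > out.fa
fi
```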
Convert multi-line FASTQ to 4-line FASTQ:
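A minimal sketch, assuming that `-l0` in `seqtk seq` turns off line wrapping so that each record is written as name, sequence, `+` and quality. The two-line-per-field `in.fq` is a hypothetical toy input.

```shell
# Hypothetical toy FASTQ whose sequence and quality each span two lines.
printf '@r1\nACGT\nACGT\n+\nIIII\nIIII\n' > in.fq

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # -l0 disables line wrapping, so each record takes exactly four lines
    seqtk seq -l0 in.fq > out.fq
fi
```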
Reverse complement FASTA/Q:
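A minimal sketch, assuming the `-r` option of `seqtk seq`. The toy read is hypothetical and is deliberately not a palindrome, so the effect is visible in the output.

```shell
# Hypothetical toy read; the reverse complement of AACGT is ACGTT.
printf '@r1\nAACGT\n+\nIIII#\n' > in.fq

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # -r reverse-complements the bases; for FASTQ the quality string is
    # reversed as well so that it stays aligned with the bases
    seqtk seq -r in.fq > out.fq
fi
```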
Extract sequences with names in file name.lst, one sequence name per line:
Extract sequences in regions contained in file reg.bed:
Mask regions in reg.bed to lowercase:
Subsample 10000 read pairs from two large paired FASTQ files (remember to use the same random seed to keep pairing):
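The four preceding items (extraction by name list, extraction by BED regions, BED masking and paired subsampling) might be run as sketched here, assuming the `subseq` and `sample` subcommands and the `-M` option of `seqtk seq`. All input files are hypothetical toy data, the output names are made up, and a sample size of 2 stands in for 10000.

```shell
# Hypothetical toy inputs.
printf '>chr1\nACGTACGT\n>chr2\nTTTTGGGG\n' > in.fa
printf 'chr2\n' > name.lst          # one sequence name per line
printf 'chr1\t2\t6\n' > reg.bed     # BED: 0-based start, end-exclusive
for i in 1 2 3 4; do printf '@r%s\nACGT\n+\nIIII\n' "$i"; done > read1.fq
for i in 1 2 3 4; do printf '@r%s\nTGCA\n+\nIIII\n' "$i"; done > read2.fq

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # Whole sequences whose names are listed in name.lst
    seqtk subseq in.fa name.lst > by_name.fa
    # Only the intervals listed in reg.bed
    seqtk subseq in.fa reg.bed > by_region.fa
    # Full sequences, with the reg.bed intervals written in lowercase
    seqtk seq -M reg.bed in.fa > masked.fa
    # The same seed (-s100) on both files selects the same records, so mates
    # stay paired.
    seqtk sample -s100 read1.fq 2 > sub1.fq
    seqtk sample -s100 read2.fq 2 > sub2.fq
fi
```

Pairing relies on both files listing mates in the same order with the same number of records, since the seed only fixes which record positions are drawn.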
Trim low-quality bases from both ends using the Phred algorithm:
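A minimal sketch, assuming `seqtk trimfq` run with its default settings (as far as the usage message indicates, an error-rate threshold of 0.05 and no trimming below 30 bp). The 40-bp toy read is hypothetical and is built with poor-quality ends so that trimming has something to remove.

```shell
# Hypothetical toy read: 40 bp, with five Q2 bases ('#') at each end and
# thirty Q40 bases ('I') in the middle (Sanger encoding, offset 33).
bases=$(printf 'ACGT%.0s' $(seq 1 10))
quals="#####$(printf 'I%.0s' $(seq 1 30))#####"
printf '@r1\n%s\n+\n%s\n' "$bases" "$quals" > in.fq

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # Default settings; the low-quality ends are the part expected to go
    seqtk trimfq in.fq > out.fq
fi
```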
Trim 5bp from the left end of each read and 10bp from the right end:
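A minimal sketch, assuming the `-b` (trim from the left) and `-e` (trim from the right) options of `seqtk trimfq`. The 20-bp toy record is hypothetical, laid out in blocks of five identical bases so the trimmed ends are easy to spot.

```shell
# Hypothetical toy FASTA record: 5 A, 5 C, 5 G, 5 T.
printf '>r1\nAAAAACCCCCGGGGGTTTTT\n' > in.fa

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    # -b 5 removes 5 bp from the left end, -e 10 removes 10 bp from the right
    seqtk trimfq -b 5 -e 10 in.fa > out.fa
fi
```

Unlike the quality-based mode, fixed-length trimming does not need quality values, so FASTA input should work as well as FASTQ.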
Find telomere (TTAGGG)n repeats:
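A minimal sketch, assuming a seqtk build that includes the `telo` subcommand (older builds may not). The region list is assumed to go to standard output and the summary counts to standard error, which is why the two streams are redirected separately. The toy contig and the output file names `telo.bed` and `telo.count` are hypothetical.

```shell
# Hypothetical toy contig: 100 copies of CCCTAA (the reverse complement of
# TTAGGG) at the left end, followed by 35 bp of non-repetitive filler.
{
    printf '>ctg1\n'
    printf 'CCCTAA%.0s' $(seq 1 100)
    printf 'ACGTTGCATGCCGTAGCTAGGATCCGATCGTTAGC\n'
} > seq.fa

# Only run seqtk when it is installed and on PATH.
if command -v seqtk >/dev/null 2>&1; then
    seqtk telo seq.fa > telo.bed 2> telo.count \
        || echo "this seqtk build may not provide the telo subcommand" >&2
fi
```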