Bulked segregant analysis, as implemented in QTL-seq (Takagi et al., 2013), is a powerful and efficient method for identifying agronomically important loci in crop plants. QTL-seq has been adapted from MutMap to identify quantitative trait loci. It utilizes sequences pooled from two segregating progeny populations with extreme opposite traits (e.g. resistant vs susceptible) and a single whole-genome resequencing of either of the parental cultivars. The updated pipeline is approximately 5-8 times faster than the previous pipeline, is easier for novice users to use, and can be easily installed through bioconda with all dependencies.
Citation
Yu Sugihara, Lester Young, Hiroki Yaegashi, Satoshi Natsume, Daniel J. Shea, Hiroki Takagi, Helen Booker, Hideki Innan, Ryohei Terauchi, Akira Abe (2022). High performance pipeline for MutMap and QTL-seq. PeerJ, 10:e13170.
If your reference genome contains more than 50 contigs, only the 50 longest contigs will be plotted.
qtlseq -h
usage: qtlseq -r <FASTA> -p <BAM|FASTQ> -b1 <BAM|FASTQ>
-b2 <BAM|FASTQ> -n1 <INT> -n2 <INT> -o <OUT_DIR>
[-F <INT>] [-T] [-e <DATABASE>]
QTL-seq version 2.3.0
options:
-h, --help show this help message and exit
-r , --ref Reference FASTA file.
-p , --parent FASTQ or BAM file of the parent. If specifying
FASTQ, separate paired-end files with a comma,
e.g., -p fastq1,fastq2. This option can be
used multiple times.
-b1 , --bulk1 FASTQ or BAM file of bulk 1. If specifying
FASTQ, separate paired-end files with a comma,
e.g., -b1 fastq1,fastq2. This option can be
used multiple times.
-b2 , --bulk2 FASTQ or BAM file of bulk 2. If specifying
FASTQ, separate paired-end files with a comma,
e.g., -b2 fastq1,fastq2. This option can be
used multiple times.
-n1 , --N-bulk1 Number of individuals in bulk 1.
-n2 , --N-bulk2 Number of individuals in bulk 2.
-o , --out Output directory. The specified directory must not
already exist.
-F , --filial Filial generation. This parameter must be greater
than 1. [2]
-t , --threads Number of threads. If a value less than 1 is specified,
QTL-seq will use the maximum available threads. [2]
-w , --window Window size in kilobases (kb). [2000]
-s , --step Step size in kilobases (kb). [100]
-D , --max-depth Maximum depth of variants to be used. [250]
-d , --min-depth Minimum depth of variants to be used. [8]
-N , --N-rep Number of replicates for simulations to generate
null distribution. [5000]
-T, --trim Trim FASTQ files using Trimmomatic.
-a , --adapter FASTA file containing adapter sequences. This option
is used when "-T" is specified for trimming.
--trim-params Parameters for Trimmomatic. Input parameters
must be comma-separated in the following order:
Phred score, ILLUMINACLIP, LEADING, TRAILING,
SLIDINGWINDOW, MINLEN. To remove Illumina adapters,
specify the adapter FASTA file with "--adapter".
If not specified, adapter trimming will be skipped.
[33,<ADAPTER_FASTA>:2:30:10,20,20,4:15,75]
-e , --snpEff Predict causal variants using SnpEff. Check
available databases in SnpEff.
--line-colors Colors for threshold lines in plots. Specify a
comma-separated list in the order of SNP-index,
p95, and p99. ["#C3310F,#009E72,#FDB003"]
--dot-colors Colors for dots in plots. Specify a
comma-separated list in the order of bulk1,
bulk2, and delta. ["#74D3AE,#FFBE0B,#B3B8DD"]
--mem Maximum memory per thread when sorting BAM files;
suffixes K/M/G are recognized. [1G]
-q , --min-MQ Minimum mapping quality for mpileup. [40]
-Q , --min-BQ Minimum base quality for mpileup. [18]
-C , --adjust-MQ Adjust the mapping quality for mpileup. The default
setting is optimized for BWA. [50]
-v, --version show program's version number and exit
QTL-seq can be run from FASTQ (without or with trimming) and BAM. If you want to run QTL-seq from VCF, please use QTL-plot (example 5). Once you run QTL-seq, QTL-seq automatically completes the subprocesses.
-p : FASTQs of parent. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.
-b1 : FASTQs of bulk 1. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.
-b2 : FASTQs of bulk 2. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.
-n1 : number of individuals in bulk 1.
-n2 : number of individuals in bulk 2.
-o : name of output directory. The specified directory should not already exist.
-T : trim your reads using trimmomatic.
-a : FASTA of adapter sequences for trimmomatic.
If you are using TrueSeq3, you can find the adapter sequences in the Github page of Trimmomatic. This thread is also helpful to preprare the adapter file.
QTL-seq automatically merges multiple FASTQ and BAM files. Of course, you can merge FASTQ or BAM files using cat or samtools merge before inputting them into QTL-seq. If you specify -p multiple times, please make sure that those files include only one individual. On the other hand, -b1 and -b2 can include more than one individuals because those are bulked samples. QTL-seq automatically classifies FASTQ and BAM files based on whether comma exits or not.
Example 5 : run QTL-plot from VCF
usage: qtlplot -v <VCF> -n1 <INT> -n2 <INT> -o <OUT_DIR>
[-F <INT>] [-t <INT>] [-w <INT>] [-s <INT>] [-D <INT>]
[-d <INT>] [-N <INT>] [-m <FLOAT>] [-S <INT>] [-e <DATABASE>]
[--igv] [--indel]
QTL-plot version 2.3.0
options:
-h, --help show this help message and exit
-v , --vcf VCF file which contains parent, bulk1, and bulk2
in this order. This VCF file must have AD field.
-n1 , --N-bulk1 Number of individuals in bulk 1.
-n2 , --N-bulk2 Number of individuals in bulk 2.
-o , --out Output directory. The specified directory can already
exist.
-F , --filial Filial generation. This parameter must be
greater than 1. [2]
-t , --threads Number of threads. If a value less than 1 is specified,
QTL-seq will use the maximum available threads. [2]
-w , --window Window size (kb). [2000]
-s , --step Step size (kb). [100]
-D , --max-depth Maximum depth of variants to be used. [250]
-d , --min-depth Minimum depth of variants to be used. [8]
-N , --N-rep Number of replicates for simulations to generate
null distribution. [5000]
-m , --min-SNPindex Cutoff of minimum SNP-index for clear results. [0.3]
-S , --strand-bias Filter out spurious homozygous genotypes in the cultivar
based on strand bias. If ADF (or ADR) is higher than
this cutoff when ADR (or ADF) is 0, that SNP will be
filtered out. If you want to disable this filtering,
set this cutoff to 0. [7]
-e , --snpEff Predict causal variants using SnpEff. Check
available databases in SnpEff.
--igv Output IGV format file to check results on IGV.
--indel Plot SNP-index with INDEL.
--line-colors Colors for threshold lines in plots. Specify a
comma-separated list in the order of SNP-index,
p95, and p99. ["#C3310F,#009E72,#FDB003"]
--dot-colors Colors for dots in plots. Specify a
comma-separated list in the order of bulk1,
bulk2, and delta. ["#74D3AE,#FFBE0B,#B3B8DD"]
--fig-width Width allocated in chromosome figure. [7.5]
--fig-height Height allocated in chromosome figure. [4.0]
--white-space White space between figures. (This option
only affects vertical direction.) [0.6]
-f , --format Specify the format of an output image.
eps/jpeg/jpg/pdf/pgf/png/rgba/svg/svgz/tif/tiff
--version show program's version number and exit
QTL-plot is included in QTL-seq. QTL-seq run QTL-plot after making VCF. Then, QTL-plot will work with default parameters. If you want to change some parameters, you can use VCF inside of (OUT_DIR/30_vcf/qtlseq.vcf.gz) to retry plotting process like below.
DELTA SNP-index : real delta SNP-index (bulk2 - bulk1)
sliding_window.tsv : columns in this order.
CHROM : chromosome name
POSI : central position of window
MEAN p99 : mean of p99 (absolute value)
MEAN p95 : mean of p95 (absolute value)
MEAN SNP-index 1 : mean SNP-index of bulk 1
MEAN SNP-index 2 : mean SNP-index of bulk 2
MEAN DELTA SNP-index : mean delta SNP-index
snp_index.p95.tsv and snp_index.p99.tsv contain only the SNPs that exceed the respective thresholds (95% or 99%). Similarly, sliding_window.p95.tsv and sliding_window.p99.tsv contain only the windows that exceed the respective thresholds.
delta_SNPindex.png : resulting plot (like below)
BLUE dot : variant
RED line : mean SNP-index
ORANGE line : mean p99
GREEN line : mean p95
If you run QTL-seq with SnpEff, the following additional outputs will be generated:
qtlseq.snpEff.vcf: The updated VCF file after annotation by SnpEff, located in the 40_qtlseq directory.
snp_index.tsv: This file will contain a new column, impact, which includes the mutation impact information predicted by SnpEff.
When plotting the results, variants classified as MODERATE by SnpEff are marked with a + symbol, while variants classified as HIGH are marked with an x symbol in the plot.
Additional Resources
QTL-seq commands breakdown
For a detailed breakdown of the commands used in QTL-seq, including explanations of each step, parameters, and best practices for troubleshooting, please refer to the QTL-seq Commands Breakdown document.
Build a custom SnpEff database
If you are working with a non-model organism or your own reference genome, you may need to build a custom SnpEff database. For detailed instructions on how to build a custom SnpEff database, please refer to the Build a Custom SnpEff Database document.
Frequently Asked Questions
Choosing the reference genome
You can use a line that was not involved in the cross as the reference genome. In the version of QTL-seq published by Takagi et al., 2013, the reference fasta was rebuilt using one of the parents’ reads to create a consensus sequence. However, starting from version 2, that step has been omitted. Instead, the VCF file is used to determine which parent contributed each segregating mutation found in the progeny.
When there are many contigs
The current setting has been updated to pick the top 50 contigs based on length, so only these contigs will be displayed in the plot. However, the table contains SNP-index information for all contigs, allowing you to confirm significant SNPs even for contigs not shown in the plot. You can also generate plots for these SNPs independently if needed. Since contigs smaller than the sliding window size often produce less reliable results, excluding them from the analysis should not be an issue.
How to filter for higher-confidence SNPs
If the phenotype is clear and the sequence data is clean, the default settings should already show some results. If you want to focus on higher-confidence SNPs, you can sequence both parents and retain only the SNPs that are clearly 0/0 in one parent and 1/1 in the other in the VCF file, which should produce cleaner results. However, keep in mind that the current version of QTL-seq only allows input for one parent’s sequence. The linked page explains default QTL-seq commands, which involve only one parent in the workflow, but it may still be helpful as a reference when creating VCFs that include both parents.
About multiple testing correction
This function has been deprecated since v2.2.5. We highly recommend running QTL-seq without this function. However, if you would like to use this function, you can use it with versions of QTL-seq older than v2.2.5.
QTL-seq User Guide
version 2.3.0
Table of contents
What is QTL-seq?
Bulked segregant analysis, as implemented in QTL-seq (Takagi et al., 2013), is a powerful and efficient method for identifying agronomically important loci in crop plants. QTL-seq has been adapted from MutMap to identify quantitative trait loci. It utilizes sequences pooled from two segregating progeny populations with extreme opposite traits (e.g. resistant vs susceptible) and a single whole-genome resequencing of either of the parental cultivars. The updated pipeline is approximately 5-8 times faster than the previous pipeline, is easier for novice users to use, and can be easily installed through bioconda with all dependencies.
Citation
Yu Sugihara, Lester Young, Hiroki Yaegashi, Satoshi Natsume, Daniel J. Shea, Hiroki Takagi, Helen Booker, Hideki Innan, Ryohei Terauchi, Akira Abe (2022). High performance pipeline for MutMap and QTL-seq. PeerJ, 10:e13170.
Hiroki Takagi, Akira Abe, Kentaro Yoshida, Shunichi Kosugi, Satoshi Natsume, Chikako Mitsuoka, Aiko Uemura, Hiroe Utsushi, Muluneh Tamiru, Shohei Takuno, Hideki Innan, Liliana M. Cano, Sophien Kamoun, Ryohei Terauchi (2013). QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. Plant journal 74:174-183.
Installation
Dependencies
Software
Python (>=3.8) libraries
Installation via bioconda
You can install QTL-seq via bioconda.
Installation via environment.yml
If you install from source and want all dependencies in one environment, you can use:
Usage
If your reference genome contains more than 50 contigs, only the 50 longest contigs will be plotted.
QTL-seq can be run from FASTQ (without or with trimming) and BAM. If you want to run QTL-seq from VCF, please use QTL-plot (example 5). Once you run QTL-seq, QTL-seq automatically completes the subprocesses.
Example 1 : run QTL-seq from FASTQ without trimming
-r: reference fasta-p: FASTQs of parent. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-b1: FASTQs of bulk 1. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-b2: FASTQs of bulk 2. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-n1: number of individuals in bulk 1.-n2: number of individuals in bulk 2.-o: name of output directory. The specified directory should not already exist.Example 2 : run QTL-seq from FASTQ with trimming
-r: reference fasta-p: FASTQs of parent. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-b1: FASTQs of bulk 1. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-b2: FASTQs of bulk 2. Please input paired-end reads separated by a comma. FASTQ files can be gzipped.-n1: number of individuals in bulk 1.-n2: number of individuals in bulk 2.-o: name of output directory. The specified directory should not already exist.-T: trim your reads using trimmomatic.-a: FASTA of adapter sequences for trimmomatic.If you are using TrueSeq3, you can find the adapter sequences in the Github page of Trimmomatic. This thread is also helpful to preprare the adapter file.
Example 3 : run QTL-seq from BAM
-r: reference fasta-p: BAM of parent.-b1: BAM of bulk 1.-b2: BAM of bulk 2.-n1: number of individuals in bulk 1.-n2: number of individuals in bulk 2.-o: name of output directory. The specified directory should not already exist.Example 4 : run QTL-seq from multiple FASTQs and BAMs
QTL-seq automatically merges multiple FASTQ and BAM files. Of course, you can merge FASTQ or BAM files using
catorsamtools mergebefore inputting them into QTL-seq. If you specify-pmultiple times, please make sure that those files include only one individual. On the other hand,-b1and-b2can include more than one individuals because those are bulked samples. QTL-seq automatically classifies FASTQ and BAM files based on whether comma exits or not.Example 5 : run QTL-plot from VCF
QTL-plot is included in QTL-seq. QTL-seq run QTL-plot after making VCF. Then, QTL-plot will work with default parameters. If you want to change some parameters, you can use VCF inside of
(OUT_DIR/30_vcf/qtlseq.vcf.gz)to retry plotting process like below.Using QTL-plot for a VCF which was made by yourself
In this case:
If you got an error, please try to run QTL-seq from FASTQ or BAM before asking in issues.
Outputs
Inside of
OUT_DIRis like below.00_fastqwhich includes FASTQs after trimming.40_qtlseq.snp_index.tsv: columns in this order.sliding_window.tsv: columns in this order.snp_index.p95.tsvandsnp_index.p99.tsvcontain only the SNPs that exceed the respective thresholds (95% or 99%). Similarly,sliding_window.p95.tsvandsliding_window.p99.tsvcontain only the windows that exceed the respective thresholds.delta_SNPindex.png: resulting plot (like below)40_qtlseqdirectory.+symbol, while variants classified as HIGH are marked with anxsymbol in the plot.Additional Resources
QTL-seq commands breakdown
For a detailed breakdown of the commands used in QTL-seq, including explanations of each step, parameters, and best practices for troubleshooting, please refer to the QTL-seq Commands Breakdown document.
Build a custom SnpEff database
If you are working with a non-model organism or your own reference genome, you may need to build a custom SnpEff database. For detailed instructions on how to build a custom SnpEff database, please refer to the Build a Custom SnpEff Database document.
Frequently Asked Questions
Choosing the reference genome
You can use a line that was not involved in the cross as the reference genome. In the version of QTL-seq published by Takagi et al., 2013, the reference fasta was rebuilt using one of the parents’ reads to create a consensus sequence. However, starting from version 2, that step has been omitted. Instead, the VCF file is used to determine which parent contributed each segregating mutation found in the progeny.
When there are many contigs
The current setting has been updated to pick the top 50 contigs based on length, so only these contigs will be displayed in the plot. However, the table contains SNP-index information for all contigs, allowing you to confirm significant SNPs even for contigs not shown in the plot. You can also generate plots for these SNPs independently if needed. Since contigs smaller than the sliding window size often produce less reliable results, excluding them from the analysis should not be an issue.
How to filter for higher-confidence SNPs
If the phenotype is clear and the sequence data is clean, the default settings should already show some results. If you want to focus on higher-confidence SNPs, you can sequence both parents and retain only the SNPs that are clearly 0/0 in one parent and 1/1 in the other in the VCF file, which should produce cleaner results. However, keep in mind that the current version of QTL-seq only allows input for one parent’s sequence. The linked page explains default QTL-seq commands, which involve only one parent in the workflow, but it may still be helpful as a reference when creating VCFs that include both parents.
About multiple testing correction
This function has been deprecated since v2.2.5. We highly recommend running QTL-seq without this function. However, if you would like to use this function, you can use it with versions of QTL-seq older than v2.2.5.