Overview
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads
to long reference sequences. It is particularly good at aligning reads from about 50
characters up to hundreds or thousands of characters long, and at aligning to relatively
long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep
its memory footprint small: for the human genome, it is typically
around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Obtaining Bowtie 2
Bowtie 2 is available from various package managers, notably
Bioconda. With Bioconda installed, you
should be able to install Bowtie 2 with conda install bowtie2.
Containerized versions of Bowtie 2 are also available via the
Biocontainers project (e.g. via
Docker Hub).
You can also download Bowtie 2 sources and binaries from the
“releases” tab on this page. Binaries are available for Linux,
Mac OS X, and Windows. By utilizing the SIMDE project,
Bowtie 2 now also supports the following architectures: ARM64, PPC64, and
s390x. If you plan to compile Bowtie 2 yourself, make sure you have at least
the zlib library and header files installed. See the
Building from source
section of the manual for details.
Getting started
Looking to try out Bowtie 2? Check out the Bowtie 2 UI (currently in beta).
Alignment
bowtie2 takes a Bowtie 2 index and a set of sequencing read files and outputs a
set of alignments in SAM format.
“Alignment” is the process by which we discover how and where the read sequences are
similar to the reference sequence. An “alignment” is a result from this process,
specifically: an alignment is a way of “lining up” some or all of the characters in
the read with some characters from the reference in a way that reveals how they’re
similar. For example:
  Read:      GACTGGGCGATCTCGACTTCG
             |||||  |||||||||| |||
  Reference: GACTG--CGATCTCGACATCG
Here, dash symbols represent gaps and vertical bars show where aligned characters match.
We use alignment to make an educated guess as to where a read originated with
respect to the reference genome. It’s not always possible to determine this with
certainty. For instance, if the reference genome contains several long stretches of
As (AAAAAAAAA etc.) and the read sequence is a short stretch of As (AAAAAAA), we
cannot know for certain exactly where in the sea of As the read originated.
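As a rough sketch of what an alignment run can look like, the snippet below aligns a file of unpaired reads and then summarizes the SAM output. The reads path example/reads/reads_1.fq, the output name eg1.sam, and the helper count_aligned are assumptions made for illustration; they do not come from the text above. The helper relies only on two properties of the SAM format: header lines start with @, and bit 0x4 of the FLAG column marks a read that did not align.

```shell
#!/bin/sh
# Sketch: align unpaired reads, then count aligned vs. unaligned SAM records.
# ASSUMPTION: the reads path and output file name below are illustrative only.

# count_aligned is a hypothetical helper, not part of Bowtie 2.
# It reads a SAM file and prints "aligned=N unaligned=M".
count_aligned() {
  sam=$1
  aligned=0
  unaligned=0
  tab=$(printf '\t')
  while IFS=$tab read -r qname flag rest; do
    case $qname in
      @*) continue ;;               # skip SAM header lines
    esac
    [ -n "$flag" ] || continue      # skip blank or malformed lines
    if [ $(( ${flag} & 4 )) -ne 0 ]; then
      unaligned=$(( unaligned + 1 ))   # FLAG bit 0x4: read is unaligned
    else
      aligned=$(( aligned + 1 ))
    fi
  done < "$sam"
  echo "aligned=$aligned unaligned=$unaligned"
}

index=example/index/lambda_virus
reads=example/reads/reads_1.fq
if command -v bowtie2 >/dev/null 2>&1 && [ -f "$reads" ] && [ -f "$index.1.bt2" ]; then
  # -x: index basename, -U: unpaired reads, -S: SAM output file
  bowtie2 -x "$index" -U "$reads" -S eg1.sam
  count_aligned eg1.sam
else
  echo "bowtie2, the index, or the reads were not found; skipping the alignment run"
fi
```

For paired-end reads, the two mate files are passed with -1 and -2 instead of -U, and adding --local switches from end-to-end to local alignment.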
Building an index
bowtie2-build builds a Bowtie 2 index from a set of DNA sequences. bowtie2-build
outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2,
.rev.1.bt2, and .rev.2.bt2. For a large index, these suffixes end in
.bt2l instead of .bt2. These files together constitute the index: they are all
that is needed to align reads to that reference. The original sequence FASTA files
are no longer used by Bowtie 2 once the index is built.
Bowtie 2’s .bt2 index format is different from Bowtie 1’s .ebwt format, and they
are not compatible with each other.
Examples
# Building a small index
bowtie2-build example/reference/lambda_virus.fa example/index/lambda_virus
# Building a large index
bowtie2-build --large-index example/reference/lambda_virus.fa example/index/lambda_virus
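Because the index is made up of six files, it can be useful to confirm that all of them are present before starting an alignment. The sketch below checks for the six suffixes listed above, first in their small-index form and then in their large-index form. The function name bt2_index_complete is hypothetical and is not a Bowtie 2 command.

```shell
#!/bin/sh
# Sketch: check that all six index files exist for a given index basename.
# bt2_index_complete is a hypothetical helper, not part of Bowtie 2.
# Returns 0 if a complete small (.bt2) or large (.bt2l) index is found, 1 otherwise.
bt2_index_complete() {
  base=$1
  for ext in bt2 bt2l; do
    missing=0
    for part in 1 2 3 4 rev.1 rev.2; do
      [ -f "$base.$part.$ext" ] || missing=1
    done
    [ "$missing" -eq 0 ] && return 0
  done
  return 1
}

if bt2_index_complete example/index/lambda_virus; then
  echo "index is complete"
else
  echo "index is missing one or more files"
fi
```

The check only looks at file names; it does not verify that the files are valid or that they were all produced by the same build.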
Index inspection
bowtie2-inspect extracts information from a Bowtie 2 index about what kind of
index it is and what reference sequences were used to build it. When run without any
options, the tool will output a FASTA file containing the sequences of the original
references (with all non-A/C/G/T characters converted to Ns). It can also be used to
extract just the reference sequence names using the -n/--names option or a more
verbose summary using the -s/--summary option.
Examples
# Inspecting a lambda_virus index (small index) and outputting the summary
bowtie2-inspect --summary example/index/lambda_virus
# Inspecting the entire lambda virus index (large index)
bowtie2-inspect --large-index example/index/lambda_virus
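Since bowtie2-inspect writes FASTA when run without options, its output can be piped into ordinary text tools. The sketch below lists the reference names and then counts the sequences and bases recovered from the index. The helper fasta_stats is hypothetical, written here only for illustration, and works on any FASTA stream.

```shell
#!/bin/sh
# Sketch: summarize the FASTA that bowtie2-inspect prints for an index.
# fasta_stats is a hypothetical helper, not part of Bowtie 2.
# It reads FASTA on stdin and prints the number of records and total bases.
fasta_stats() {
  awk '
    /^>/ { n++; next }              # a header line starts a new sequence
         { bases += length($0) }    # every other line holds sequence characters
    END  { printf "sequences=%d bases=%d\n", n, bases }
  '
}

index=example/index/lambda_virus
if command -v bowtie2-inspect >/dev/null 2>&1 && [ -f "$index.1.bt2" ]; then
  bowtie2-inspect --names "$index"        # reference sequence names only
  bowtie2-inspect "$index" | fasta_stats  # full FASTA, summarized
else
  echo "bowtie2-inspect or the index was not found; skipping"
fi
```

Because non-A/C/G/T characters come back as Ns, the base count includes those Ns.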
Publications
Bowtie 2 Papers
Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. bty648.
Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25.
Related Work
Check out the Bowtie 2 UI, a Shiny frontend to the Bowtie 2 command line.