A tool for generating synthetic next-generation sequencing (NGS) reads with the same nucleotide distribution as model reads.
Usage
boquila 0.6.1
Generate NGS reads with same nucleotide distribution as input file
Generated reads will be written to stdout
By default input and output format is FASTQ
USAGE:
boquila [FLAGS] [OPTIONS] <src>
ARGS:
<src> Model file
FLAGS:
--fasta Change input and output format to FASTA
--setQual Use given Quality score with parameter 'qual' for all simulated reads.
-h, --help Print help information
--inseqFasta Change the input sequencing format to FASTA
-V, --version Print version information
OPTIONS:
--bed <FILE> File name in which the simulated reads will be saved in BED format
--inseq <FILE> Input sequencing reads to be used instead of reference genome
--kmer <INT> Kmer size to be used while calculating frequency [default: 1]
--ref <FILE> Reference FASTA
--regions <FILE> RON formatted file containing genomic regions that generated reads will
be selected from
--seed <INT> Random number seed. If not provided system's default source of entropy
will be used instead.
--sens <INT> Sensitivity of selected reads.
If some positions are predominated by specific nucleotides, increasing
this value can make simulated reads more similar to input reads.
However runtime will also increase linearly.
[possible values: 10-100] [default: 20]
--qual <QUAL> Quality score to be applied to to each position for all reads.
'setQual' flag should be present in order it to work
Has no effect if input reads are not in FASTQ format. [default: I]
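The --kmer and --sens options both refer to how often nucleotides (or k-mers) occur at each read position. The help text does not say how boquila computes these frequencies, so the following is only a minimal sketch of one plausible reading: count, for every position, the k-mers that start there across all model reads. The function name and the sample reads are hypothetical and are not part of boquila.

```python
from collections import Counter


def positional_kmer_frequencies(reads, k=1):
    """Return one dict per read position mapping k-mer -> relative frequency.

    Illustrative only: this is an assumed interpretation of "nucleotide
    distribution", not boquila's actual implementation.
    """
    counts = []
    for read in reads:
        for pos in range(len(read) - k + 1):
            if pos == len(counts):
                # First time this position is seen: start a new counter.
                counts.append(Counter())
            counts[pos][read[pos:pos + k].upper()] += 1
    return [
        {kmer: n / sum(c.values()) for kmer, n in c.items()}
        for c in counts
    ]


# Hypothetical model reads.
model_reads = ["ACGT", "ACGA", "TCGT", "ACCT"]
freqs = positional_kmer_frequencies(model_reads, k=1)
for pos, dist in enumerate(freqs):
    print(pos, dist)
```

With k=1 each position gets a distribution over single nucleotides. A larger k tracks short words instead, which captures dependence between neighbouring bases at the cost of more distinct categories per position.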
Generated reads will be written to stdout in FASTA or FASTQ format.
If the --bed option is provided, generated reads are also written to the given file in BED6 format.
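BED6 refers to the first six columns of the BED format: chrom, chromStart, chromEnd, name, score and strand, separated by tabs, with a 0-based start and an exclusive end. The sketch below only illustrates that layout. What boquila actually writes in the name and score columns is not stated here, so the values used are placeholders and the helper name is hypothetical.

```python
def bed6_line(chrom, start, end, name=".", score=0, strand="+"):
    """Format one BED6 record (tab-separated, 0-based start, exclusive end)."""
    if not 0 <= start <= end:
        raise ValueError("BED coordinates must satisfy 0 <= start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


# Placeholder values, not real boquila output.
print(bed6_line("chr1", 100, 150, name="read_1"))
print(bed6_line("chr2", 2000, 2050, name="read_2", strand="-"))
```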
A sample regions file for the Homo sapiens (human) genome assembly GRCh38 (hg38) is provided as GRCh38.ron
If input sequencing reads are to be used for the simulation, provide them with the --inseq argument instead of using --ref and --regions.
Examples
A more detailed example can be found in the examples directory.
Simple usage
Using seed for RNG
Using reads that are in FASTA format
Saving output in BED format
Using Input Sequencing instead of reference genome
Using Input Sequencing reads which are in FASTA format
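The scenarios listed above can be sketched as command lines assembled only from the flags and options in the help text. This is a hedged illustration, not the project's own examples: the helper boquila_command and every file name (reads.fq, genome.fa, input_seq.fq and so on) are hypothetical, and the script only prints the commands without running boquila.

```python
import shlex


def boquila_command(src, ref=None, regions=None, inseq=None,
                    seed=None, bed=None, fasta=False, inseq_fasta=False):
    """Build a boquila command line using only options from the help text."""
    args = ["boquila"]
    if fasta:
        args.append("--fasta")
    if inseq_fasta:
        args.append("--inseqFasta")
    if bed is not None:
        args += ["--bed", bed]
    if inseq is not None:
        # --inseq is given instead of --ref and --regions.
        args += ["--inseq", inseq]
    else:
        if ref is None or regions is None:
            raise ValueError("ref and regions are needed unless inseq is given")
        args += ["--ref", ref, "--regions", regions]
    if seed is not None:
        args += ["--seed", str(seed)]
    args.append(src)  # the model file comes last, as in the USAGE line
    return shlex.join(args)


# All file names below are placeholders.
examples = {
    "Simple usage": boquila_command(
        "reads.fq", ref="genome.fa", regions="GRCh38.ron"),
    "Using seed for RNG": boquila_command(
        "reads.fq", ref="genome.fa", regions="GRCh38.ron", seed=7),
    "Using reads that are in FASTA format": boquila_command(
        "reads.fa", ref="genome.fa", regions="GRCh38.ron", fasta=True),
    "Saving output in BED format": boquila_command(
        "reads.fq", ref="genome.fa", regions="GRCh38.ron", bed="simulated.bed"),
    "Using Input Sequencing instead of reference genome": boquila_command(
        "reads.fq", inseq="input_seq.fq"),
    "Using Input Sequencing reads which are in FASTA format": boquila_command(
        "reads.fq", inseq="input_seq.fa", inseq_fasta=True),
}

for title, command in examples.items():
    print(f"# {title}\n{command}\n")
```

Because generated reads go to stdout, each printed command would normally be followed by a shell redirect to a file of your choice.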
Installation
Or via Rust toolchain
boquila is written in Rust, so you’ll need to grab a Rust installation in order to install or compile it.
The current minimum Rust version is 1.55.0
cargo
Cargo will build and install the binary, by default to $HOME/.cargo/bin/
For convenience, you can copy the executable ./target/release/boquila to some directory in your PATH.