CirculoCov
Circular-Aware Coverage for Draft Genomes
CirculoCov is a Python tool for circular-aware coverage analysis of draft genomes. Reads align poorly at the beginning and end of a linear reference, so a circular sequence stored as linear text shows lower coverage near its ends than it truly has. CirculoCov “pads” circular sequences by appending the initial portion of the sequence to its end, so that reads spanning the junction can align.
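The padding idea can be sketched in a few lines of Python. This is only an illustration of the concept described above, not CirculoCov's actual code: the function names `pad_circular` and `unpadded_position` are hypothetical, and capping the pad at the sequence length is an assumption made here for very short sequences.

```python
def pad_circular(sequence: str, padding: int = 10_000) -> str:
    """Append the first `padding` bases of a circular sequence to its end."""
    # Assumption: never pad by more than the sequence itself.
    pad = min(padding, len(sequence))
    return sequence + sequence[:pad]


def unpadded_position(position: int, original_length: int) -> int:
    """Map a 0-based position on the padded sequence back onto the circle."""
    return position % original_length


contig = "ACGTTGCA"
padded = pad_circular(contig, padding=3)
print(padded)                              # ACGTTGCAACG
print(unpadded_position(9, len(contig)))   # 1
```

A read that spans the junction of the circle can then align across the end of the original sequence and into the padded region instead of being clipped.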
Circular determination
Draft genomes are input as FASTA files, which carry no inherent notion of circularity. CirculoCov therefore treats a sequence as circular only when its FASTA header contains “Circular=True” or a similar marker.
Strings that will indicate a sequence is circular (case insensitive):
circular=true
circ=true
circular=t
circ=t
complete sequence
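A minimal sketch of how such header matching could work is shown below. It assumes a simple case-insensitive substring search over the header line; how CirculoCov actually performs the match is not stated in this README, and the names `CIRCULAR_MARKERS`, `is_circular`, and `circular_contigs` are hypothetical.

```python
# Markers taken from the list above; matching is case insensitive.
CIRCULAR_MARKERS = (
    "circular=true",
    "circ=true",
    "circular=t",
    "circ=t",
    "complete sequence",
)


def is_circular(header: str) -> bool:
    """Return True if a FASTA header contains any circular marker."""
    lowered = header.lower()
    # Assumption: plain substring search, not token-aware parsing.
    return any(marker in lowered for marker in CIRCULAR_MARKERS)


def circular_contigs(fasta_text: str) -> dict[str, bool]:
    """Map each sequence name in FASTA text to its circular status."""
    result = {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            result[name] = is_circular(header)
    return result


example = (
    ">contig_1 length=4641652 Circular=True\nACGT\n"
    ">contig_2 length=5000\nACGT\n"
    ">plasmid_1 Example plasmid, complete sequence\nACGT\n"
)
print(circular_contigs(example))
# {'contig_1': True, 'contig_2': False, 'plasmid_1': True}
```

Because the check is a substring search, the shorter markers (`circular=t`, `circ=t`) already cover the longer ones in this sketch.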
This tool is designed to
take a draft genome and determine which sequences are circular
map Nanopore/ONT, Illumina, and/or PacBio reads to the draft genome with minimap2
get coverage information with pysam
get depth information with pysam (optional: set with --all or -a)
extract fastq files for each contig (optional: set with --all or -a)
visualize depth for circular and linear sequences (optional: set with --all or -a)
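The mapping step in the list above can be illustrated with a short sketch that builds a minimap2 command line for each read type. The presets used here (`map-ont`, `sr`, `map-pb`) are standard minimap2 presets, but which presets and options CirculoCov itself passes is an assumption; the function name `minimap2_command` and the file names are hypothetical.

```python
# Assumed mapping from read type to a standard minimap2 preset.
PRESETS = {
    "nanopore": "map-ont",
    "illumina": "sr",
    "pacbio": "map-pb",
}


def minimap2_command(reference, reads, platform, threads=4):
    """Build a minimap2 command (as an argument list) that outputs SAM."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform: {platform}")
    return [
        "minimap2",
        "-ax", PRESETS[platform],   # -a: SAM output, -x: preset
        "-t", str(threads),
        reference,
        *reads,
    ]


cmd = minimap2_command(
    "padded_reference.fasta",
    ["illumina1.fastq.gz", "illumina2.fastq.gz"],
    "illumina",
)
print(" ".join(cmd))
# minimap2 -ax sr -t 4 padded_reference.fasta illumina1.fastq.gz illumina2.fastq.gz
```

The resulting list can be passed to `subprocess.run`, and the SAM output sorted and indexed before coverage and depth are read with pysam.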
# circulocov and its python dependencies can be installed via pip
pip install circulocov
# minimap2 is not installed via pip
MINIMAP2_VER="2.26"
curl -L https://github.com/lh3/minimap2/releases/download/v${MINIMAP2_VER}/minimap2-${MINIMAP2_VER}_x64-linux.tar.bz2 | tar -jxvf -
NOTE: minimap2 must be in PATH
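One quick way to confirm that minimap2 is reachable is to look it up on PATH from Python. This is a small convenience sketch using the standard library; the helper name `find_tool` is hypothetical and not part of CirculoCov.

```python
import shutil


def find_tool(name: str = "minimap2"):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


path = find_tool()
if path is None:
    print("minimap2 was not found on PATH; add its directory to PATH first")
else:
    print(f"minimap2 found at {path}")
```

If the lookup fails after downloading the release archive, the extracted directory containing the `minimap2` binary still needs to be added to PATH.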
Usage
circulocov -g draft_genome.fasta -n nanopore.fastq.gz -i illumina1.fastq.gz illumina2.fastq.gz -o out
usage: circulocov [-h] [-s SAMPLE] -g GENOME [-i ILLUMINA [ILLUMINA ...]] [-n NANOPORE] [-p PACBIO] [-a | --all | --no-all] [-d PADDING] [-w WINDOW] [-o OUT] [-log LOGLEVEL] [-t THREADS] [-v]

options:
  -h, --help            show this help message and exit
  -s SAMPLE, --sample SAMPLE
                        Sample name
  -g GENOME, --genome GENOME
                        Genome (draft or complete)
  -i ILLUMINA [ILLUMINA ...], --illumina ILLUMINA [ILLUMINA ...]
                        Input illumina fastq(s)
  -n NANOPORE, --nanopore NANOPORE
                        Input nanopore fastq
  -p PACBIO, --pacbio PACBIO
                        Input pacbio fastq
  -a, --all, --no-all
  -d PADDING, --padding PADDING
                        Amount of padding added to circular sequences
  -w WINDOW, --window WINDOW
                        Number of windows for coverage
  -o OUT, --out OUT     Result directory
  -log LOGLEVEL, --loglevel LOGLEVEL
                        Logging level
  -t THREADS, --threads THREADS
                        Number of threads to use
  -v, --version         Print version and exit
Output
The output is
A CSV file with each contig broken into windows, with the corresponding depths for the Illumina and Nanopore reads
The overall summary gives context on how well the assembled genome is supported by the reads. Contigs with few reads are less likely to be real. Assemblies with large numbers of unmapped reads may have contamination or other issues that need to be addressed.
Note: “few” and “large numbers” are not defined in this README.md and are intentionally left to interpretation.
Notes:
Although the examples have both Nanopore/ONT and Illumina reads, only one type of read is required.
There is currently no way to adjust the generated image. Instead, the depth and coverage files are available as input for other tools and scripts for visualization.
The term ‘windows’ may be misleading in the case of CirculoCov. In CirculoCov, ‘windows’ are more like snapshots across the genome at specific positions, where the number of positions is equal to ‘windows’. These snapshots are very similar to a sliding window but take less computation.
The ‘coverage’ values are determined on padded lengths. The default padding length is 10,000 bases and should have minimal impact on the overall coverage of a large sequence, such as that of a chromosome of a bacterial isolate.
The overall coverage value is the weighted average (weighted by sequence length) of the coverage values of each contig and is not a “true” mean depth value. It’s pretty close, though, and for most intents and purposes fulfills depth determination goals.
Although the intention was for circular draft genomes generated from long-read sequencing, CirculoCov can also be run on short-read draft genomes. I can’t stop you.
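Two of the notes above, the ‘windows’ snapshots and the length-weighted overall coverage, can be illustrated with a short sketch. It assumes the snapshot positions are evenly spaced along the sequence, which this README does not state explicitly. The function names and the contig lengths and coverage values are illustrative only, not CirculoCov output.

```python
def snapshot_positions(length: int, windows: int) -> list[int]:
    """Evenly spaced 0-based positions; one depth lookup per position."""
    count = min(windows, length)
    if count <= 0:
        return []
    step = length / count
    return [int(i * step) for i in range(count)]


def weighted_mean_coverage(contigs: list[tuple[int, float]]) -> float:
    """Length-weighted average of per-contig coverage values."""
    total_length = sum(length for length, _ in contigs)
    if total_length == 0:
        return 0.0
    return sum(length * cov for length, cov in contigs) / total_length


# A 1,000-base sequence sampled at 4 snapshot positions.
print(snapshot_positions(1000, 4))   # [0, 250, 500, 750]

# Illustrative values: a 5 Mb chromosome at 40x and a 100 kb plasmid at 120x.
contigs = [(5_000_000, 40.0), (100_000, 120.0)]
print(round(weighted_mean_coverage(contigs), 2))   # 41.57
```

Sampling depth at a fixed number of positions costs one lookup per snapshot, whereas a true sliding window would average over every base in each window. The weighted mean shows why a small, high-copy plasmid barely moves the overall value.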
The overall_summary.txt looks like the following for some Illumina reads and a draft genome generated via SPAdes
Examples
[Figures omitted: example circular images (chromosome and plasmid) and an example linear image.]