Plasmid Copy Number Estimator

Plasmid Copy Number Estimator (PCNE) is a simple tool for estimating plasmid copy number from an assembled genome and its sequencing reads.

Introduction

Determining the copy number of plasmids is essential for understanding plasmid biology, evolution, and the dosage of plasmid-borne genes (e.g., antimicrobial resistance genes). PCNE automates this estimation from standard sequencing file formats.

Requirements

PCNE requires either pre-separated chromosome and plasmid FASTA files, or a complete genome assembly FASTA with the corresponding contig lists. The plasmid input can be a multi-FASTA file with one contig per plasmid, a complete plasmid assembly (one contig), or a draft plasmid assembly (one plasmid split across multiple contigs).
You can easily obtain these files using tools such as Platon, MOB-suite, or PlasmidFinder.

Citation

If you use PCNE, please cite: Bollini R, Cento V. PCNE: A Tool for Plasmid Copy Number Estimation. Bioinformatics and Biology Insights. 2026;20. doi:10.1177/11779322251410037

Pipeline summary

Input parsing and file preparation
Alignment
(Optional) Alignment filtering
Windowed data generation
(Optional) GC correction
Baseline and plasmid depth estimation
Plasmid Copy Number Estimation
Write output and cleanup

PCNE pipeline

Dependencies

The tool relies on the following software, which Conda installs automatically:

BWA (tested with v0.7.18)
Minimap2 (2.3)
Samtools (tested with v1.20)
bedtools (tested with v2.31.1)
R (tested with v4.4.3)
R packages: readr (v2.1.5), dplyr (v1.1.4), ggplot2 (v3.5.2), purrr (v1.0.0)

Installation

Bioconda

Install Plasmid Copy Number Estimator via Bioconda.

Set up Conda Channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Create a new environment and install:

conda create -n pcne_env -c conda-forge -c bioconda pcne
conda activate pcne_env

Docker

Alternatively, pull and run the Docker image:

docker pull riccabolla/pcne:v3.2.0
docker run riccabolla/pcne:v3.2.0 pcne -h

Ubuntu

sudo apt install -y bwa samtools r-base bedtools bc
R
install.packages(c("readr", "dplyr", "ggplot2", "purrr"))
q()
git clone https://github.com/riccabolla/PCNE.git 
bash PCNE/bin/pcne -h

Quick Usage

#short reads
pcne -c <chromosome.fasta> -p <plasmid.fasta> -r <reads_R1.fastq.gz> -R <reads_R2.fastq.gz> [-t <threads>] [-o <output_prefix>]

#long reads
pcne_long -c <chromosome.fasta> -p <plasmid.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

#with multiple plasmids
pcne_long -c <chromosome.fasta> -p <plasmid_*.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

Command line options

  -c, --chromosome <file>    Path to chromosome FASTA file (required in Mode 1)
  -p, --plasmid <file>       Path to one or more plasmid FASTA files (required in Mode 1)
                             Use with `--single-plasmid` if the file contains one fragmented plasmid
  -a, --assembly <file>      Path to the assembled genome FASTA file (required in Mode 2)
  -C, --chr-list <file>      Path to file containing chromosome contig names (required in Mode 2)
  -P, --plasmid-list <file>  Path to file containing plasmid contig names (required in Mode 2)
  -r, --reads1 <file>        Path to forward reads (FASTQ) (required)
  -R, --reads2 <file>        Path to reverse reads (FASTQ) (required for short reads only)
  --preset <str>             Minimap2 preset (default: map-ont) # pcne_long only
  --minimap-opts <str>       Additional Minimap2 options, in quotes (default: OFF) # pcne_long only
  -Q, --min-quality <int>    Minimum mapping quality (MAPQ) for read filtering (default: OFF)
  -F, --filter <int>         SAM flag used to exclude reads (default: OFF)
  -l, --plot                 Generate a plot of estimated copy numbers (.png)
  -s, --single-plasmid       Treat all contigs in the `-p` FASTA as one fragmented plasmid
  --gc-correction            Enable GC correction
  --gc-frac <float>          LOESS smoothing fraction (default: AUTO)
  --gc-window <int>          Window size in bp (default: 1000)
  --gc-plot <file>           Generate GC plot
  -t, --threads <int>        Number of threads to use (default: 1)
  -o, --output <str>         Prefix for output files (default: pcne)
  -k, --keep-intermediate    Keep intermediate files (default: OFF)
  -v, --version              Show version information
  -h, --help                 Show help message

Run the tool

The tool can be run in two different input modes.
Mode 1: requires two separate FASTA files, one for the chromosome and one for the plasmid(s).

#Example Mode 1 for short reads
pcne \
  -c my_sample.chromosome.fasta \
  -p my_sample.plasmid.fasta \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne

Mode 2: requires an assembled FASTA file, a list file of the contig(s) assigned to the chromosome, and a list file of the contig(s) assigned to the plasmid(s). Each list should contain one contig name per line, as follows:

plasmid1_contig
plasmid2_contig
plasmid3_contig
...
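
The following is a minimal sketch of how the two list files might be produced from an assembly. It assumes that each list holds one contig name per line and that the names match the first word of the FASTA headers. The file names and contig names (chr_contig_1, plasmid1_contig, plasmid2_contig) are hypothetical, and the name-based split stands in for the assignments that a classifier such as Platon or MOB-suite would report.

```shell
set -euo pipefail

workdir=$(mktemp -d)

# Toy assembly with hypothetical contig names.
cat > "$workdir/my_sample_assembly.fasta" <<'EOF'
>chr_contig_1 length=12
ACGTACGTACGT
>plasmid1_contig length=8
GGCCGGCC
>plasmid2_contig length=8
ATATATAT
EOF

# List every contig name: strip '>' and anything after the first whitespace.
grep '^>' "$workdir/my_sample_assembly.fasta" \
  | sed -e 's/^>//' -e 's/[[:space:]].*$//' > "$workdir/all_contigs.list"

# Assumption: plasmid contigs are recognizable by name in this toy example;
# real data would use the assignments reported by a plasmid classifier.
grep '^plasmid' "$workdir/all_contigs.list" > "$workdir/plasmid.list"
grep -v '^plasmid' "$workdir/all_contigs.list" > "$workdir/chromosome.list"

cat "$workdir/plasmid.list"
```

The resulting plasmid.list and chromosome.list could then be passed to -P and -C.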

#Example Mode 2 for short reads
pcne \
  -a my_sample_assembly.fasta \
  -C chromosome.list \
  -P plasmid.list \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne

Note: if the files are not in the working directory, provide their full paths.

For both modes the main output is a TSV file.
Example output.tsv:

sample	plasmid_contig	plasmid_length	plasmid_depth	chromosome_depth	normalization_mode	estimated_copy_number
isolate_1	plasmid_contig_1	54321	152.75	31.45	Default	4.86
isolate_1	plasmid_contig_2_IncFIB	9876	28.50	31.45	Default	0.91
…	…	…	…	…	…	…

Columns:

sample: Name of the output file
plasmid_contig: Name of the plasmid contig (from the input plasmid FASTA).
plasmid_length: Length of the plasmid contig in base pairs.
plasmid_depth: Median plasmid depth.
chromosome_depth: Baseline coverage depth.
normalization_mode: How the baseline coverage depth was calculated.
estimated_copy_number: The calculated copy number (plasmid_depth / chromosome_depth).
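
As a quick sanity check, the copy number can be recomputed from the two depth columns. The sketch below assumes the column order of the example output (plasmid_depth in column 4, chromosome_depth in column 5) and uses a hypothetical file name; it is not part of PCNE itself.

```shell
workdir=$(mktemp -d)
tsv="$workdir/my_sample_pcne.tsv"   # hypothetical output file name

# Recreate the two example rows (tab-separated).
printf 'sample\tplasmid_contig\tplasmid_length\tplasmid_depth\tchromosome_depth\tnormalization_mode\testimated_copy_number\n' > "$tsv"
printf 'isolate_1\tplasmid_contig_1\t54321\t152.75\t31.45\tDefault\t4.86\n' >> "$tsv"
printf 'isolate_1\tplasmid_contig_2_IncFIB\t9876\t28.50\t31.45\tDefault\t0.91\n' >> "$tsv"

# Skip the header, then print the contig name and plasmid_depth / chromosome_depth.
recomputed=$(awk -F'\t' 'NR > 1 && $5 > 0 { printf "%s\t%.2f\n", $2, $4 / $5 }' "$tsv")
printf '%s\n' "$recomputed"
```

For the example rows this gives 4.86 and 0.91, matching the estimated_copy_number column.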

Summarizing multiple results

After running pcne in batch on multiple isolates, you can use pcne_summary to combine all results together and generate a summary plot.

cd "$working_dir"
pcne_summary

This will create two files:

pcne_summary_all_results.tsv
pcne_summary_plot.png
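
The batch run that precedes pcne_summary might be scripted as follows. This is only a sketch: the directory layout and the file naming (<isolate>.chromosome.fasta, <isolate>.plasmid.fasta, <isolate>_R1.fastq.gz, <isolate>_R2.fastq.gz) are assumptions, and the hypothetical helper print_pcne_commands only prints the commands (a dry run) so they can be reviewed before execution.

```shell
# Print one pcne command per isolate found in a directory (dry run).
# Assumed, hypothetical naming: <isolate>.chromosome.fasta, <isolate>.plasmid.fasta,
# <isolate>_R1.fastq.gz and <isolate>_R2.fastq.gz, all in the same directory.
print_pcne_commands() {
  dir=$1
  threads=${2:-1}
  for chr in "$dir"/*.chromosome.fasta; do
    [ -e "$chr" ] || continue   # no matching files: nothing to print
    isolate=$(basename "$chr" .chromosome.fasta)
    printf 'pcne -c %s -p %s -r %s -R %s -t %s -o %s\n' \
      "$chr" \
      "${dir}/${isolate}.plasmid.fasta" \
      "${dir}/${isolate}_R1.fastq.gz" \
      "${dir}/${isolate}_R2.fastq.gz" \
      "$threads" \
      "$isolate"
  done
}

# Demo with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/isolate_1.chromosome.fasta" "$demo_dir/isolate_2.chromosome.fasta"
print_pcne_commands "$demo_dir" 8
```

Once the printed commands look right, they can be run one by one or piped to a shell, and pcne_summary can then be run in the directory that holds the results.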

Optional parameters

Optional parameters are designed to enhance overall accuracy, especially under challenging or non-ideal conditions. Each parameter is tunable, allowing the user to find the best combination to fit their data.

--gc-correction

This flag enables a model-based correction for GC content bias in sequencing data.
Use this option if you suspect your sequencing data may have GC bias, which is common for libraries prepared with PCR amplification steps. If you are using a PCR-free workflow or your control data shows a very flat GC-to-depth profile, this step may not be necessary.
You may skip this step when using long reads (pcne_long).
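
To make the notion of GC content per window concrete, the sketch below computes the GC fraction of fixed-size windows along each contig with awk. It only illustrates the quantity that a GC correction relates to depth; it is not PCNE's implementation. The function name gc_windows, the toy sequence, and the 4 bp window are chosen for the example (the --gc-window default is 1000 bp).

```shell
# gc_windows <fasta> <window_size>
# Prints, tab-separated: contig, window start (0-based), window end, GC fraction.
gc_windows() {
  awk -v w="$2" '
    function emit_windows(   i, win, n, gc) {
      for (i = 1; i <= length(seq); i += w) {
        win = toupper(substr(seq, i, w))
        n = length(win)                 # the last window may be shorter than w
        gc = gsub(/[GC]/, "", win)      # gsub returns the number of G/C bases
        printf "%s\t%d\t%d\t%.2f\n", name, i - 1, i - 1 + n, gc / n
      }
    }
    /^>/ { if (name != "") emit_windows(); name = substr($1, 2); seq = ""; next }
         { seq = seq $0 }
    END  { if (name != "") emit_windows() }
  ' "$1"
}

# Demo on a 12 bp toy contig split over two sequence lines.
demo_dir=$(mktemp -d)
demo_fa="$demo_dir/toy_plasmid.fasta"
printf '>plasmid1_contig\nGGCCATAT\nGCAT\n' > "$demo_fa"
gc_windows "$demo_fa" 4
```

In the toy example the three windows have GC fractions of 1.00, 0.00, and 0.50. Windows with very different GC fractions but the same true copy number should show similar depth when no GC bias is present.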

--min-quality / -Q

This sets the minimum mapping quality (MAPQ) for a read to be included in the analysis. A high score means high confidence; a low score means the read could have aligned equally well to multiple different locations.
Use this to filter out ambiguously mapped reads.

--filter / -F

This sets the SAM flag used to filter out reads. Use it to exclude reads with undesirable properties (e.g., PCR artifacts).
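
SAM flags are bit fields, so several properties can be excluded at once by combining their values. The sketch below combines standard flag bits from the SAM specification using shell arithmetic. Which bits to exclude depends on the data, so the combination shown is only an example, and it assumes that -F accepts the combined decimal value in the same way samtools does.

```shell
# Standard SAM flag bits (SAM specification).
UNMAPPED=4          # 0x4   read unmapped
SECONDARY=256       # 0x100 secondary alignment
QC_FAIL=512         # 0x200 failed quality checks
DUPLICATE=1024      # 0x400 PCR or optical duplicate
SUPPLEMENTARY=2048  # 0x800 supplementary alignment

# Combine the bits to exclude: duplicates plus secondary and supplementary alignments.
exclude_flag=$(( DUPLICATE | SECONDARY | SUPPLEMENTARY ))
echo "$exclude_flag"   # prints 3328

# Hypothetical use with the Mode 1 example files:
# pcne -c my_sample.chromosome.fasta -p my_sample.plasmid.fasta \
#   -r my_sample_R1.fastq.gz -R my_sample_R2.fastq.gz -F "$exclude_flag"
```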

--minimap-opts

This passes additional options to Minimap2 (pcne_long only). Enclose the options in quotes.

Next features

Currently, no major updates are expected.
However, the tool is actively maintained, so it may change in the future.
For any suggestions, please use the GitHub Issues page.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

riccardo.bollini@hunimed.eu

Issues

Please report any issues via the GitHub Issues page.