Plasmid Copy Number Estimator (PCNE) is a simple tool to estimate the copy number of plasmids from an assembled genome.
Introduction
Determining the copy number of plasmids is essential for understanding plasmid biology, evolution, and the dosage of plasmid-borne genes (e.g., antimicrobial resistance genes). PCNE automates this estimation from standard sequencing file formats.
Requirements
It requires either pre-separated chromosome and plasmid FASTA files or a complete genome assembly FASTA with corresponding contig lists. It also allows the use of a multi-fasta file with one contig per plasmid, a complete assembled plasmid (1 contig), or a draft assembled plasmid (one plasmid with multiple contigs).
You can easily obtain them with tools such as Platon, MOB-Suite, PlasmidFinder…
-c, --chromosome <file> Path to chromosome FASTA file (Required)
-p, --plasmid <file> Path to one or more plasmid FASTA files (Required)
Use with `--single-plasmid` if file contains one fragmented plasmid
-a, --assembly <file> Path to the assembled genome FASTA file (Required)
-C, --chr-list <file> Path to file containing chromosome contig names (Required)
-P, --plasmid-list <file> Path to file containing plasmid contig names (Required)
-r, --reads1 <file> Path to forward reads (FASTQ) (Mandatory)
-R, --reads2 <file> Path to reverse reads (FASTQ) (Mandatory only for short reads)
--preset <str> Minimap2 preset (default: map-ont) # pcne_long only
--minimap-opts <str> Minimap2 options (use quotes) (default: OFF) # pcne_long only
-Q, --min-quality <int> Minimum mapping quality (MQ) for read filtering (default: OFF)
-F, --filter <int> SAM flag to exclude reads (default: OFF)
-l, --plot Generate a plot of estimated copy numbers (.png)
-s, --single-plasmid Treat all contigs in `-p` FASTA as one fragmented plasmid
--gc-correction Enable GC-correction
--gc-frac <float> Specify LOESS smoothing fraction (default: AUTO)
--gc-window <int> Specify window size (default: 1000 bp)
--gc-plot <file> Generate GC plot
-t, --threads <int> Number of threads to use (default: 1)
-o, --output <str> Prefix for output files (default: pcne)
-k, --keep-intermediate Keep intermediate files (default: OFF)
-v, --version Show version information
-h, --help Show help message
Run the tool
The tool accepts two different input modes:
Mode 1: it requires two separate FASTA files for chromosome and plasmid(s).
Mode 2: it requires an assembled FASTA file, a list file with the contig(s) assigned to the chromosome, and a list file with the contig(s) assigned to the plasmid(s).
The list should be structured as follows:
#Example Mode 2 for short reads
pcne \
-a my_sample_assembly.fasta \
-C chromosome.list \
-P plasmid.list \
-r my_sample_R1.fastq.gz \
-R my_sample_R2.fastq.gz \
-t 8 \
-o my_sample_pcne
Note: if files are not in the working folder, provide the PATH.
For both modes the main output is a TSV file.
Example output.tsv:
| sample | plasmid_contig | plasmid_length | plasmid_depth | chromosome_depth | normalization_mode | estimated_copy_number |
| --- | --- | --- | --- | --- | --- | --- |
| isolate_1 | plasmid_contig_1 | 54321 | 152.75 | 31.45 | Default | 4.86 |
| isolate_1 | plasmid_contig_2_IncFIB | 9876 | 28.50 | 31.45 | Default | 0.91 |
| … | … | … | … | … | … | … |
Columns:
sample: Name of the output file
plasmid_contig: Name of the plasmid contig (from the input plasmid FASTA).
plasmid_length: Length of the plasmid contig in base pairs.
plasmid_depth: Median plasmid depth.
chromosome_depth: Baseline coverage depth.
normalization_mode: How the baseline coverage depth was calculated.
estimated_copy_number: The calculated copy number (plasmid_depth / chromosome_depth).
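As a quick sanity check, the copy number can be recomputed from the two depth columns. The sketch below assumes a tab-separated file with the column order of the example table and assumes the estimate is the ratio of plasmid depth to chromosome depth, as the example rows suggest. The file name `example_output.tsv` and the function `recompute_copy_number` are hypothetical; the snippet writes the file itself so it can be run as is.

```shell
# Write a small TSV mirroring the example table (hypothetical file name).
printf 'sample\tplasmid_contig\tplasmid_length\tplasmid_depth\tchromosome_depth\tnormalization_mode\testimated_copy_number\n' > example_output.tsv
printf 'isolate_1\tplasmid_contig_1\t54321\t152.75\t31.45\tDefault\t4.86\n' >> example_output.tsv
printf 'isolate_1\tplasmid_contig_2_IncFIB\t9876\t28.50\t31.45\tDefault\t0.91\n' >> example_output.tsv

# Recompute plasmid_depth / chromosome_depth (columns 4 and 5), skipping the
# header, and print it next to the reported value (column 7).
recompute_copy_number() {
  awk -F'\t' 'NR > 1 && $5 > 0 { printf "%s\t%.2f\t%s\n", $2, $4 / $5, $7 }' "$1"
}

recompute_copy_number example_output.tsv
```

For the two example rows the recomputed values (4.86 and 0.91) match the reported column.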
Summarizing multiple results
After running pcne in batch on multiple isolates, you can use pcne_summary to combine all results together and generate a summary plot.
cd $working_dir
pcne_summary
This will create two files:
pcne_summary_all_results.tsv
pcne_summary_plot.png
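The batch run that precedes `pcne_summary` is not shown here; one possible way to set it up is sketched below for Mode 2 with paired-end short reads. It uses only options from the list above. The file naming scheme (`<isolate>.fasta`, `<isolate>.chr.list`, `<isolate>.plasmid.list`, `<isolate>_R1.fastq.gz`, `<isolate>_R2.fastq.gz`) and the helper `build_pcne_cmd` are assumptions made for illustration. The loop only prints the commands (a dry run).

```shell
# Build the pcne command line for one isolate (hypothetical file naming scheme).
build_pcne_cmd() {
  isolate="$1"
  echo "pcne -a ${isolate}.fasta -C ${isolate}.chr.list -P ${isolate}.plasmid.list" \
       "-r ${isolate}_R1.fastq.gz -R ${isolate}_R2.fastq.gz -t 8 -o ${isolate}"
}

# Dry run: print one command per assembly found in the current directory.
for assembly in *.fasta; do
  [ -e "$assembly" ] || continue     # no FASTA files present: nothing to do
  isolate="${assembly%.fasta}"       # strip the extension to get the isolate name
  build_pcne_cmd "$isolate"
done
```

After checking the printed commands, they could be executed by piping the loop output to `sh`, and `pcne_summary` can then be run in the same directory.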
Optional parameters
Optional parameters are designed to enhance overall accuracy, especially under challenging or non-ideal conditions. Each parameter is tunable, allowing the user to find the best combination to fit their data.
--gc-correction
This flag enables a model-based correction for GC content bias in sequencing data.
Use this option if you suspect your sequencing data may have GC bias, which is common for libraries prepared with PCR amplification steps. If you are using a PCR-free workflow or your control data shows a very flat GC-to-depth profile, this step may not be necessary.
You may skip this step when using long reads (pcne_long).
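To make the idea of a GC window concrete, the sketch below computes the GC fraction in fixed-size windows along a single contig. It is not PCNE's implementation and does not perform the LOESS fit; it only illustrates the per-window GC values that a GC-versus-depth model would be fitted against. The toy file `toy.fasta`, the function `gc_windows`, and the window size of 4 are hypothetical choices for a runnable example (the tool's default window is 1000 bp).

```shell
# Toy single-contig FASTA, written here so the sketch is runnable.
printf '>contig_1\nGGCC\nATAT\nGCAT\n' > toy.fasta

# Print window start, window length, and GC fraction for fixed-size windows.
# Usage: gc_windows <fasta> <window_size>
gc_windows() {
  awk -v w="$2" '
    /^>/ { next }                    # skip the header (single contig assumed)
    { seq = seq toupper($0) }        # join the sequence lines
    END {
      for (i = 1; i <= length(seq); i += w) {
        win = substr(seq, i, w)
        n = length(win)              # last window may be shorter than w
        gc = gsub(/[GC]/, "", win)   # gsub returns the number of G/C bases
        printf "%d\t%d\t%.3f\n", i, n, gc / n
      }
    }' "$1"
}

gc_windows toy.fasta 4
```

On the toy sequence the three windows have GC fractions of 1.000, 0.000, and 0.500.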
--min-quality / -Q
This sets the minimum mapping quality (MAPQ) for a read to be included in the analysis. A high score means high confidence; a low score means the read could have aligned equally well to multiple different locations.
Use this to filter out ambiguously mapped reads.
--filter / -F
This sets the SAM flag used to filter out reads. Use this to exclude reads with undesirable properties (e.g., PCR artifacts).
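SAM flags are bitwise values, so several read categories can be combined into one number. The sketch below uses bit values defined in the SAM specification (256 secondary alignment, 1024 PCR or optical duplicate, 2048 supplementary alignment). Whether PCNE interprets a combined value exactly as `samtools view -F` does is an assumption, and the variable names are illustrative.

```shell
# SAM flag bits (from the SAM specification).
SECONDARY=256        # 0x100: secondary alignment
DUPLICATE=1024       # 0x400: PCR or optical duplicate
SUPPLEMENTARY=2048   # 0x800: supplementary alignment

# Combine the bits with bitwise OR to exclude all three categories at once.
EXCLUDE_FLAG=$(( SECONDARY | DUPLICATE | SUPPLEMENTARY ))
echo "$EXCLUDE_FLAG"   # prints 3328
```

The resulting number could then be passed as `-F 3328`. Note that the duplicate bit is only set if duplicates were marked during or after alignment.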
--minimap-opts
Passes additional options to minimap2 (pcne_long only). Enclose the options in quotes.
Next features
Currently, no major updates are expected.
However, the tool is actively maintained, so it may change in the future.
For any suggestions, please use the GitHub Issues page.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
When you use PCNE, please cite Bollini R, Cento V. PCNE: A Tool for Plasmid Copy Number Estimation. Bioinformatics and Biology Insights. 2026;20. doi:10.1177/11779322251410037
Pipeline summary
Dependencies
The tool relies on the following software, which will be installed automatically by Conda:
Installation
Bioconda
Install Plasmid Copy Number Estimator via BioConda
Docker
You can use Docker:Ubuntu
Quick Usage
Command line options
Contact
riccardo.bollini@hunimed.eu
Issues
Please report any issues via the GitHub Issues page.