Plasmid Copy Number Estimator

Plasmid Copy Number Estimator (PCNE) is a simple tool to estimate the copy number of plasmids from an assembled genome.

Introduction

Determining the copy number of plasmids is essential for understanding plasmid biology, evolution, and the dosage of plasmid-borne genes (e.g., antimicrobial resistance genes). PCNE automates this estimation from standard sequencing file formats.

Requirements

PCNE requires either pre-separated chromosome and plasmid FASTA files, or a complete genome assembly FASTA with the corresponding contig lists. The plasmid input can be a multi-FASTA file with one contig per plasmid, a completely assembled plasmid (one contig), or a draft plasmid assembly (one plasmid split across multiple contigs).
You can easily obtain these files with tools such as Platon, MOB-suite, or PlasmidFinder.

Citation

When you use PCNE, please cite Bollini R, Cento V. PCNE: A Tool for Plasmid Copy Number Estimation. Bioinformatics and Biology Insights. 2026;20. doi:10.1177/11779322251410037

Pipeline summary

  1. Input parsing and file preparation
  2. Alignment
  3. (Optional) Alignment filtering
  4. Windowed data generation
  5. (Optional) GC correction
  6. Baseline and plasmid depth estimation
  7. Plasmid Copy Number Estimation
  8. Write output and cleanup

Dependencies

The tool relies on the following software, which Conda installs automatically:

  1. BWA (tested with v0.7.18)
  2. Minimap2 (2.3)
  3. Samtools (tested with v1.20)
  4. bedtools (tested with v2.31.1)
  5. R (tested with v4.4.3)
  6. R packages: readr (v2.1.5), dplyr (v1.1.4), ggplot2 (v3.5.2), purrr (v1.0.0)

Installation

Bioconda

Install Plasmid Copy Number Estimator via Bioconda:

  1. Set up Conda channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
  2. Create a new environment and install:
conda create -n pcne_env -c conda-forge -c bioconda pcne
conda activate pcne_env

Docker

You can also use Docker:
docker pull riccabolla/pcne:v3.2.0
docker run riccabolla/pcne:v3.2.0 pcne -h

Ubuntu

sudo apt install -y bwa samtools r-base bedtools bc
R
install.packages(c("readr", "dplyr", "ggplot2", "purrr"))
q()
git clone https://github.com/riccabolla/PCNE.git 
bash PCNE/bin/pcne -h

Quick Usage

# Short reads
pcne -c <chromosome.fasta> -p <plasmid.fasta> -r <reads_R1.fastq.gz> -R <reads_R2.fastq.gz> [-t <threads>] [-o <output_prefix>]

# Long reads
pcne_long -c <chromosome.fasta> -p <plasmid.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

# Long reads with multiple plasmids
pcne_long -c <chromosome.fasta> -p <plasmid_*.fasta> -r <reads.fastq.gz> [-t <threads>] [-o <output_prefix>]

Command line options

  -c, --chromosome <file>    Path to chromosome FASTA file (Required in Mode 1)
  -p, --plasmid <file>       Path to one or more plasmid FASTA files (Required in Mode 1)
                             Use with `--single-plasmid` if the file contains one fragmented plasmid
  -a, --assembly <file>      Path to the assembled genome FASTA file (Required in Mode 2)
  -C, --chr-list <file>      Path to file containing chromosome contig names (Required in Mode 2)
  -P, --plasmid-list <file>  Path to file containing plasmid contig names (Required in Mode 2)
  -r, --reads1 <file>        Path to forward reads (FASTQ) (Required)
  -R, --reads2 <file>        Path to reverse reads (FASTQ) (Required for short reads only)
  --preset <str>             Minimap2 preset (default: map-ont) # pcne_long only
  --minimap-opts <str>       Minimap2 options (use quotes) (default: OFF) # pcne_long only
  -Q, --min-quality <int>    Minimum mapping quality (MAPQ) for read filtering (default: OFF)
  -F, --filter <int>         SAM flag to exclude reads (default: OFF)
  -l, --plot                 Generate a plot of estimated copy numbers (.png)
  -s, --single-plasmid       Treat all contigs in the `-p` FASTA as one fragmented plasmid
  --gc-correction            Enable GC correction
  --gc-frac <float>          Specify LOESS smoothing fraction (default: AUTO)
  --gc-window <int>          Specify window size (default: 1000 bp)
  --gc-plot <file>           Generate GC plot
  -t, --threads <int>        Number of threads to use (default: 1)
  -o, --output <str>         Prefix for output files (default: pcne)
  -k, --keep-intermediate    Keep intermediate files (default: OFF)
  -v, --version              Show version information
  -h, --help                 Show help message

Run the tool

The tool accepts two different input modes:
Mode 1: it requires two separate FASTA files for chromosome and plasmid(s).

# Example Mode 1 for short reads
pcne \
  -c my_sample.chromosome.fasta \
  -p my_sample.plasmid.fasta \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne

Mode 2: it requires an assembled FASTA file, a list file with the contig(s) assigned to the chromosome, and a list file with the contig(s) assigned to the plasmid(s). Each list should contain one contig name per line, as follows:

plasmid1_contig
plasmid2_contig
plasmid3_contig
...

# Example Mode 2 for short reads
pcne \
  -a my_sample_assembly.fasta \
  -C chromosome.list \
  -P plasmid.list \
  -r my_sample_R1.fastq.gz \
  -R my_sample_R2.fastq.gz \
  -t 8 \
  -o my_sample_pcne
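
The list files can be written by hand or derived from the assembly headers. The following is a minimal sketch, assuming that a contig name is the first whitespace-delimited token of its FASTA header (a common convention, not confirmed here for PCNE). The function name `fasta_contig_names` and the demo sequences are hypothetical.

```shell
# Print the contig name (first token after '>') of every record in a FASTA file.
fasta_contig_names() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Demo on a tiny made-up assembly written to a temporary file.
demo_fasta="$(mktemp)"
printf '>contig_1 length=8\nACGTACGT\n>contig_2 length=4\nGGCC\n' > "$demo_fasta"
fasta_contig_names "$demo_fasta"
rm -f "$demo_fasta"
```

Redirecting the output to a file gives a starting list that can then be split into chromosome and plasmid lists according to the classification produced by your plasmid-detection tool.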

Note: if the files are not in the working directory, provide their paths.

For both modes, the main output is a TSV file.
Example output.tsv:

sample     plasmid_contig           plasmid_length  plasmid_depth  chromosome_depth  normalization_mode  estimated_copy_number
isolate_1  plasmid_contig_1         54321           152.75         31.45             Default             4.86
isolate_1  plasmid_contig_2_IncFIB  9876            28.50          31.45             Default             0.91

Columns:

  • sample: Name of the output file.
  • plasmid_contig: Name of the plasmid contig (from the input plasmid FASTA).
  • plasmid_length: Length of the plasmid contig in base pairs.
  • plasmid_depth: Median plasmid depth.
  • chromosome_depth: Baseline coverage depth.
  • normalization_mode: How the baseline coverage depth was calculated.
  • estimated_copy_number: The calculated copy number (plasmid_depth / chromosome_depth).
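
As a worked check of the example rows, the sketch below recomputes the copy number as the ratio of the plasmid_depth and chromosome_depth columns, rounded to two decimals. It only reproduces the arithmetic implied by the table and is not PCNE's own code; the function name `copy_number` is hypothetical.

```shell
# Copy number = plasmid depth / chromosome (baseline) depth, rounded to 2 decimals.
# Prints NA when the baseline depth is not positive.
copy_number() {
  awk -v p="$1" -v c="$2" 'BEGIN {
    if (c <= 0) { print "NA"; exit }
    printf "%.2f\n", p / c
  }'
}

copy_number 152.75 31.45   # first example row  -> 4.86
copy_number 28.50 31.45    # second example row -> 0.91
```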

Summarizing multiple results

After running pcne in batch on multiple isolates, you can use pcne_summary to combine all results together and generate a summary plot.

cd $working_dir
pcne_summary

This will create two files:

  • pcne_summary_all_results.tsv
  • pcne_summary_plot.png
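
For the batch step itself, one possible approach is to generate one pcne command per isolate and review them before running. The sketch below assumes a hypothetical naming scheme that mirrors the Mode 1 example (`<sample>.chromosome.fasta`, `<sample>.plasmid.fasta`, `<sample>_R1.fastq.gz`, `<sample>_R2.fastq.gz`, all in one directory). The function names are illustrative and the commands are only printed, not executed.

```shell
# Derive the sample name from a forward-read file name (<sample>_R1.fastq.gz).
sample_from_r1() {
  basename "$1" _R1.fastq.gz
}

# Print (do not run) one pcne command per sample found in a directory.
print_pcne_commands() {
  local dir="$1" r1 s
  for r1 in "$dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue   # no matching files: skip the unexpanded glob
    s="$(sample_from_r1 "$r1")"
    echo "pcne -c $dir/$s.chromosome.fasta -p $dir/$s.plasmid.fasta -r $r1 -R $dir/${s}_R2.fastq.gz -o $s"
  done
}

print_pcne_commands .
```

Once the printed commands look right, they can be piped to `bash` and followed by `pcne_summary` in the same working directory.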

Optional parameters

Optional parameters are designed to enhance overall accuracy, especially under challenging or non-ideal conditions. Each parameter is tunable, allowing the user to find the best combination to fit their data.

--gc-correction

This flag enables a model-based correction for GC content bias in sequencing data.
Use this option if you suspect your sequencing data may have GC bias, which is common for libraries prepared with PCR amplification steps. If you are using a PCR-free workflow or your control data shows a very flat GC-to-depth profile, this step may not be necessary.
You may skip this step when using long-reads (pcne_long).
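
To make the notion of per-window GC content concrete, the sketch below computes the GC fraction of consecutive fixed-size windows of a sequence. It is an illustration only: it does not reproduce PCNE's windowing or its LOESS model, and the function name `gc_windows` and the toy sequence are hypothetical.

```shell
# Print 0-based start, end, and GC fraction for consecutive windows of a sequence.
# $1 = sequence string, $2 = window size in bp.
gc_windows() {
  awk -v seq="$1" -v w="$2" 'BEGIN {
    if (w < 1) exit 1
    n = length(seq)
    for (i = 1; i <= n; i += w) {
      win = toupper(substr(seq, i, w))
      len = length(win)
      gc = gsub(/[GC]/, "", win)   # gsub returns the number of G/C bases removed
      printf "%d\t%d\t%.2f\n", i - 1, i - 1 + len, gc / len
    }
  }'
}

gc_windows GGCCATAT 4
```

With real data, a GC bias appears as a systematic trend when the depth of each window is plotted against its GC fraction; a flat trend suggests the correction can be skipped.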

--min-quality / -Q

This sets the minimum mapping quality (MAPQ) for a read to be included in the analysis. A high score means high confidence; a low score means the read could have aligned equally well to multiple different locations.
Use this to filter out ambiguously mapped reads.

--filter / -F

This sets the SAM flag used to filter out reads. Use this to exclude reads with undesirable properties (e.g., PCR artifacts).
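
SAM flags are bit fields, so several properties can be combined into a single integer. The sketch below builds such a value from the bit definitions in the SAM specification. It assumes that PCNE interprets `-F` the way `samtools view -F` does (exclude reads with any of the given bits set), which is not stated explicitly here. The helper name `sam_flag_mask` is hypothetical.

```shell
# Combine standard SAM flag bits for the named read properties into one integer.
sam_flag_mask() {
  local total=0 name
  for name in "$@"; do
    case "$name" in
      unmapped)      total=$((total | 4)) ;;
      secondary)     total=$((total | 256)) ;;
      qcfail)        total=$((total | 512)) ;;
      duplicate)     total=$((total | 1024)) ;;
      supplementary) total=$((total | 2048)) ;;
      *) echo "unknown flag name: $name" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

sam_flag_mask duplicate                          # 1024
sam_flag_mask secondary duplicate supplementary  # 3328
```

Under that assumption, `-F 1024` would exclude reads marked as PCR or optical duplicates, and `-F 3328` would also exclude secondary and supplementary alignments.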

--minimap-opts

This passes additional options to minimap2 (pcne_long only). Wrap the option string in quotes.

Next features

Currently, no major updates are expected.
However, the tool is actively maintained, so it may change in the future.
For any suggestions, please use the GitHub Issues page.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

riccardo.bollini@hunimed.eu

Issues

Please report any issues via the GitHub Issues page.
