ccne: Carbapenemase-encoding gene Copy Number Estimator
Introduction
Carbapenemase-encoding gene Copy Number Estimator (ccne) is a tool for estimating the copy number of AMR genes. It uses a housekeeping gene as the reference and compares the count of reads that map to the AMR gene with the count of reads that map to the reference gene.
Citation
Jiang J, Chen L, Chen X, Li P, Xu X, Fowler VG, van Duin D, Wang M. 2022. Carbapenemase-Encoding Gene Copy Number Estimator (CCNE): a Tool for Carbapenemase Gene Copy Number Estimation. Microbiol Spectr 10:e01000-22. https://doi.org/10.1128/spectrum.01000-22
Quick start
ccne-fast
$ ccne-fast --amr KPC-2 --sp Kpn --in File.list --out result.txt --cpus 4
All finished! Enjoy!
$ ls
File.list result.txt SRR14561347_1.fastq.gz SRR14561347_2.fastq.gz
$ head File.list
SRR14561347 ./SRR14561347_1.fastq.gz ./SRR14561347_2.fastq.gz
$ head result.txt
ID rpoB reads depth SD of rpoB reads depth KPC-2 reads depth SD of KPC-2 reads depth Estimated KPC-2 copy number
SRR14561347 653.944899478779 53.7865295303472 2006.6179138322 96.5807513871426 3.06848163420428
ccne-acc
$ ccne-acc --amr KPC-2 --in File.list --out result.txt --cpus 4
All finished! Enjoy!
$ ls
File.list result.txt SRR14561347_1.fastq.gz SRR14561347_2.fastq.gz SRR14561347.fasta
$ head File.list
SRR14561347 ./SRR14561347_1.fastq.gz ./SRR14561347_2.fastq.gz SRR14561347.fasta
$ head result.txt
ID Average reference reads depth KPC-2 reads depth Estimated KPC-2 copy number
SRR14561347 570 2127 3.73157894736842
Installation
Bioconda
If you use Conda, you can install ccne from the Bioconda channel.
Too slow? Try mamba.
Source
Install the latest version directly from GitHub:
$ cd $HOME
$ git clone https://github.com/biojiang/ccne.git
$ $HOME/ccne/bin/ccne --help
Check installation
Check the ccne version:
$ ccne --version
Check dependencies:
ccne checks its dependencies automatically before each run.
Usage
ccne-fast
Name:
ccne-fast 1.1.0 by Jianping Jiang <jiangjianping@fudan.edu.cn>
Synopsis:
Carbapenemase-encoding gene copy number estimator
Usage:
ccne-fast --amr KPC-2 --sp Kpn --in File.list --out result.txt
General:
--help This help
--version Print version and exit
--quiet No screen output (default OFF)
Setup:
--dbdir [X] CCNE database root folders (default '$CCNE_bin/db')
--listdb List all configured AMRs
--listsp List all configured species and housekeeping genes
--fmtdb Format all the bwa index
Input:
--amr [X] AMR gene name, such as KPC-2, NDM-1, etc or AMR ID. Please refer to --listdb (required)
--sp [X] Species name[Kpn|Eco|Aba|Pae|Pls|...]. Please refer to --listsp. (required)
--ref [X] Reference gene (defaults such as Kpn:rpoB Aba:rpoB Eco:polB Pae:pps); please refer to --listsp. Note: When --sp is set to Pls, this parameter should be set to a replicon type.
--in [X] Input file name (required)
Outputs:
--out [X] Output file name (required)
Computation:
--flank [N] The flanking length of sequence to be excluded (default '0')
--cpus [N] Number of CPUs to use (default '1')
--multiref Use the reads depth of all the available sequences (default OFF)
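The `--flank` option excludes flanking sequence from the depth calculation. The Python sketch below illustrates one plausible reading of that option: dropping N positions at each end of a per-base depth profile before computing the mean and SD. It is an assumption for illustration, not ccne's actual implementation; the function `depth_stats`, the use of the population SD, and the sample depths are all hypothetical.

```python
from statistics import mean, pstdev


def depth_stats(depths, flank=0):
    """Return (mean, SD) of per-base depths after dropping `flank` positions at each end.

    Assumption: flanks are trimmed symmetrically and the population SD is used.
    """
    if flank < 0:
        raise ValueError("flank must be non-negative")
    core = list(depths[flank:len(depths) - flank]) if flank else list(depths)
    if not core:
        raise ValueError("flank removes every position")
    return mean(core), pstdev(core)


# Hypothetical profile: depth drops near the gene ends, where fewer reads align.
profile = [10, 20, 100, 100, 100, 100, 20, 10]
print(depth_stats(profile))           # all positions included
print(depth_stats(profile, flank=2))  # central positions only
```

In this toy profile, trimming two positions per end removes the low-depth edges, so the mean rises and the SD falls.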
ccne-acc
Name:
ccne-acc 1.1.0 by Jianping Jiang <jiangjianping@fudan.edu.cn>
Synopsis:
Carbapenemase-encoding gene copy number estimator
Usage:
ccne-acc --amr KPC-2 --in File.list --out result.txt
General:
--help This help
--version Print version and exit
--quiet No screen output (default OFF)
Setup:
--dbdir [X] CCNE database root folders (default '$CCNE_bin/db')
--listdb List all configured AMRs
--fmtdb Format all the bwa index
Input:
--amr [X] AMR gene name, such as KPC-2, NDM-1, etc or AMR ID. Please refer to --listdb (required)
--in [X] Input file name (required)
Outputs:
--out [X] Output file name (required)
Computation:
--cpus [N] Number of CPUs to use (default '1')
Running
Input Requirements
ccne-fast
Sequence read file(s) in FASTQ format (can be .gz compressed)
AMR gene name (refer to --listdb)
The species code (refer to --listsp or the species code table)
ccne-acc
Sequence read file(s) in FASTQ format (can be .gz compressed)
AMR gene name (refer to --listdb)
The genome assembly
Output File
ccne writes the results to the output file named by the user with --out.
Columns in the output file
ccne-fast
Name|Description
|:---|:---
|ID|The sample ID provided by the user in the input file
|rpoB reads depth|The estimated read depth of the reference housekeeping gene (rpoB in this example)
|SD of rpoB reads depth|The standard deviation of the rpoB read depth
|KPC-2 reads depth|The estimated read depth of the input carbapenemase-encoding gene (KPC-2 in this example)
|SD of KPC-2 reads depth|The standard deviation of the KPC-2 read depth
|Estimated KPC-2 copy number|The read depth of the AMR gene divided by the read depth of the housekeeping gene
ccne-acc
Name|Description
|:---|:---
|ID|The sample ID provided by the user in the input file
|Average reference reads depth|The estimated average read depth of the genome
|KPC-2 reads depth|The estimated read depth of the input carbapenemase-encoding gene (KPC-2 in this example)
|Estimated KPC-2 copy number|The read depth of the AMR gene divided by the average reference read depth
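As a worked check of the last column, the short Python sketch below recomputes the estimated copy number from the depths shown in the Quick start output. The function name `copy_number` is hypothetical; only the division of the AMR gene depth by the reference depth is taken from this README.

```python
def copy_number(amr_depth, reference_depth):
    """Estimated copy number: AMR gene read depth divided by reference read depth."""
    if reference_depth <= 0:
        raise ValueError("reference depth must be positive")
    return amr_depth / reference_depth


# ccne-fast example: KPC-2 depth over rpoB depth (values from the Quick start).
fast = copy_number(2006.6179138322, 653.944899478779)
# ccne-acc example: KPC-2 depth over the average reference depth.
acc = copy_number(2127, 570)

print(round(fast, 2))  # 3.07
print(round(acc, 2))   # 3.73
```

Both values agree with the copy numbers reported for SRR14561347 in the Quick start section.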
Tutorial
Fetch the read files (SRR14561347) in FASTQ format from the NCBI SRA database. (SRR14561347 was generated from a Klebsiella pneumoniae clinical isolate carrying three copies of the KPC-2-encoding gene on a plasmid.)
$ fasterq-dump --split-3 SRR14561347
Copy the input file template from the templete folder.
$ cp ./templete/templete.list File.list
Modify the input file.
$ head File.list
SRR14561347 ./SRR14561347_1.fastq.gz ./SRR14561347_2.fastq.gz
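For many samples, the input file can be generated rather than edited by hand. The following Python sketch builds one line per sample in the layout shown above (sample ID, forward reads, reverse reads). It assumes paired files named `<ID>_1.fastq.gz` and `<ID>_2.fastq.gz` and a tab as the field separator; both are assumptions, and `build_file_list` is a hypothetical helper, not part of ccne.

```python
from pathlib import Path


def build_file_list(read_dir, sep="\t"):
    """Return one 'ID<sep>R1<sep>R2' line per complete read pair found in read_dir."""
    read_dir = Path(read_dir)
    suffix = "_1.fastq.gz"
    lines = []
    for r1 in sorted(read_dir.glob(f"*{suffix}")):
        sample = r1.name[: -len(suffix)]
        r2 = r1.with_name(f"{sample}_2.fastq.gz")
        if not r2.exists():
            continue  # skip samples whose mate file is missing
        lines.append(sep.join([sample, r1.as_posix(), r2.as_posix()]))
    return lines


if __name__ == "__main__":
    # Print the lines for the current directory; redirect to File.list if they look right.
    for line in build_file_list("."):
        print(line)
```

For ccne-acc, a fourth field with the path to the genome assembly would need to be appended to each line, as in the Quick start example.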
Q&A
How were the reference genes in ccne determined?
The reference genes in ccne are single-copy housekeeping genes. If the species has been curated in PubMLST, the housekeeping genes from PubMLST are used; usually, the first allele serves as the reference in ccne. Otherwise, the reference genome and gene models of the species are downloaded from NCBI GenBank, and BUSCO is used to identify the single-copy genes. Through read simulation and mapping, the 10 single-copy genes with the lowest RMSE (comparing the per-base read depths with the simulated read depth) are selected as the reference genes.
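The RMSE-based ranking described in this answer can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used to build the ccne database: the function names, the gene names, and the depth values are hypothetical, and the simulated depth is treated as a single constant per gene.

```python
from math import sqrt


def rmse(observed, expected):
    """Root-mean-square error between per-base depths and one simulated depth."""
    return sqrt(sum((d - expected) ** 2 for d in observed) / len(observed))


def pick_reference_genes(depth_by_gene, simulated_depth, top_n=10):
    """Return the top_n gene names with the lowest RMSE against the simulated depth."""
    ranked = sorted(depth_by_gene, key=lambda g: rmse(depth_by_gene[g], simulated_depth))
    return ranked[:top_n]


# Hypothetical per-base depths for three candidate single-copy genes.
depths = {
    "geneA": [50, 50, 50, 50],
    "geneB": [40, 60, 45, 55],
    "geneC": [10, 90, 20, 80],
}
print(pick_reference_genes(depths, simulated_depth=50, top_n=2))  # ['geneA', 'geneB']
```

Genes whose mapped depth tracks the simulated depth closely rank first, which is the property that makes them stable references.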
How should the SD of reads depth be interpreted?
The SD of reads depth is the standard deviation of the read depth of a gene. A large value indicates that the read depth estimate is unreliable. In our practical analyses, the SD is usually less than 15% of the mean read depth.
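One way to apply this guidance is to express the SD as a fraction of the mean depth. The Python sketch below does so for the Quick start values; the helper names are hypothetical, and the 15% figure is used only as the rule of thumb stated above, not as a cutoff enforced by ccne.

```python
def relative_sd(mean_depth, sd_depth):
    """SD of read depth expressed as a fraction of the mean read depth."""
    if mean_depth <= 0:
        raise ValueError("mean depth must be positive")
    return sd_depth / mean_depth


def depth_is_stable(mean_depth, sd_depth, max_ratio=0.15):
    """True when the SD is below max_ratio of the mean (15% rule of thumb)."""
    return relative_sd(mean_depth, sd_depth) < max_ratio


# Values for SRR14561347 from the ccne-fast Quick start output.
rpob = relative_sd(653.944899478779, 53.7865295303472)
kpc2 = relative_sd(2006.6179138322, 96.5807513871426)
print(f"rpoB: {rpob:.1%}, KPC-2: {kpc2:.1%}")  # rpoB: 8.2%, KPC-2: 4.8%
```

Both genes fall well under 15%, so the depths in that example would be considered reliable by this rule of thumb.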
“error while loading shared libraries: libncurses.so.5”
sudo apt install libncurses5
“blastn: error while loading shared libraries: libidn.so.11”
sudo apt install libidn11-dev
“UnsatisfiableError”
```
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions

The following specifications were found to be incompatible with your system:
feature:/linux-64::__glibc==2.23=0
feature:|@/linux-64::__glibc==2.23=0
Your installed version is: 2.23
```
Install [htstream](https://s4hts.github.io/HTStream/) first.
# Dependencies
## ccne-fast & ccne-acc
* **HTStream**</br>
Used for raw reads QC</br>
*Petersen, Kristen R., David A. Streett, et al., 2015, Super deduper, fast PCR duplicate detection in fastq files. In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 491-492.*
* **bwa**</br>
Used for reads mapping</br>
*Li H. and Durbin R., 2009, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25:1754-1760.* [PMID: [19451168](http://www.ncbi.nlm.nih.gov/pubmed/19451168)]
* **samtools**</br>
Used for fetching mapped reads and sorting them by locus</br>
*Li H., Handsaker B. et al., 2009, The Sequence alignment/map (SAM) format and SAMtools, Bioinformatics, 25(16):2078-9.* [PMID:[19505943](http://www.ncbi.nlm.nih.gov/pubmed/19505943)]
* **bedtools**</br>
Used for getting bed files</br>
*Quinlan A R. and Hall I M., 2010, BEDTools: a flexible suite of utilities for comparing genomic features, Bioinformatics, 26(6):841-2.* [PMID:[20110278](https://pubmed.ncbi.nlm.nih.gov/20110278)]
* **deepTools**&</br>
Used for getting bed files</br>
[Installation](https://deeptools.readthedocs.io/en/develop/content/installation.html)</br>
*Ramírez, Fidel, Devon P. Ryan, et al., 2016, deepTools2: A next Generation Web Server for Deep-Sequencing Data Analysis, Nucleic Acids Research.* [PMID:[27079975](https://pubmed.ncbi.nlm.nih.gov/27079975)] </br>
Requirements for deepTools:
- Python 2.7 or Python 3.x
- numpy >= 1.8.0
- scipy >= 0.17.0
- py2bit >= 0.1.0
- pyBigWig >= 0.2.1
- pysam >= 0.8
- matplotlib >= 1.4.0
* **blast**&</br>
Used for finding AMR gene on the genome</br>
*Camacho C., Coulouris G., et al., 2009, BLAST+: architecture and applications, BMC Bioinformatics, 10:421.* [PMID:[20003500](https://pubmed.ncbi.nlm.nih.gov/20003500)]
* **perl Math::CDF**&</br>
Used for estimating AMR CN</br>
[Math::CDF](https://metacpan.org/pod/Math::CDF)
<p>& Only required for ccne-acc, the others are required for both ccne-fast and ccne-acc.</p>
# Test environment
Ubuntu 16.04 LTS with perl v5.26.2 (Theoretically compatible with other generic Linux version but not tested)
# Bundled binaries
For Linux (compiled on Ubuntu 16.04 LTS) some of the binaries are included.
# Licence
* ccne is free software, released under the [GPL V3](https://github.com/biojiang/ccne/blob/main/LICENSE)
# Author
* Jiang Jianping
* Institute of Antibiotics, Huashan hospital, Fudan University
* jiangjianping@fudan.edu.cn
AMR genes (2412 genes) in ccne
¹ Numbers in the last brackets are the number of alleles. For details, refer to CARD AMR genes in Kleborate.
Supported species in ccne-fast
Species codes and default housekeeping genes
Codes in ccne|Species|Default housekeeping genes
|:---|:---|:---
Aba|Acinetobacter baumannii|rpoB
Ach|Achromobacter spp.|rpoB
Aer|Aeromonas spp.|gltA
Arc|Arcobacter spp.|gltA
Aap|Anaplasma phagocytophilum|atpA
Bcc|Burkholderia cepacia|atpD
Bce|Bacillus cereus|glp
Bha|Brachyspira hampsonii|adh
Bhe|Bartonella henselae|rpoB
Bhy|Brachyspira hyodysenteriae|adh
Bin|Brachyspira intermedia|adh
Bli|Bacillus licheniformis|rpoB
Bpe|Bordetella pertussis|adk
Bor|Borrelia spp.|clpA
Bpi|Brachyspira pilosicoli|adh
Bps|Burkholderia pseudomallei|ace
Bra|Brachyspira spp.|adh
Bsu|Bacillus subtilis|glpF
Cco|Campylobacter coli|aspA
Cje|Campylobacter jejuni|aspA
Cbo|Clostridium botulinum|rpoB
Ccn|Campylobacter concisus|aspA
Cdi|Clostridium difficile|adk
Pdi|Peptoclostridium difficile|adk
Cdp|Corynebacterium diphtheriae|rpoB
Cfe|Campylobacter fetus|aspA
Cfr|Citrobacter freundii|mdh
Che|Campylobacter helveticus|aspA
Chl|Chlamydia spp.|enoA
Chy|Campylobacter hyointestinalis|aspA
Cin|Campylobacter insulaenigrae|aspA
Cla|Campylobacter lanienae|aspA
Clr|Campylobacter lari|adk
Cma|Carnobacterium maltaromaticum|dapE
Cro|Cronobacter spp.|atpD
Cse|Clostridium septicum|ddl
Csp|Campylobacter sputorum|aspA
Cup|Campylobacter upsaliensis|adk
Ecl|Enterobacter cloacae|rpoB
Eco|Escherichia spp.|rpoB
Shi|Shigella spp.|rpoB
Eta|Edwardsiella tarda|adk
Efa|Enterococcus faecalis|aroE
Efm|Enterococcus faecium|adk
Fps|Flavobacterium psychrophilum|atpA
Hci|Helicobacter cinaedi|aroE
Hin|Haemophilus influenzae|adk
Hpa|Haemophilus parasuis|rpoB
Hpy|Helicobacter pylori|atpA
Hsu|Haematopinus suis|atpA
Kki|Kingella kingae|abcZ
Kox|Klebsiella oxytoca|rpoB
Kpn|Klebsiella pneumoniae|rpoB
Kae|Klebsiella aerogenes|rpoB
Lep|Leptospira spp.|adk
Lmo|Listeria monocytogenes|abcZ
Lsa|Lactobacillus salivarius|nrdB
Mab|Mycobacterium abscessus|rpoB
Mag|Mycoplasma agalactiae|dnaA
Mbo|Mycoplasma bovis|adh1
Mca|Moraxella catarrhalis|abcZ
Mha|Mannheimia haemolytica|adk
Mhy|Mycoplasma hyorhinis|rpoB
Mma|Mycobacterium massiliense|rpoB
Mpl|Melissococcus plutonius|argE
Nei|Neisseria spp.|abcZ
Orh|Ornithobacterium rhinotracheale|mdh
Ots|Orientia tsutsugamushi|mdh
Pac|Propionibacterium acnes|aroE
Pae|Pseudomonas aeruginosa|ppsA
Pfl|Pseudomonas fluorescens|glnS
Pgi|Porphyromonas gingivalis|ftsQ
Pla|Paenibacillus larvae|rpoB
Pmu|Pasteurella multocida|adk
Ppe|Pediococcus pentosaceus|dalR
Ran|Riemerella anatipestifer|rpoB
Sag|Streptococcus agalactiae|adhP
Pau|Staphylococcus aureus|arcC
Sca|Streptococcus canis|gki
Sdy|Streptococcus dysgalactiae|atoB
Sen|Salmonella enterica|aroC
Sep|Staphylococcus epidermidis|arcC
Sga|Streptococcus gallolyticus|aroE
Sha|Staphylococcus haemolyticus|arcC
Sho|Staphylococcus hominis|arcC
Sin|Sinorhizobium spp.|asd
Slu|Staphylococcus lugdunensis|aroE
Sma|Stenotrophomonas maltophilia|atpD
Sor|Streptococcus oralis|aroE
Spn|Streptococcus pneumoniae|aroE
Sps|Staphylococcus pseudintermedius|ack
Spy|Streptococcus pyogenes|gki
Ssu|Streptococcus suis|aroA
Sth|Streptococcus thermophilus|rpoB
Str|Streptomyces spp.|atpD
Sub|Streptococcus uberis|arcC
Seq|Streptococcus equi|arcC
Tay|Taylorella spp.|adk
Ten|Tenacibaculum spp.|atpA
Vch|Vibrio cholerae|adk
Vib|Vibrio spp.|atpA
Vpa|Vibrio parahaemolyticus|dnaE
Vta|Vibrio tapetis|atpA
Vvu|Vibrio vulnificus|dtdS
Wol|Wolbachia spp.|coxA
Xfa|Xylella fastidiosa|cysG
Yer|Yersinia spp.|aarF
Yps|Yersinia pseudotuberculosis|adk
Yru|Yersinia ruckeri|glnA
Smr|Serratia marcescens|rpoC
Pmi|Proteus mirabilis|rpoB

Species in bold are the most commonly isolated species in clinical settings. (Data from CHINET.)