
sccmec - A tool for typing SCCmec cassettes in assemblies

sccmec is a tool for typing SCCmec cassettes in assemblies. It was designed to be easy to use. Unlike its predecessor, staphopia-sccmec, sccmec is much simpler to maintain and update, because it is built on camlhmp, which allows the types and their targets to be defined in a YAML file.

Contributing

If you would like to become a curator for sccmec, please let me know! This could be in the form of adding new SCCmec types, updating existing ones, or adjusting thresholds. I’m open to any and all suggestions!

Supported SCCmec Types

The following SCCmec types are supported by sccmec.

Type    Citation
I       Katayama et al. 2000
II      Katayama et al. 2000, Ito et al. 2001
III     Katayama et al. 2000
IV      Ma et al. 2002
V       Ito et al. 2004
VI      Oliveira et al. 2006
VII     Berglund et al. 2008
VIII    Zhang et al. 2009
IX      Li et al. 2011
X       Li et al. 2011
XI      García-Álvarez et al. 2011
XII     Wu et al. 2015
XIII    Baig et al. 2018
XIV     Urushibara et al. 2020
XV      Wang et al. 2022

The following SCCmec subtypes are supported by sccmec.

Subtype   Citation
Ia        Ito et al. 2001
Ib        Han et al. 2009, Oliveira et al. 2006
IIa       Katayama et al. 2000, Ito et al. 2001
IIb       Hisata et al. 2005
IIc       Shore et al. 2005
IId       Kondo et al. 2007
IIe       Han et al. 2009
IVa       Ma et al. 2002
IVb       Ma et al. 2002
IVc       Ma et al. 2006
IVd       Ma et al. 2006
IVg       Kwon et al. 2005
IVh       Milheiriço et al. 2007
IVi       Berglund et al. 2009
IVj       Berglund et al. 2009
IVk       -
IVl       Iwao et al. 2012
IVm       Hosoya et al. 2014
IVn       -
Va        Ito et al. 2004
Vb        Hisata et al. 2011
Vc        Li et al. 2011

Installation

You can install sccmec using conda:

conda create -n sccmec -c conda-forge -c bioconda sccmec
conda activate sccmec
sccmec --help

Note: sccmec uses the API from camlhmp, with the defaults for --yaml-targets, --yaml-regions, --regions and --targets already set. Please don’t let this confuse you when you see all the camels!

Usage

 Usage: sccmec [OPTIONS]

 sccmec - typing SCCmec cassettes in assemblies

╭─ Required Options ──────────────────────────────────────────────────────────────────────────────╮
│ *  --input         -i   TEXT  Input file in FASTA format to classify [required]                 │
│ *  --yaml-targets  -yt  TEXT  YAML file documenting the targets and types [required]            │
│ *  --yaml-regions  -yr  TEXT  YAML file documenting the regions and types [required]            │
│ *  --targets       -t   TEXT  Query targets in FASTA format [required]                          │
│ *  --regions       -r   TEXT  Query regions in FASTA format [required]                          │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Filtering Options ─────────────────────────────────────────────────────────────────────────────╮
│ --min-targets-pident      INTEGER  Minimum percent identity of targets to count a hit           │
│                                    [default: 90]                                                │
│ --min-targets-coverage    INTEGER  Minimum percent coverage of targets to count a hit           │
│                                    [default: 80]                                                │
│ --min-regions-pident      INTEGER  Minimum percent identity of regions to count a hit           │
│                                    [default: 85]                                                │
│ --min-regions-coverage    INTEGER  Minimum percent coverage of regions to count a hit           │
│                                    [default: 83]                                                │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ────────────────────────────────────────────────────────────────────────────╮
│ --prefix   -p  TEXT  Prefix to use for output files [default: sccmec]                           │
│ --outdir   -o  PATH  Directory to write output [default: ./]                                    │
│ --force              Overwrite existing reports                                                 │
│ --verbose            Increase the verbosity of output                                           │
│ --silent             Only critical errors will be printed                                       │
│ --version            Print schema and camlhmp version                                           │
│ --help               Show this message and exit.                                                │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯

As mentioned above, sccmec uses the camlhmp API. However, the --yaml-targets, --yaml-regions, --regions and --targets options are already set to the SCCmec defaults, even though the help output lists them as required. This means you only need to provide the --input option with your assembly file.
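Since only --input is needed, typing a directory of assemblies can be a simple loop. The sketch below is illustrative only: the helper functions sample_prefix and type_all, the directory layout and the list of FASTA extensions are assumptions, not part of sccmec. It relies only on the --input, --prefix and --outdir options from the help output above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for batch typing; these are not part of sccmec itself.

# Derive a sample prefix from an assembly path by stripping an optional .gz
# and then a FASTA extension (assumed extensions: .fasta, .fna or .fa).
sample_prefix() {
    local name
    name="$(basename "$1")"
    name="${name%.gz}"
    name="${name%.fasta}"
    name="${name%.fna}"
    name="${name%.fa}"
    printf '%s\n' "$name"
}

# Run sccmec on every FASTA file (plain or gzipped) in a directory,
# writing all reports into a single output directory.
type_all() {
    local indir="$1" outdir="$2" f
    mkdir -p "$outdir"
    # Brace expansion yields *.fasta, *.fasta.gz, *.fna, *.fna.gz, *.fa, *.fa.gz
    for f in "$indir"/*.{fasta,fna,fa}{,.gz}; do
        [ -e "$f" ] || continue  # skip glob patterns that matched nothing
        sccmec --input "$f" --prefix "$(sample_prefix "$f")" --outdir "$outdir"
    done
}
```

For example, `type_all assemblies results` would write one set of reports per assembly into results/, each named after its input file.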

Example Usage

Here’s an example of running sccmec on an assembly file (both uncompressed and gzip-compressed files are supported):

sccmec --input tests/fasta/type-Va-AB121219.fasta.gz --prefix type-v

Running sccmec (via camlhmp) with following parameters:
    --input tests/fasta/type-Va-AB121219.fasta.gz
    --yaml-targets /home/rpetit3/repos/sccmec/data/sccmec-targets.yaml
    --yaml-regions /home/rpetit3/repos/sccmec/data/sccmec-regions.yaml
    --targets /home/rpetit3/repos/sccmec/data/sccmec-targets.fasta
    --regions /home/rpetit3/repos/sccmec/data/sccmec-regions.fasta
    --outdir ./
    --prefix type-v
    --min-targets-pident 90
    --min-targets-coverage 80
    --min-regions-pident 85
    --min-regions-coverage 83
Starting camlhmp for SCCmec Typing (targets)...
Running blastn...
Processing target hits...
Starting camlhmp for SCCmec Typing (regions)...
Running blastn...
Processing region hits...
Final Results...
                                           SCCmec Typing
[Results table truncated to fit the terminal width: sample type-v, type V, subtype Va, mecA +, hits 12; the full row is shown in Example {PREFIX}.tsv]
Final predicted type written to ./type-v.tsv
Target-based results against each type written to ./type-v.targets.details.tsv
Target-based blastn results written to ./type-v.targets.blastn.tsv
Region-based results against each type written to ./type-v.regions.details.tsv
Region-based blastn results written to ./type-v.regions.blastn.tsv

If needed, you can adjust the --min-targets-pident, --min-targets-coverage, --min-regions-pident and/or --min-regions-coverage options to make typing more or less stringent. However, please note that the defaults are set to the recommended values.
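To get a feel for what a stricter or looser cutoff would change, you can filter an existing blastn report yourself. This is a hypothetical helper (passing_hits is not part of sccmec), and it assumes a hit counts when pident >= the minimum identity and qcovs >= the minimum coverage. camlhmp’s actual filtering may use different rules, so treat the output as a preview and rerun sccmec with the new options for real results.

```bash
#!/usr/bin/env bash
# Hypothetical helper: preview which hits in a *.blastn.tsv report pass a
# given identity/coverage cutoff. ASSUMPTION: a hit passes when
# pident (column 3) >= min_pident and qcovs (column 4) >= min_coverage;
# camlhmp's own rules may differ.
passing_hits() {
    local tsv="$1" min_pident="${2:-90}" min_cov="${3:-80}"
    awk -F'\t' -v OFS='\t' -v p="$min_pident" -v c="$min_cov" '
        $1 == "qseqid" { next }                       # skip a header row if present
        $3 + 0 >= p + 0 && $4 + 0 >= c + 0 { print $1, $3, $4 }
    ' "$tsv"
}
```

For example, `passing_hits type-v.targets.blastn.tsv 95 80` lists the target hits with at least 95% identity and 80% coverage.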

Once the tool has completed, you will find five output files in the output directory (--outdir, which defaults to the current directory), described below.

Output Files

sccmec (via camlhmp) will generate five output files:

File name                     Description
{PREFIX}.tsv                  A tab-delimited file with the predicted type
{PREFIX}.targets.blastn.tsv   A tab-delimited file of all target-specific BLAST hits
{PREFIX}.targets.details.tsv  A tab-delimited file with details for each type based on targets
{PREFIX}.regions.blastn.tsv   A tab-delimited file of all BLAST hits against full cassettes
{PREFIX}.regions.details.tsv  A tab-delimited file with details for each type based on full cassettes

Example {PREFIX}.tsv

sample    type    subtype    mecA    targets    regions    coverage    hits    target_schema    target_schema_version    region_schema    region_schema_version    camlhmp_version    params    target_comment    region_comment    comment
type-v    V    Va    +    ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1    Va    100.00    12    sccmec_targets    1.2.0    sccmec_regions    1.2.0    1.0.1    min-targets-coverage=80;min-targets-pident=90;min-regions-coverage=83;min-regions-pident=85        Coverage based on 12 hits;There were one or more overlapping hits    

Column                 Description
sample                 The sample name as determined by --prefix
type                   The predicted type (based on targets and full cassettes)
subtype                The predicted subtype (based on full cassettes)
mecA                   The mecA gene status (+ = present; - = absent or no significant hit)
targets                The targets for the given type that had a hit
regions                The regions for the given type that had a hit
coverage               The coverage of the full cassette listed in the regions column
hits                   The number of hits that made up the full cassette coverage
target_schema          The schema used to determine the type based on targets
target_schema_version  The version of the schema used to determine the type based on targets
region_schema          The schema used to determine the type based on full cassettes
region_schema_version  The version of the schema used to determine the type based on full cassettes
camlhmp_version        The version of camlhmp used to determine the type
params                 The parameters used to determine the type
target_comment         A small comment about the target results
region_comment         A small comment about the region results
comment                A small comment about the final result
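When typing many samples, it can help to collect the main columns from each {PREFIX}.tsv into one table. A minimal sketch, assuming the reports are tab-delimited with a header row as in the example above; summarize_types is a hypothetical helper, not an sccmec command. Columns are looked up by header name, so their order does not matter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: combine sample, type, subtype and mecA from one or more
# {PREFIX}.tsv reports into a single tab-delimited table with one header row.
summarize_types() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                                  # header row of each file
            for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> position
            if (NR == 1) print "sample", "type", "subtype", "mecA"
            next
        }
        { print $col["sample"], $col["type"], $col["subtype"], $col["mecA"] }
    ' "$@"
}
```

For example, `summarize_types sample1.tsv sample2.tsv > sccmec-summary.tsv`. Pass only the {PREFIX}.tsv reports, not the .details.tsv or .blastn.tsv files, which have different columns.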

Example {PREFIX}.targets.blastn.tsv

qseqid    sseqid    pident    qcovs    qlen    slen    length    nident    mismatch    gapopen    qstart    qend    sstart    send    evalue    bitscore
ccrC1    AB121219.1    100.000    100    1623    28612    1623    1623    0    0    1    1623    16132    17754    0.0    2998
ccrC1    AB121219.1    90.439    100    1677    28612    1684    1523    148    12    1    1677    16132    17809    0.0    2206
IS431_1    AB121219.1    100.000    100    791    28612    791    791    0    0    1    791    8221    9011    0.0    1461
IS431_1    AB121219.1    98.085    100    791    28612    731    717    14    0    1    731    3423    2693    0.0    1273
IS431_1    AB121219.1    99.704    100    675    28612    675    673    2    0    1    675    2693    3367    0.0    1236
...

This is tabular BLAST output (-outfmt 6) with a custom set of columns, listed in the header row above.
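Because a single target can hit the assembly more than once (ccrC1 and IS431_1 in the example each have several hits), it is sometimes handy to keep only the strongest hit per target. The hypothetical best_hits helper below does that by bitscore, assuming the column order of the example header (qseqid is column 1, bitscore is column 16); it works the same way on {PREFIX}.regions.blastn.tsv.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only the highest-bitscore hit per query (qseqid)
# from a *.blastn.tsv report, sorted by query name in byte order.
best_hits() {
    awk -F'\t' '
        $1 == "qseqid" { next }                       # skip a header row if present
        !($1 in best) || $16 + 0 > best[$1] { best[$1] = $16 + 0; line[$1] = $0 }
        END { for (q in line) print line[q] }
    ' "$1" | LC_ALL=C sort -t $'\t' -k1,1
}
```

For example, `best_hits type-v.targets.blastn.tsv` prints one line per target.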

Example {PREFIX}.targets.details.tsv

sample    type    status    targets    missing    schema    schema_version    camlhmp_version    params    comment
type-v    I    False    IS431,mecA,mecR1    ccrA1,ccrB1,IS1272    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    II    False    IS431,mecA,mecR1    ccrA2,ccrB2,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    III    False    IS431,mecA,mecR1    ccrA3,ccrB3,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    IV    False    IS431,mecA,mecR1    ccrA2,ccrB2,IS1272    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    V    True    ccrC1,IS431_1,mecA,mecR1,IS431_2        sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    VI    False    IS431,mecA,mecR1    ccrA4,ccrB4,IS1272    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    VII    False    ccrC1,IS431_1,mecA,mecR1,IS431_2    IS12960D    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    VIII    False    IS431,mecA,mecR1    ccrA4,ccrB4,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    Excluded target ccrC1 found, failing type VIII
type-v    IX    False    IS431_1,mecA,mecR1,IS431_2    ccrA1,ccrB1    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    X    False    IS431_1,mecA,mecR1,IS431_2    ccrA1,ccrB6    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    XI    False    mecA,mecR1    ccrA1,ccrB3,blaZ,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    XII    False    IS431_1,mecA,mecR1,IS431_2    ccrC2    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    XIII    False    IS431,mecA,mecR1    ccrC2,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    XIV    False    ccrC1,IS431,mecA,mecR1    mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    
type-v    XV    False    IS431,mecA,mecR1    ccrA1,ccrB6,mecI    sccmec_targets    1.2.0    1.0.1    min-coverage=90;min-pident=80    

This file provides a detailed view of the results. The columns are:

Column           Description
sample           The sample name as determined by --prefix
type             The type being tested
status           Whether the type passed (True) or failed (False)
targets          The targets for the given type that had a match
missing          The targets for the given type that were not found
schema           The schema used to determine the type
schema_version   The version of the schema used to determine the type
camlhmp_version  The version of camlhmp used to determine the type
params           The parameters used to determine the type
comment          A small comment about the result
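To pull out just the types that passed, filter on the status column. A minimal sketch with a hypothetical helper name (passing_types), assuming status holds the literal strings True and False as in the example and that the report has a header row; the same helper also works on {PREFIX}.regions.details.tsv.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the type column for every row whose status
# column is "True" in a *.details.tsv report (columns found by header name).
passing_types() {
    awk -F'\t' '
        FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["status"] == "True" { print $col["type"] }
    ' "$1"
}
```

For the example above, `passing_types type-v.targets.details.tsv` would print V.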

Example {PREFIX}.regions.blastn.tsv

qseqid    sseqid    pident    qcovs    qlen    slen    length    nident    mismatch    gapopen    qstart    qend    sstart    send    evalue    bitscore
III    AB121219.1    99.371    25    68256    28612    4132    4106    26    0    24230    28361    8220    4089    0.0    7487
III    AB121219.1    86.738    25    68256    28612    5067    4395    628    42    59204    64248    17954    12910    0.0    5594
III    AB121219.1    94.259    25    68256    28612    3240    3054    172    11    44582    47815    22419    19188    0.0    4940
III    AB121219.1    98.421    25    68256    28612    1837    1808    25    4    27952    29787    4458    2625    0.0    3229
III    AB121219.1    99.494    25    68256    28612    791    787    3    1    34225    35015    3423    2634    0.0    1437
...

This is tabular BLAST output (-outfmt 6) with a custom set of columns, listed in the header row above.
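The coverage and hits columns in {PREFIX}.regions.details.tsv summarize region hits like these, and several hits can overlap on the query (hence comments such as "There were one or more overlapping hits"). One way to compute such a coverage is to merge overlapping query intervals (qstart to qend) and divide the merged length by qlen. The sketch below does exactly that; region_coverage is a hypothetical helper, it applies no identity or coverage filtering, and it is an assumption for illustration rather than how camlhmp necessarily calculates its coverage column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate per-query coverage from a *.regions.blastn.tsv
# report by merging overlapping query intervals. ASSUMPTION: illustrates the
# idea behind the coverage/hits columns, not camlhmp's exact implementation.
# Output columns: query, percent of qlen covered, number of hits.
region_coverage() {
    # Keep qseqid, qlen, qstart, qend; sort by query, then by qstart.
    awk -F'\t' -v OFS='\t' '$1 != "qseqid" { print $1, $5, $11, $12 }' "$1" |
        LC_ALL=C sort -t $'\t' -k1,1 -k3,3n |
        LC_ALL=C awk -F'\t' '
            function emit() {
                covered += e - s + 1                  # close the open interval
                printf "%s\t%.2f\t%d\n", q, 100 * covered / len, hits
            }
            $1 != q {                                 # first hit of a new query
                if (q != "") emit()
                q = $1; len = $2 + 0; s = $3 + 0; e = $4 + 0
                covered = 0; hits = 1
                next
            }
            {
                hits++
                if ($3 + 0 <= e + 1) {                # overlaps or touches: extend
                    if ($4 + 0 > e) e = $4 + 0
                } else {                              # gap: bank interval, start new one
                    covered += e - s + 1
                    s = $3 + 0; e = $4 + 0
                }
            }
            END { if (q != "") emit() }
        '
}
```

For example, `region_coverage type-v.regions.blastn.tsv` prints one line per region query.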

Example {PREFIX}.regions.details.tsv

sample    type    status    targets    missing    coverage    hits    schema    schema_version    camlhmp_version    params    comment
type-v    Ia    False        Ia    17.67    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    Ib    False        Ib    16.61    2    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 2 hits
type-v    IIa    False        IIa    11.85    11    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 11 hits;There were one or more overlapping hits
type-v    IIb    False        IIb    0.00    0    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    
type-v    IIc    False        IIc    17.39    4    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 4 hits;There were one or more overlapping hits
type-v    IId    False        IId    0.00    0    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    
type-v    IIe    False        IIe    1.54    1    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    
type-v    III    False        III    24.50    18    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 18 hits;There were one or more overlapping hits
type-v    IVa    False        IVa    29.35    13    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 13 hits;There were one or more overlapping hits
type-v    IVb    False        IVb    33.19    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    IVc    False        IVc    23.56    14    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 14 hits;There were one or more overlapping hits
type-v    IVd    False        IVd    7.78    1    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    
type-v    IVg    False        IVg    30.66    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    IVi    False        IVi    30.85    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    IVj    False        IVj    30.58    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    IVk    False        IVk    16.00    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    IVl    False        IVl    19.79    13    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 13 hits;There were one or more overlapping hits
type-v    IVm    False        IVm    25.73    14    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 14 hits;There were one or more overlapping hits
type-v    IVn    False        IVn    28.15    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    Va    True    Va        100.00    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    Vb    False        Vb    64.55    17    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 17 hits;There were one or more overlapping hits
type-v    Vc    False        Vc    50.14    17    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 17 hits;There were one or more overlapping hits
type-v    VI    False        VI    29.79    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    VII    False        VII    45.86    15    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 15 hits;There were one or more overlapping hits
type-v    VIII    False        VIII    16.95    9    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 9 hits;There were one or more overlapping hits
type-v    IX    False        IX    15.33    11    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 11 hits;There were one or more overlapping hits
type-v    X    False        X    13.68    16    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 16 hits;There were one or more overlapping hits
type-v    XI    False        XI    0.00    0    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    
type-v    XII    False        XII    19.37    15    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 15 hits;There were one or more overlapping hits
type-v    XIII    False        XIII    28.39    12    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 12 hits;There were one or more overlapping hits
type-v    XIV    False        XIV    14.50    16    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 16 hits;There were one or more overlapping hits
type-v    XV    False        XV    17.21    11    sccmec_regions    1.2.0    1.0.1    min-coverage=85;min-pident=83    Coverage based on 11 hits;There were one or more overlapping hits

This file provides a detailed view of the results. The columns are:

Column           Description
sample           The sample name as determined by --prefix
type             The type being tested
status           Whether the type passed (True) or failed (False)
targets          The region(s) for the given type that had a match
missing          The region(s) for the given type that were not found (or did not reach the coverage cutoff)
coverage         The percent coverage of the full cassette
hits             The number of hits that made up the full cassette coverage
schema           The schema used to determine the type
schema_version   The version of the schema used to determine the type
camlhmp_version  The version of camlhmp used to determine the type
params           The parameters used to determine the type
comment          A small comment about the result
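Looking at the runners-up can help when a sample is close to, but below, a cutoff (in the example, Vb and Vc reach 64.55% and 50.14% coverage). A minimal sketch with a hypothetical helper name (top_regions) that ranks the rows of {PREFIX}.regions.details.tsv by coverage, assuming the header names shown in the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print type, coverage, hits and status for the N
# best-covered types/subtypes in a {PREFIX}.regions.details.tsv report.
top_regions() {
    local tsv="$1" n="${2:-3}"
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        { print $col["type"], $col["coverage"], $col["hits"], $col["status"] }
    ' "$tsv" | LC_ALL=C sort -t $'\t' -k2,2nr | head -n "$n"
}
```

For the example above, `top_regions type-v.regions.details.tsv 3` would list Va, Vb and Vc.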

Citations

If you use sccmec in your research, please cite the following:

Naming

I considered coming up with a fun name for this tool, but sometimes it’s best to get straight to the point! So, here we are with sccmec.

License

I’m not a lawyer and MIT has always been my go-to license. So, MIT it is!

Curators

About

A tool for identifying and typing staphylococcal cassette chromosome mec (SCCmec) elements from bacterial genome data
