
ToolDistillator: a tool to extract and aggregate information from different tool outputs to JSON parsable files

License: GPL v3

ToolDistillator is a tool to extract information from output files of specific tools, expose it as JSON files, and aggregate over several tools.

It can produce either a single file per tool or a summarized file from a set of reports.

It was initially developed for use on Galaxy, and some options are only available on Galaxy (e.g. extracting the historic ID (HID) from a Galaxy analysis).

The tool was inspired by the hAMRonization project (authors: @dfornika, @fmaguire, @raphenya, @jodyphelan, @pvanheus).

Content

Installation

Requirement

  • Python > 3.8
  • pandas
  • biopython
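The requirements above can be checked before installing. The following is a minimal sketch, not part of ToolDistillator: the function name `check_requirements` is hypothetical, it assumes biopython is imported as `Bio`, and it treats 3.8 as the lower bound, which is one reading of "Python > 3.8".

```python
import importlib.util
import sys

# Import names assumed for the listed requirements: biopython is imported as "Bio".
REQUIRED_MODULES = {"pandas": "pandas", "biopython": "Bio"}


def check_requirements(min_python=(3, 8)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    status = {"python": sys.version_info[:2] >= min_python}
    for package, module in REQUIRED_MODULES.items():
        # find_spec returns None when the module cannot be located
        status[package] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```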

Conda installation

$ conda create --name tooldistillator --channel bioconda tooldistillator

Installation from sources

  1. Clone the GitLab repository

    $ git clone https://gitlab.com/ifb-elixirfr/abromics/tooldistillator.git
  2. Move inside the created folder

    $ cd tooldistillator
  3. Install the package and its dependencies

    $ pip install .

Usage

Command list

$ tooldistillator --help
usage: tooldistillator <tool> <options>

Extract information from tool output(s) to JSON and/or aggregate several JSON reports

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --logfile LOGFILE     Log file location

Tools:
  {abricate,amrfinderplus,argnorm,bakta,bandage,bracken,bwa,checkm2,coreprofiler,concoct,coverm,dastool,deeparg,drep,eggnogmapper,fastp,fastqc,filtlong,flye,groot,gtdbtk,integronfinder2,isescan,kraken2,maxbin2,megahit,metabat2,mmseqs2linclust,mmseqs2taxonomy,multiqc,plasmidfinder,polypolish,prodigal,quast,recentrifuge,refseqmasher,shovill,staramr,sylph,sylphtax,tabular_file,tooltest,summarize}
    abricate            Extract information from abricate's output report i.e., OUTPUT.tsv
    amrfinderplus       Extract information from amrfinderplus's output report i.e., report.tsv
    argnorm             Extract information from argnorm's output report i.e., output.tsv
    bakta               Extract information from bakta's output report i.e., OUTPUT.json
    bandage             Extract information from bandage's output report i.e., OUTPUT.txt
    bracken             Extract information from bracken's output report i.e., report.tsv
    bwa                 Extract information from bwa's output report i.e., input.bam
    checkm2             Extract information from checkm2's output report i.e., quality_report.tsv
    concoct             Extract information from concoct's output report i.e., merge_cut_clusters.csv
    coreprofiler        Extract information from coreprofiler's output report i.e., results.tsv
    coverm              Extract information from coverm coverage output report i.e., coverage_report.tsv
    dastool             Extract information from dastool's summary report i.e., summary_bins.tabular
    deeparg             Extract information from deeparg's output report i.e., report.txt
    drep                Extract information from drep's output report i.e., quality_and_cluster_summary.csv (Widb file)
    eggnogmapper        Extract information from eggnogmapper's output report i.e., annotations_report.tsv
    fastp               Extract information from fastp's output report i.e., report.json
    fastqc              Extract information from fastqc's output report i.e., report.txt
    filtlong            Extract information from filtlong's output report i.e., input.fastq
    flye                Extract information from flye's output report i.e., contig.fasta
    groot               Extract information from groot's output report i.e., report.tsv
    gtdbtk              Extract information from gtdbtk's output report i.e., taxonomy_summary.tsv
    integronfinder2     Extract information from integronfinder2's output report i.e., OUTPUT.integrons, OUTPUT.summary
    isescan             Extract information from isescan's output report i.e., OUTPUT.tsv
    kraken2             Extract information from kraken2's output report i.e., report.tsv
    maxbin2             Extract information from maxbin2's output report i.e., bin_summary.tsv
    megahit             Extract information from megahit's output assembly i.e., assembly.fasta
    metabat2            Extract information from metabat2's output report i.e., cluster_membership.tsv
    mmseqs2linclust     Extract information from mmseqs2 linclust module output clustering i.e., cluster.fasta
    mmseqs2taxonomy     Extract information from mmseqs2 taxonomy module output i.e., tax_output.tsv
    multiqc             Extract information from multiqc's output report i.e., output.html
    plasmidfinder       Extract information from plasmidfinder's output report i.e., plasmidfinder.tsv
    polypolish          Extract information from polypolish's output report i.e., contig.fasta
    prodigal            Extract information from prodigal's output fasta i.e., cds.fasta
    quast               Extract information from quast's output report i.e., report.tsv
    recentrifuge        Extract information from recentrifuge's output report i.e., data.tsv
    refseqmasher        Extract information from refseqmasher's output report i.e., OUTPUT.tsv
    shovill             Extract information from shovill's output report i.e., contig.fasta
    staramr             Extract information from staramr's output report i.e., resfinder.tsv
    sylph               Extract information from sylph's output report i.e., report.tsv
    sylphtax            Extract information from sylph-tax's output report i.e., merge_report.tsv
    tabular_file        Extract information from tabular_file's output report i.e., report.tsv
    tooltest            Extract information from tooltest's output report i.e., unitest
    summarize           Aggregate several JSON reports generated from a tool output

Tool specification

For each tool, the requirements can be accessed using the --help argument, e.g.

  $ tooldistillator abricate --help
  usage: tooldistillator.py abricate <options>

  Extract information from output(s) of abricate (OUTPUT.tsv)

  positional arguments:
    report                Path to report(s)

  options:
    -h, --help            show this help message and exit
    -o OUTPUT, --output OUTPUT
                          Output location
    --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                          abricate version for abricate
    --reference_database_version REFERENCE_DATABASE_VERSION
                          DB version for abricate
    --hid HID             historic ID for abricate file from galaxy for abricate

You can also test the command using the test data available in test/data/dummy folders. For example:

  $ tooldistillator abricate test/data/dummy/abricate/report.tsv --reference_database_version 1.1.1 --analysis_software_version 1.0.0
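When the command is launched from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is an illustration only: `build_command` is a hypothetical helper, and it uses just the options shown in the abricate help above. Not every subcommand accepts `--reference_database_version` (argnorm, for example, does not list it), so check the `--help` of the tool you target.

```python
import shlex


def build_command(tool, reports, output=None, software_version=None, db_version=None):
    """Assemble the argument list for one tooldistillator subcommand."""
    cmd = ["tooldistillator", tool, *map(str, reports)]
    if output is not None:
        cmd += ["--output", str(output)]
    if software_version is not None:
        cmd += ["--analysis_software_version", software_version]
    if db_version is not None:
        cmd += ["--reference_database_version", db_version]
    return cmd


cmd = build_command(
    "abricate",
    ["test/data/dummy/abricate/report.tsv"],
    software_version="1.0.0",
    db_version="1.1.1",
)
# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(cmd))
```

The resulting list can be passed as-is to `subprocess.run` once tooldistillator is installed.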

Parse multiple same inputs

It is possible to parse multiple reports from the same tool at once by giving a list of reports as the argument, e.g.:

  $ tooldistillator abricate test/data/dummy/abricate/*.tsv --reference_database_version 3.2.5 --analysis_software_version 1.0.0

This will generate only one JSON file for all reports.

When a tool accepts several kinds of files (e.g. the shovill subcommand uses contig.fasta but can also take the alignment BAM and assembly graph files), multiple-report mode cannot be used.
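In the shell, the `*.tsv` pattern is expanded before the command starts. A script that launches the command without a shell has to do that expansion itself. The following sketch shows one way to do it; `collect_reports` and `multi_report_command` are hypothetical names, and the sorted order is a choice made here, not something the tool requires.

```python
import glob


def collect_reports(pattern):
    """Expand a glob pattern into a sorted list of matching paths."""
    return sorted(glob.glob(pattern))


def multi_report_command(tool, pattern, extra_args=()):
    """Build one command line that parses every report matching the pattern."""
    reports = collect_reports(pattern)
    if not reports:
        # Fail early instead of calling the tool with no report at all
        raise FileNotFoundError(f"no report matches {pattern!r}")
    return ["tooldistillator", tool, *reports, *extra_args]
```

For example, `multi_report_command("abricate", "test/data/dummy/abricate/*.tsv", ["--analysis_software_version", "1.0.0"])` returns a single argument list covering all matching reports.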

Aggregate JSON reports from different tools

To aggregate JSON reports from different tools in one final JSON file, you can use the summarize subcommand:

  $ tooldistillator summarize --help
  usage: tooldistillator summarize <options> <list of reports>

  Aggregate several reports

  positional arguments:
    tooldistillator_reports
                          list of tooldistillator reports

  options:
    -h, --help            show this help message and exit
    -o OUTPUT, --output OUTPUT
                          Output file path for summary
    -f, --force           Overwrite to the output file mandatory

You can try it using test data in test/data/dummy/summarize folder:

  $ tooldistillator summarize test/data/dummy/summarize/*output.json -o test/data/raw_output/summarize/tooldistillator_summary.json
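Once a summary has been written, a quick sanity check is to load it with a JSON parser. The sketch below deliberately makes no assumption about the layout of the summary file; it only reports the top-level shape. The function name `describe_json` is hypothetical.

```python
import json


def describe_json(path):
    """Report the top-level shape of a JSON file without assuming its schema."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)  # raises json.JSONDecodeError if the file is not valid JSON
    if isinstance(data, dict):
        return {"type": "object", "size": len(data), "keys": sorted(data)}
    if isinstance(data, list):
        return {"type": "array", "size": len(data)}
    return {"type": type(data).__name__, "size": 1}
```

Calling `describe_json("test/data/raw_output/summarize/tooldistillator_summary.json")` after the command above would confirm that the file parses and show how many top-level entries it holds.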

Available tools

A diverse set of tools is available, along with a generic one for tabular files with headers. There is also a command to aggregate JSON outputs.

| Tool | Version | Default input file | Optional files |
|---|---|---|---|
| Abricate | 1.0.1 | output.tsv | |
| AMRFinderPlus | 3.11.26 | report.tsv | point_mutation_report.tsv, nucleotide_sequence.fasta |
| argNorm | 1.0.0 | output.tsv | |
| Bakta | 1.7.0 | output.json | protein.faa, nucleotide.fna, annotation.gff3, summary.txt |
| Bandage | 0.8.1 | info.txt | |
| Bracken | 2.8 | output.tsv | taxonomy.tsv |
| BWA | 2.2.1 | input1.bam | input2.bam |
| Checkm2 | 1.0.2 | quality_report.tsv | DIAMOND_RESULTS.tsv, protein_files.zip, checkm2.log |
| Concoct | 1.1.0 | merge_cut_clusters.csv | bin_folder.zip, cut_up_contigs.fasta, coordinates_cut_up.bed, coverage_table.tabular, log_file.txt |
| CoreProfiler | 1.1.1 | calling_results.tsv | profiles_w_tmp_alleles.json, new_alleles.fasta |
| CoverM | 0.7.0 | coverage_report.tsv | drep_output_cluster_definition.tsv, drep_directory.zip |
| DasTool | 1.1.7 | summary_bins.tabular | bins_folder.zip, contigs_to_bin.tabular, quality_and_completeness.tabular, proteins.fasta, unbinned.fasta, log_file.txt |
| DeepARG | 1.0.4 | report.txt | report_ARG_merged.txt, report_ARG_merged_quant_subtype.txt, report_ARG_merged_quant_type.txt, report_potential_ARG.txt, sequence_clean_file.txt, bam_clean_file.bam, sam_clean_file.sam, bam_clean_sorted_file.bam, daa_clean_align_file.daa |
| dRep | 3.4.2 | quality_and_cluster_summary.csv (Widb.csv) | Bdb.csv, Cdb.csv, Chdb.csv, Mdb.csv, Ndb.csv, Sdb.csv, Wdb.csv, winning_genomes.pdf, cluster_scoring.pdf, clustering_scatterplots.pdf, primary_clustering_dendrogram.pdf, secondary_clustering_dendrogram.pdf, secondary_clustering_MDS.pdf, log_file.txt |
| EggNOG-mapper | 2.1.12 | annotation_report.tsv | seed_orthologs.tsv, orthologs.tsv |
| Fastp | 0.23.2 | output.json | |
| Fastqc | 0.12.1 | report.txt | report.html |
| Filtlong | 0.2.1 | input.fastq | |
| Flye | 2.9.1 | contig.fasta | contig.gfa, infos.tsv |
| Groot | 1.1.2 | report.tsv | groot.log, groot.bam |
| GTDB-tk | 2.4.1 | taxonomy_summary.tsv | classify.zip, align.zip, identify.zip, log_file.txt |
| Integronfinder2 | 2.0.2 | output.integrons | output.summary |
| ISEScan | 1.7.2.3 | output.tsv | is.fna, orf.faa, orf.fna |
| Kraken2 | 2.1.2 | taxonomy.tsv | reads_assignation.txt |
| MaxBin2 | 2.2.7 | bin_summary.tsv | bin_folder.zip, bin_predicted_markers.zip, too_short_sequences.fasta, unclassified_sequences.fasta, marker_gene_presence.tabular, marker_gene_presence_plot.pdf, log_file.txt |
| Megahit | 1.2.9 | assembly.fasta | intermediate_contigs.zip, log.txt |
| Metabat2 | 2.17 | cluster_membership.tsv | bins_folder.zip, too_short.fa, unbinned.fa, low_depth.fa |
| MMseqs2 linclust | 17-b804f | rep_seqs.fasta | all_seqs.fasta, cluster.tsv |
| MMseqs2 taxonomy | 17-b804f | tax_output.tsv | kraken_report.txt, krona_report.txt |
| MultiQC | 1.11 | report.html | |
| Plasmidfinder | 2.1.6 | output.json | genome_hits.fasta, plasmid_hits.fasta |
| Polypolish | 0.5.0 | contig.fasta | |
| Prodigal | 2.6.3 | output.fnn | output.faa, output.start, output.gbk, output.gff, output.sco |
| Quast | 5.2.0 | output.tsv | |
| Recentrifuge | 1.10.0 | data.tsv | report.html, stat.tsv |
| Refseqmasher | 0.1.2 | output.tsv | |
| Shovill | 1.1.0 | contigs.fasta | alignment.bam, contigs.gfa |
| StarAMR | 0.9.1 | resfinder.tsv | mlst.tsv, pointfinder.tsv, plasmidfinder.tsv, settings.tsv |
| Sylph | 0.8.1 | report.tsv | |
| Sylph-tax | 1.2.0 | merge_report.tsv | taxprof_folder.zip |
| tabular_file | 0 | output.tsv | |
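To give a feel for what exposing a tabular file with headers as JSON means, here is a rough standard-library analogue. It is not ToolDistillator's implementation and does not reproduce its output layout; `tabular_to_records` is a hypothetical name and the sample rows are made up for illustration.

```python
import csv
import io
import json


def tabular_to_records(handle, delimiter="\t"):
    """Read a delimited table whose first line is the header into a list of dicts."""
    return list(csv.DictReader(handle, delimiter=delimiter))


# Made-up two-column table standing in for a real report
sample = io.StringIO("gene\tcoverage\nblaTEM\t98.5\n")
print(json.dumps(tabular_to_records(sample), indent=2))
```

Note that `csv.DictReader` keeps every value as a string; any type conversion would be an extra step.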

Abricate

$ tooldistillator abricate --help
usage: tooldistillator.py abricate <options>

Extract information from output(s) of abricate (OUTPUT.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        abricate version for abricate
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for abricate
  --hid HID             historic ID for abricate file from galaxy for abricate

AMRFinderPlus

$ tooldistillator amrfinderplus --help
usage: tooldistillator.py amrfinderplus <options>

Extract information from output(s) of amrfinderplus (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        amrfinderplus version number for amrfinderplus
  --hid HID             historic ID for amrfinderplus file from galaxy for amrfinderplus
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for amrfinderplus
  --point_mutation_report_path POINT_MUTATION_REPORT_PATH
                        point mutation report file for amrfinderplus
  --point_mutation_report_hid POINT_MUTATION_REPORT_HID
                        historic ID for point mutation report file from galaxy for amrfinderplus
  --nucleotide_sequence_path NUCLEOTIDE_SEQUENCE_PATH
                        nucleotide identified sequence fasta file for amrfinderplus
  --nucleotide_sequence_hid NUCLEOTIDE_SEQUENCE_HID
                        historic ID for nucleotide sequence fasta file from galaxy for amrfinderplus
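Most optional files in the help above come as a pair of options, `--<name>_path` and `--<name>_hid`. A small helper can generate both from one name. This is only a sketch: `optional_file_args` is a hypothetical function, the file names are placeholders, and the HID value `12` is invented for the example.

```python
def optional_file_args(name, path=None, hid=None):
    """Return the --<name>_path / --<name>_hid arguments for one optional file."""
    args = []
    if path is not None:
        args += [f"--{name}_path", str(path)]
    if hid is not None:
        args += [f"--{name}_hid", str(hid)]
    return args


cmd = ["tooldistillator", "amrfinderplus", "report.tsv"]
cmd += optional_file_args("point_mutation_report", path="point_mutation_report.tsv")
# The HID only makes sense for files coming from a Galaxy history; 12 is a placeholder
cmd += optional_file_args("nucleotide_sequence", path="nucleotide_sequence.fasta", hid=12)
print(cmd)
```

The main report's `--hid` option does not follow the paired naming, so it would still be added by hand.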

argNorm

$ tooldistillator argnorm --help
usage: tooldistillator.py argnorm <options>

Extract information from output(s) of argnorm (output.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for argnorm
  --hid HID             historic ID for argnorm file from galaxy  

Bakta

$ tooldistillator bakta --help   
usage: tooldistillator.py bakta <options>

Extract information from output(s) of bakta (OUTPUT.json)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             historic ID to bakta result file from galaxy for bakta
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        bakta version for bakta
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for bakta
  --annotation_tabular_path ANNOTATION_TABULAR_PATH
                        annotation file in TSV format for bakta
  --annotation_tabular_hid ANNOTATION_TABULAR_HID
                        historic ID for annotation tsv file from galaxy for bakta
  --gff_file_path GFF_FILE_PATH
                        annotation file result in gff3 format for bakta
  --gff_file_hid GFF_FILE_HID
                        historic ID for gff file from galaxy for bakta
  --annotation_genbank_path ANNOTATION_GENBANK_PATH
                        annotation file in genbank format for bakta
  --annotation_genbank_hid ANNOTATION_GENBANK_HID
                        historic ID for annotation genbank file from galaxy for bakta
  --annotation_embl_path ANNOTATION_EMBL_PATH
                        annotation file in embl format for bakta
  --annotation_embl_hid ANNOTATION_EMBL_HID
                        historic ID for annotation embl file from galaxy for bakta
  --contig_sequences_path CONTIG_SEQUENCES_PATH
                        contig sequences in fasta ([output].fna) for bakta
  --contig_sequences_hid CONTIG_SEQUENCES_HID
                        historic ID for contigs fasta file from galaxy for bakta
  --nucleotide_annotation_path NUCLEOTIDE_ANNOTATION_PATH
                        nuleotide file ([output].ffn) of the annotation for bakta
  --nucleotide_annotation_hid NUCLEOTIDE_ANNOTATION_HID
                        historic ID for nucleotide file from galaxy for bakta
  --amino_acid_annotation_path AMINO_ACID_ANNOTATION_PATH
                        amino acid file of the annotation for bakta
  --amino_acid_annotation_hid AMINO_ACID_ANNOTATION_HID
                        historic ID for amino acide sequence file from galaxy for bakta
  --hypothetical_protein_path HYPOTHETICAL_PROTEIN_PATH
                        hypothetical protein CDS amino acid sequences as fasta for bakta
  --hypothetical_protein_hid HYPOTHETICAL_PROTEIN_HID
                        historic ID for hypothetical protein file file from galaxy for bakta
  --hypothetical_tabular_path HYPOTHETICAL_TABULAR_PATH
                        hypothetical protein CDS for bakta
  --hypothetical_tabular_hid HYPOTHETICAL_TABULAR_HID
                        historic ID for hypothetical tabular file from galaxy for bakta
  --summary_result_path SUMMARY_RESULT_PATH
                        summary file of the bakta analysis in txt format for bakta
  --summary_result_hid SUMMARY_RESULT_HID
                        historic ID for summary file from galaxy for bakta
  --plot_file_path PLOT_FILE_PATH
                        genome annotation plot file path for bakta
  --plot_file_hid PLOT_FILE_HID
                        historic ID for plot file from galaxy for bakta

Bakta generates a complete JSON file as output, which is too large and redundant for database integration. The ToolDistillator Bakta subcommand:

  • Uses selected information from the Bakta JSON file
  • Adds a summary of the analysis if the Bakta summary file is provided (optional)
  • Removes information related to nucleotide and amino acid sequences and adds the paths of the sequence files if provided
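The last point can be pictured as a recursive filter over the parsed JSON. The sketch below is an assumption-laden illustration, not the actual ToolDistillator code: the key names in `SEQUENCE_KEYS`, the `sequence_files` output key, and the function names are all hypothetical, and the report is assumed to be a JSON object at the top level.

```python
SEQUENCE_KEYS = {"nt", "aa", "sequence"}  # assumed key names, adjust to the real file


def strip_sequences(node, drop=SEQUENCE_KEYS):
    """Return a copy of a JSON-like structure without the sequence-bearing keys."""
    if isinstance(node, dict):
        return {k: strip_sequences(v, drop) for k, v in node.items() if k not in drop}
    if isinstance(node, list):
        return [strip_sequences(item, drop) for item in node]
    return node


def distill(report, sequence_paths=None):
    """Drop inline sequences and, if given, record where the sequence files live."""
    slim = strip_sequences(report)
    if sequence_paths:
        slim["sequence_files"] = dict(sequence_paths)  # hypothetical output key
    return slim
```

Keeping only file paths instead of the sequences themselves is what makes the distilled report small enough to load into a database while still pointing to the full data.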

Bandage

$ tooldistillator bandage --help
usage: tooldistillator.py bandage <options>

Extract information from output(s) of bandage (OUTPUT.txt)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        bandage version number for bandage
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for bandage
  --hid HID             historic ID for bandage file from galaxy for bandage
  --bandage_plot_path BANDAGE_PLOT_PATH
                        bandage plot file for bandage
  --bandage_plot_hid BANDAGE_PLOT_HID
                        historic ID for bandage plot from galaxy for bandage

Bracken

$ tooldistillator bracken --help
usage: tooldistillator.py bracken <options>

Extract information from output(s) of bracken (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to kraken report file from Galaxy for bracken
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        bracken version for bracken
  --reference_database_version REFERENCE_DATABASE_VERSION
                        bracken DB version for bracken
  --read_len READ_LEN   read length value for bracken
  --level LEVEL         level to estimate abundance for bracken
  --threshold THRESHOLD
                        number of reads required PRIOR to abundance estimation for bracken
  --kraken_report_path KRAKEN_REPORT_PATH
                        New kraken report estimated from bracken for bracken
  --kraken_report_hid KRAKEN_REPORT_HID
                        Historic ID to kraken results file from Galaxy for bracken

BWA

$ tooldistillator bwa --help
usage: tooldistillator.py bwa <options>

Extract information from output(s) of bwa (input.bam)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to bwa contigs file from Galaxy for bwa
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        bwa version number for bwa
  --reference_database_version REFERENCE_DATABASE_VERSION
                        bwa reference genome for bwa
  --paired_second_file_path PAIRED_SECOND_FILE_PATH
                        if paired inputs are uses for bwa
  --paired_second_file_hid PAIRED_SECOND_FILE_HID
                        Galaxy HID to the paired file for bwa

CheckM2

$ tooldistillator checkm2 --help
usage: tooldistillator.py checkm2 <options>

Extract information from output(s) of checkm2 (quality_report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        checkm2 version number for checkm2
  --hid HID             historic ID for checkm2 file from galaxy for checkm2
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for checkm2
  --diamond_results_path DIAMOND_RESULTS_PATH
                        DIAMOND_RESULTS file for checkm2
  --diamond_results_hid DIAMOND_RESULTS_HID
                        historic ID for DIAMOND results file from galaxy for checkm2
  --protein_zip_path PROTEIN_ZIP_PATH
                        protein sequence fasta files in a ZIP file for checkm2
  --protein_zip_hid PROTEIN_ZIP_HID
                        historic ID for protein sequence fasta files in a ZIP file from galaxy for checkm2
  --checkm2_log_path CHECKM2_LOG_PATH
                        Checkm2 log file for checkm2
  --checkm2_log_hid CHECKM2_LOG_HID
                        historic ID for Checkm2 log file from galaxy for checkm2

Concoct

$ tooldistillator concoct --help
usage: tooldistillator.py concoct <options>

Extract information from output(s) of concoct (merge_cut_clusters.csv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        concoct version number for checkm2
  --hid HID             historic ID for concoct file from galaxy
  --fasta_bin_zip_folder_path FASTA_BIN_ZIP_FOLDER_PATH
                        Fasta bin folder for concoct
  --fasta_bin_zip_folder_hid FASTA_BIN_ZIP_FOLDER_HID
                        historic ID for fasta bin folder from galaxy for concoct
  --contigs_cut_up_file_path CONTIGS_CUT_UP_FILE_PATH
                        Contigs cut up fasta file for Concoct
  --contigs_cut_up_file_hid CONTIGS_CUT_UP_FILE_HID
                        historic ID for contigs cut up fasta file from galaxy for concoct
  --coordinates_cut_up_file_path COORDINATES_CUT_UP_FILE_PATH
                        Coordinates contigs cut up bed file for concoct
  --coordinates_cut_up_file_hid COORDINATES_CUT_UP_FILE_HID
                        historic ID for coordinates contigs cut up bed file from galaxy for concoct
  --coverage_table_file_path COVERAGE_TABLE_FILE_PATH
                        Coverage table file from Concoct
  --coverage_table_file_hid COVERAGE_TABLE_FILE_HID
                        historic ID for coverage table file from galaxy for concoct
  --log_file_path LOG_FILE_PATH
                        Log file from Concoct
  --log_file_hid LOG_FILE_HID
                        historic ID for log file from galaxy for concoct

CoreProfiler

$ tooldistillator coreprofiler --help
usage: tooldistillator.py coreprofiler <options>

Extract information from output(s) of coreprofiler (results.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID for coreprofiler file from galaxy for coreprofiler
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        coreprofiler version for coreprofiler
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for coreprofiler
  --profiles_json_path PROFILES_JSON_PATH
                        JSON file containing info about files with temporary alleles for coreprofiler
  --profiles_json_hid PROFILES_JSON_HID
                        Historic ID for profiles JSON file from galaxy for coreprofiler
  --alleles_fna_path ALLELES_FNA_PATH
                        FASTA file with new alleles sequences if detected for coreprofiler
  --alleles_fna_hid ALLELES_FNA_HID
                        Historic ID for new alleles FASTA file from galaxy for coreprofiler

CoverM

$ tooldistillator coverm --help
usage: tooldistillator.py coverm <options>

Extract information from output(s) of coverm (coverage_report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version number for coverm
  --hid HID             historic ID for coverm file from galaxy for coverm
  --dereplication_cluster_definition_path DEREPLICATION_CLUSTER_DEFINITION_PATH
                        dereplicated representative and member lines file from coverM
  --dereplication_cluster_definition_hid DEREPLICATION_CLUSTER_DEFINITION_HID
                        Historic ID to dereplicated representative and member lines file from Galaxy
  --dereplication_representative_fasta_zip_path DEREPLICATION_REPRESENTATIVE_FASTA_ZIP_PATH
                        representative genome files in a ZIP file from coverM
  --dereplication_representative_fasta_zip_hid DEREPLICATION_REPRESENTATIVE_FASTA_ZIP_HID
                        Historic ID to representative genome files from Galaxy

DasTool

$ tooldistillator dastool --help
usage: tooldistillator.py dastool <options>

Extract information from output(s) of dastool (summary_bins.tabular)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version number for dastool
  --hid HID             historic ID for dastool bins summary file from galaxy for dastool
  --fasta_bin_zip_folder_path FASTA_BIN_ZIP_FOLDER_PATH
                        Bin fasta format ZIP folder for DasTool
  --fasta_bin_zip_folder_hid FASTA_BIN_ZIP_FOLDER_HID
                        Historic ID to Bin fasta format ZIP folder from Galaxy
  --contig_to_bin_file_path CONTIG_TO_BIN_FILE_PATH
                        Contig to bin file from DasTool
  --contig_to_bin_file_hid CONTIG_TO_BIN_FILE_HID
                        Historic ID to contig to bin file from Galaxy
  --quality_and_completness_file_path QUALITY_AND_COMPLETNESS_FILE_PATH
                        Quality and completness file from DasTool
  --quality_and_completness_file_hid QUALITY_AND_COMPLETNESS_FILE_HID
                        Historic ID to quality and completness file from Galaxy
  --protein_sequences_file_path PROTEIN_SEQUENCES_FILE_PATH
                        Proteins sequences fasta file from DasTool
  --protein_sequences_file_hid PROTEIN_SEQUENCES_FILE_HID
                        Historic ID to proteins sequences fasta file file from Galaxy
  --unbinned_sequences_file_path UNBINNED_SEQUENCES_FILE_PATH
                        Unbinned sequences fasta file from DasTool
  --unbinned_sequences_file_hid UNBINNED_SEQUENCES_FILE_HID
                        Historic ID to unbinned sequences fasta file file from Galaxy
  --log_file_path LOG_FILE_PATH
                        Log file from DasTool
  --log_file_hid LOG_FILE_HID
                        Historic ID to log file file from Galaxy

DeepARG

$ tooldistillator deeparg --help
usage: tooldistillator.py deeparg <options>

Extract information from output(s) of deeparg (ARG prediction and quantification)

positional arguments:
  report                Path to report(s)

options:
  -h, --help                                Show this help message and exit
  -o OUTPUT, --output OUTPUT                Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                                            DeepARG version
  --reference_database_version REFERENCE_DATABASE_VERSION
                                            Database version used for DeepARG
  --hid HID                                 Historic ID for DeepARG file from Galaxy
  --report_ARG_merged_path REPORT_ARG_MERGED_PATH
                                            DeepARG ARG merged report file
  --report_ARG_merged_hid REPORT_ARG_MERGED_HID
                                            Historic ID to ARG merged report file from Galaxy
  --report_ARG_merged_quant_subtype_path REPORT_ARG_MERGED_QUANT_SUBTYPE_PATH
                                            DeepARG ARG merged quantitative subtype report file
  --report_ARG_merged_quant_subtype_hid REPORT_ARG_MERGED_QUANT_SUBTYPE_HID
                                            Historic ID to ARG merged quantitative subtype report file from Galaxy
  --report_ARG_merged_quant_type_path REPORT_ARG_MERGED_QUANT_TYPE_PATH
                                            DeepARG ARG merged quant type report file
  --report_ARG_merged_quant_type_hid REPORT_ARG_MERGED_QUANT_TYPE_HID
                                            Historic ID to ARG merged quantitative type report file from Galaxy
  --report_potential_ARG_path REPORT_POTENTIAL_ARG_PATH
                                            DeepARG potential ARG report file
  --report_potential_ARG_hid REPORT_POTENTIAL_ARG_HID
                                            Historic ID to potential ARG report file from Galaxy
  --sequence_clean_file_path SEQUENCE_CLEAN_FILE_PATH
                                            Sequence clean file from DeepARG
  --sequence_clean_file_hid SEQUENCE_CLEAN_FILE_HID
                                            Historic ID to sequence clean file from Galaxy
  --bam_clean_file_path BAM_CLEAN_FILE_PATH
                                            Binary Alignment clean file from DeepARG
  --bam_clean_file_hid BAM_CLEAN_FILE_HID
                                            Historic ID to clean alignment file from Galaxy
  --sam_clean_file_path SAM_CLEAN_FILE_PATH
                                            Sequence Alignment clean file from DeepARG
  --sam_clean_file_hid SAM_CLEAN_FILE_HID
                                            Historic ID to clean alignment file from Galaxy
  --bam_clean_sorted_file_path BAM_CLEAN_SORTED_FILE_PATH
                                            Binary Alignment clean sorted file from DeepARG
  --bam_clean_sorted_file_hid BAM_CLEAN_SORTED_FILE_HID
                                            Historic ID to clean sorted alignment file from Galaxy
  --daa_clean_align_file_path DAA_CLEAN_ALIGN_FILE_PATH
                                            DAA clean align file from DeepARG
  --daa_clean_align_file_hid DAA_CLEAN_ALIGN_FILE_HID
                                            Historic ID to DAA sorted alignment file from Galaxy

dRep

$ tooldistillator drep --help
usage: tooldistillator.py drep <options>

Extract information from output(s) of dRep (dereplication results)

positional arguments:
  report                Path to report(s)

options:
  -h, --help                                Show this help message and exit
  -o OUTPUT, --output OUTPUT                Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                                            dRep version number
  --hid HID                                 Historic ID to dRep clusters file from Galaxy
  --fasta_dereplicated_bin_zip_folder_path FASTA_DEREPLICATED_BIN_ZIP_FOLDER_PATH
                                            Dereplicated bin fasta format ZIP folder for dRep
  --fasta_dereplicated_bin_zip_folder_hid FASTA_DEREPLICATED_BIN_ZIP_FOLDER_HID
                                            Historic ID to dereplicated bin fasta format ZIP folder from Galaxy
  --bdb_file_path BDB_FILE_PATH
                                            Bdb file for dRep
  --bdb_file_hid BDB_FILE_HID
                                            Historic ID to Bdb file from Galaxy
  --cdb_file_path CDB_FILE_PATH
                                            Cdb file for dRep
  --cdb_file_hid CDB_FILE_HID
                                            Historic ID to Cdb file from Galaxy
  --chdb_file_path CHDB_FILE_PATH
                                            Chdb file for dRep
  --chdb_file_hid CHDB_FILE_HID
                                            Historic ID to Chdb file from Galaxy
  --mdb_file_path MDB_FILE_PATH
                                            Mdb file for dRep
  --mdb_file_hid MDB_FILE_HID
                                            Historic ID to Mdb file from Galaxy
  --ndb_file_path NDB_FILE_PATH
                                            Ndb file for dRep
  --ndb_file_hid NDB_FILE_HID
                                            Historic ID to Ndb file from Galaxy
  --sdb_file_path SDB_FILE_PATH
                                            Sdb file for dRep
  --sdb_file_hid SDB_FILE_HID
                                            Historic ID to Sdb file from Galaxy
  --wdb_file_path WDB_FILE_PATH
                                            Wdb file for dRep
  --wdb_file_hid WDB_FILE_HID
                                            Historic ID to Wdb file from Galaxy
  --winning_genomes_pdf_path WINNING_GENOMES_PDF_PATH
                                            Winning genomes pdf file for dRep
  --winning_genomes_pdf_hid WINNING_GENOMES_PDF_HID
                                            Historic ID to winning genomes pdf file from Galaxy
  --cluster_scoring_pdf_path CLUSTER_SCORING_PDF_PATH
                                            Cluster scoring pdf file for dRep
  --cluster_scoring_pdf_hid CLUSTER_SCORING_PDF_HID
                                            Historic ID to cluster scoring pdf file from Galaxy
  --clustering_scatterplots_pdf_path CLUSTERING_SCATTERPLOTS_PDF_PATH
                                            Clustering scatterplots pdf file for dRep
  --clustering_scatterplots_pdf_hid CLUSTERING_SCATTERPLOTS_PDF_HID
                                            Historic ID to clustering scatterplots pdf file from Galaxy
  --primary_clustering_dendrogram_pdf_path PRIMARY_CLUSTERING_DENDROGRAM_PDF_PATH
                                            Primary clustering dendrogram pdf file for dRep
  --primary_clustering_dendrogram_pdf_hid PRIMARY_CLUSTERING_DENDROGRAM_PDF_HID
                                            Historic ID to primary clustering dendrogram pdf file from Galaxy
  --secondary_clustering_dendrogram_pdf_path SECONDARY_CLUSTERING_DENDROGRAM_PDF_PATH
                                            Secondary clustering dendrogram pdf file for dRep
  --secondary_clustering_dendrogram_pdf_hid SECONDARY_CLUSTERING_DENDROGRAM_PDF_HID
                                            Historic ID to secondary clustering dendrogram pdf file from Galaxy
  --secondary_clustering_mds_pdf_path SECONDARY_CLUSTERING_MDS_PDF_PATH
                                            Secondary clustering MDS pdf file for dRep
  --secondary_clustering_mds_pdf_hid SECONDARY_CLUSTERING_MDS_PDF_HID
                                            Historic ID to secondary clustering MDS pdf file from Galaxy
  --log_file_path LOG_FILE_PATH
                                            Log file from dRep
  --log_file_hid LOG_FILE_HID
                                            Historic ID to dRep log file from Galaxy
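
Most subcommands take their optional files as `--<name>_path` / `--<name>_hid` pairs, as the dRep options above show. The following is a minimal Python sketch of how such a command line could be assembled. The helper `build_command`, the file names and the history IDs are hypothetical; only the flags themselves come from the help text above.

```python
from typing import Dict, List, Optional, Tuple


def build_command(
    tool: str,
    report: str,
    output: str,
    extras: Optional[Dict[str, Tuple[str, Optional[str]]]] = None,
) -> List[str]:
    """Assemble an argv list for `tooldistillator <tool>`.

    `extras` maps an option stem (e.g. "cdb_file") to a (path, hid) tuple.
    The hid may be None when the file does not come from a Galaxy history.
    """
    cmd = ["tooldistillator", tool, report, "-o", output]
    for stem, (path, hid) in (extras or {}).items():
        cmd += [f"--{stem}_path", path]
        if hid is not None:
            cmd += [f"--{stem}_hid", hid]
    return cmd


# Hypothetical file names and history IDs, for illustration only.
drep_cmd = build_command(
    "drep",
    "clusters.csv",
    "drep.json",
    extras={"cdb_file": ("Cdb.csv", "12"), "log_file": ("drep.log", None)},
)
print(" ".join(drep_cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.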

EggNOG-mapper

$ tooldistillator eggnogmapper --help
usage: tooldistillator.py eggnogmapper <options>

Extract information from output(s) of eggnogmapper annotation results files

positional arguments:
  report                Path to report(s)

options:
  -h, --help                                Show this help message and exit
  -o OUTPUT, --output OUTPUT                Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                                            EggNOG-mapper version
  --reference_database_version REFERENCE_DATABASE_VERSION
                                            Database version used for EggNOG-mapper
  --hid HID                                 Historic ID for EggNOG-mapper file from Galaxy
  --seed_orthologs_report_path SEED_ORTHOLOGS_REPORT_PATH
                                            EggNOG-mapper seed orthologs result file
  --seed_orthologs_report_hid SEED_ORTHOLOGS_REPORT_HID
                                            Historic ID to seed orthologs result file from Galaxy
  --orthologs_report_path ORTHOLOGS_REPORT_PATH
                                            EggNOG-mapper orthologs result file
  --orthologs_report_hid ORTHOLOGS_REPORT_HID
                                            Historic ID to orthologs result file from Galaxy

Fastp

$ tooldistillator fastp --help
usage: tooldistillator.py fastp <options>

Extract information from output(s) of fastp (report.json)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        fastp version number for fastp
  --hid HID             historic ID for fastp file from galaxy for fastp
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for fastp
  --trimmed_forward_R1_path TRIMMED_FORWARD_R1_PATH
                        forward R1 trimmed file for fastp
  --trimmed_forward_R1_hid TRIMMED_FORWARD_R1_HID
                        historic ID for forward reads file from galaxy for fastp
  --trimmed_reverse_R2_path TRIMMED_REVERSE_R2_PATH
                        reverse R2 trimmed file for fastp
  --trimmed_reverse_R2_hid TRIMMED_REVERSE_R2_HID
                        historic ID for reverse reads file from galaxy for fastp
  --html_report_path HTML_REPORT_PATH
                        html fastp report path for fastp
  --html_report_hid HTML_REPORT_HID
                        historic ID for fastp html report for fastp
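
Since the extracted information is exposed as JSON, a run can be chained with a JSON load from Python. The sketch below assumes that `-o` points to a single JSON file and that `tooldistillator` is on the `PATH`; the helper `run_and_load` and the file names are hypothetical, and the layout of the resulting JSON is not assumed.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import Any, List, Optional


def run_and_load(cmd: List[str], output: Path) -> Optional[Any]:
    """Run a tooldistillator command and return the parsed JSON output.

    Returns None when the executable is not found on PATH, so the sketch
    can be tried on a machine without tooldistillator installed.
    """
    if shutil.which(cmd[0]) is None:
        return None
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    with output.open() as handle:
        return json.load(handle)


# Hypothetical input and output names, for illustration only.
fastp_output = Path("fastp_distilled.json")
fastp_cmd = [
    "tooldistillator", "fastp", "report.json",
    "--html_report_path", "report.html",
    "-o", str(fastp_output),
]
```

Calling `run_and_load(fastp_cmd, fastp_output)` would then return the parsed content, or `None` if the executable is missing.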

Fastqc

$ tooldistillator fastqc --help
usage: tooldistillator.py fastqc <options>

Extract information from output(s) of fastqc (report.txt)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        fastqc version number for fastqc
  --hid HID             historic ID for fastqc file from galaxy for fastqc
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for fastqc
  --html_report_path HTML_REPORT_PATH
                        html fastqc report path for fastqc
  --html_report_hid HTML_REPORT_HID
                        historic ID for fastqc html report for fastqc

Filtlong

$ tooldistillator filtlong --help
usage: tooldistillator.py filtlong <options>

Extract information from output(s) of filtlong (input.fastq)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        filtlong version number for filtlong
  --hid HID             Historic ID to filtlong contigs file from Galaxy for filtlong
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for filtlong

Flye

$ tooldistillator flye --help
usage: tooldistillator.py flye <options>

Extract information from output(s) of flye (contig.fasta)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to flye contigs file from Galaxy for flye
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        flye version number for flye
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for flye
  --contig_graph_path CONTIG_GRAPH_PATH
                        Assembly graph file for flye
  --contig_graph_hid CONTIG_GRAPH_HID
                        Historic ID to assembly graph from Galaxy for flye
  --tsv_file_path TSV_FILE_PATH
                        Assembly info file from flye for flye
  --tsv_file_hid TSV_FILE_HID
                        Historic ID to assembly info file from Galaxy for flye

Groot

$ tooldistillator groot --help
usage: tooldistillator.py groot <options>

Extract information from output(s) of groot (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version number for groot
  --hid HID             historic ID for groot file from galaxy
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for groot
  --groot_log_path GROOT_LOG_PATH
                        log file for groot
  --groot_log_hid GROOT_LOG_HID
                        historic ID for Groot log file from galaxy
  --bam_file_path BAM_FILE_PATH
                        binary Alignment file from groot
  --bam_file_hid BAM_FILE_HID
                        historic ID for groot alignment file from Galaxy

GTDB-tk

$ tooldistillator gtdbtk --help
usage: tooldistillator.py gtdbtk <options>

Extract information from output(s) of gtdbtk taxonomy files

positional arguments:
  report                Path to report(s)

options:
  -h, --help                                Show this help message and exit
  -o OUTPUT, --output OUTPUT                Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                                            GTDB-tk version
  --reference_database_version REFERENCE_DATABASE_VERSION
                                            Database version used for GTDB-tk
  --hid HID                                 Historic ID for GTDB-tk file from Galaxy
  --classify_directory_path CLASSIFY_DIRECTORY_PATH
                                            GTDB-tk classify ZIP directory
  --classify_directory_hid CLASSIFY_DIRECTORY_HID
                                            Historic ID to classify ZIP directory from Galaxy
  --align_directory_path ALIGN_DIRECTORY_PATH
                                            GTDB-tk align ZIP directory
  --align_directory_hid ALIGN_DIRECTORY_HID
                                            Historic ID to align ZIP directory from Galaxy
  --identify_directory_path IDENTIFY_DIRECTORY_PATH
                                            GTDB-tk identify ZIP directory
  --identify_directory_hid IDENTIFY_DIRECTORY_HID
                                            Historic ID to identify ZIP directory from Galaxy
  --log_file_path LOG_FILE_PATH
                                            GTDB-tk log file
  --log_file_hid LOG_FILE_HID
                                            Historic ID to log file from Galaxy

Integronfinder2

$ tooldistillator integronfinder2 --help
usage: tooldistillator.py integronfinder2 <options>

Extract information from output(s) of integronfinder2 (OUTPUT.integrons, OUTPUT.summary)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             historic ID for integronfinder file from galaxy for integronfinder2
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        integronfinder version for integronfinder2
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for integronfinder2
  --summary_file_path SUMMARY_FILE_PATH
                        integronfinder summary file path for integronfinder2
  --summary_file_hid SUMMARY_FILE_HID
                        historic ID for summary file from galaxy for integronfinder2

ISEScan

$ tooldistillator isescan --help
usage: tooldistillator.py isescan <options>

Extract information from output(s) of isescan (OUTPUT.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID for isescan file from galaxy for isescan
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        isescan version for isescan
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for isescan
  --orf_fna_path ORF_FNA_PATH
                        fasta file with nucleotide orf sequences for isescan
  --orf_fna_hid ORF_FNA_HID
                        Historic ID for orf fasta file from galaxy for isescan
  --orf_faa_path ORF_FAA_PATH
                        fasta file with amino acid orf sequences for isescan
  --orf_faa_hid ORF_FAA_HID
                        Historic ID for orf amino acid file from galaxy for isescan
  --is_fna_path IS_FNA_PATH
                        fasta file with nucleotide IS sequences for isescan
  --is_fna_hid IS_FNA_HID
                        Historic ID for IS file from galaxy for isescan
  --summary_path SUMMARY_PATH
                        summary of isescan analysis for isescan
  --summary_hid SUMMARY_HID
                        Historic ID for summary file from galaxy for isescan
  --annotation_path ANNOTATION_PATH
                        isescan annotation gff3 file for isescan
  --annotation_hid ANNOTATION_HID
                        Historic ID for gff annotation file from galaxy for isescan

Kraken2

$ tooldistillator kraken2 --help
usage: tooldistillator.py kraken2 <options>

Extract information from output(s) of kraken2 (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             kraken report hid for kraken2
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        kraken2 version for kraken2
  --reference_database_version REFERENCE_DATABASE_VERSION
                        kraken2 DB version for kraken2
  --seq_classification_file_path SEQ_CLASSIFICATION_FILE_PATH
                        file containing the classification of each read for kraken2
  --seq_classification_file_hid SEQ_CLASSIFICATION_FILE_HID
                        historic ID for read classification file from Galaxy for kraken2
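
The positional argument is described as "Path to report(s)", which suggests that several reports can be passed in one call. Assuming that is the case, a command for a batch of Kraken2 reports could be sketched as follows. The helper `kraken2_command`, the sample file names and the database version string are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def kraken2_command(
    reports: Iterable[Path], output: Path, db_version: str
) -> List[str]:
    """Build one `tooldistillator kraken2` call covering several reports."""
    # Sort so that the command is reproducible whatever the input order.
    paths = sorted(str(p) for p in reports)
    if not paths:
        raise ValueError("at least one report is required")
    return [
        "tooldistillator", "kraken2", *paths,
        "--reference_database_version", db_version,
        "-o", str(output),
    ]


# Hypothetical sample reports, for illustration only.
kraken2_cmd = kraken2_command(
    [Path("sample_b.tsv"), Path("sample_a.tsv")],
    Path("kraken2.json"),
    "2024-01",
)
```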

MaxBin2

$ tooldistillator maxbin2 --help
usage: tooldistillator.py maxbin2 <options>

Extract information from output(s) of maxbin2 (bin_summary.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             bin summary file hid for maxbin2
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for maxbin2
  --fasta_bin_zip_folder_path FASTA_BIN_ZIP_FOLDER_PATH
                        folder containing fasta bins from binning for maxbin2
  --fasta_bin_zip_folder_hid FASTA_BIN_ZIP_FOLDER_HID
                        historic ID for folder containing fasta bins from Galaxy for maxbin2
  --bin_predicted_markers_zip_folder_path BIN_PREDICTED_MARKERS_ZIP_FOLDER_PATH
                        folder containing predicted markers files for maxbin2
  --bin_predicted_markers_zip_folder_hid BIN_PREDICTED_MARKERS_ZIP_FOLDER_HID
                        historic ID for folder containing predicted markers files from Galaxy for maxbin2
  --too_short_sequences_file_path TOO_SHORT_SEQUENCES_PATH
                        too short sequences fasta file for maxbin2
  --too_short_sequences_file_hid TOO_SHORT_SEQUENCES_HID
                        historic ID for too short sequences fasta file from Galaxy for maxbin2
  --unclassified_sequences_file_path UNCLASSIFIED_SHORT_SEQUENCES_PATH
                        unclassified sequences fasta file for maxbin2
  --unclassified_sequences_file_hid UNCLASSIFIED_SHORT_SEQUENCES_HID
                        historic ID for unclassified sequences fasta file from Galaxy for maxbin2
  --marker_gene_presence_file_path MARKER_GENE_PRESENCE_FILE_PATH
                        gene marker presence file for maxbin2
  --marker_gene_presence_file_hid MARKER_GENE_PRESENCE_FILE_HID
                        historic ID for gene marker presence file from Galaxy for maxbin2
  --marker_gene_presence_plot_file_path MARKER_GENE_PRESENCE_PLOT_FILE_PATH
                        gene marker presence plot file for maxbin2
  --marker_gene_presence_plot_file_hid MARKER_GENE_PRESENCE_PLOT_FILE_HID
                        historic ID for gene marker presence plot file from Galaxy for maxbin2
  --log_file_path LOG_FILE_PATH
                        Log file from maxbin2
  --log_file_hid LOG_FILE_HID
                        historic ID to log file from Galaxy for maxbin2

Megahit

$ tooldistillator megahit --help
usage: tooldistillator.py megahit <options>

Extract information from output(s) of megahit (assembly.fasta)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             assembly fasta file hid for megahit
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for megahit
  --intermediate_contig_folder_path INTERMEDIATE_CONTIG_FOLDER_PATH
                        folder containing intermediate contigs from assembly for megahit
  --intermediate_contig_folder_hid INTERMEDIATE_CONTIG_FOLDER_HID
                        historic ID for folder containing intermediate contigs from assembly from Galaxy for megahit
  --log_file_path LOG_FILE_PATH
                        Log file from megahit
  --log_file_hid LOG_FILE_HID
                        historic ID to log file from Galaxy for megahit

Metabat2

$ tooldistillator metabat2 --help
usage: tooldistillator.py metabat2 <options>

Extract information from output(s) of metabat2 (cluster_membership.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             cluster membership file hid for metabat2
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for metabat2
  --fasta_bin_zip_folder_path FASTA_BIN_ZIP_FOLDER_PATH
                        folder containing fasta bins from binning for metabat2
  --fasta_bin_zip_folder_hid FASTA_BIN_ZIP_FOLDER_HID
                        historic ID for folder containing fasta bins from Galaxy for metabat2
  --too_short_sequences_file_path TOO_SHORT_SEQUENCES_PATH
                        too short sequences fasta file for metabat2
  --too_short_sequences_file_hid TOO_SHORT_SEQUENCES_HID
                        historic ID for too short sequences fasta file from Galaxy for metabat2
  --unbinned_sequences_file_path UNBINNED_SHORT_SEQUENCES_PATH
                        unbinned sequences fasta file for metabat2
  --unbinned_sequences_file_hid UNBINNED_SHORT_SEQUENCES_HID
                        historic ID for unbinned sequences fasta file from Galaxy for metabat2
  --low_depth_sequences_file_path LOW_DEPTH_SHORT_SEQUENCES_PATH
                        low depth sequences fasta file for metabat2
  --low_depth_sequences_file_hid LOW_DEPTH_SHORT_SEQUENCES_HID
                        historic ID for low depth fasta file from Galaxy for metabat2

MMseqs2linclust

$ tooldistillator mmseqs2linclust --help
usage: tooldistillator.py mmseqs2linclust <options>

Extract information from output(s) of mmseqs2linclust (rep_seqs.fasta: file with representative cluster sequences)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             rep_seq file hid for mmseqs2 linclust
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for mmseqs2
  --cluster_fasta_like_path CLUSTER_FASTA_LIKE_PATH
                        file containing all the clustered sequences for mmseqs2 linclust
  --cluster_fasta_like_hid CLUSTER_FASTA_LIKE_HID
                        historic ID for file containing all the clustered sequences from Galaxy for mmseqs2 linclust
  --tsv_file_path TSV_FILE_PATH
                        Cluster TSV file from mmseqs2 linclust
  --tsv_file_hid TSV_FILE_HID
                        historic ID to cluster TSV file from Galaxy for mmseqs2 linclust

MMseqs2taxonomy

$ tooldistillator mmseqs2taxonomy --help
usage: tooldistillator.py mmseqs2taxonomy <options>

Extract information from output(s) of mmseqs2taxonomy (tax_output.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             taxonomy output file hid for mmseqs2 taxonomy
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for mmseqs2
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for mmseqs2
  --kraken_report_path KRAKEN_REPORT_PATH
                        file containing kraken taxonomy report for mmseqs2 taxonomy
  --kraken_report_hid KRAKEN_REPORT_HID
                        historic ID for kraken taxonomy report from Galaxy for mmseqs2 taxonomy
  --krona_report_path KRONA_REPORT_PATH
                        file containing krona taxonomy report for mmseqs2 taxonomy
  --krona_report_hid KRONA_REPORT_HID
                        historic ID for krona taxonomy report from Galaxy for mmseqs2 taxonomy

MultiQC

$ tooldistillator multiqc --help
usage: tooldistillator.py multiqc <options>

Extract information from output(s) of multiqc (output.html)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        multiqc version for multiqc
  --hid HID             historic ID for multiqc file from galaxy for multiqc
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for multiqc

Plasmidfinder

$ tooldistillator plasmidfinder --help
usage: tooldistillator.py plasmidfinder <options>

Extract information from output(s) of plasmidfinder (plasmidfinder.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID for plasmidfinder file from galaxy for plasmidfinder
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        plasmidfinder version for plasmidfinder
  --reference_database_version REFERENCE_DATABASE_VERSION
                        plasmidfinder DB version for plasmidfinder
  --plasmid_result_tabular_path PLASMID_RESULT_TABULAR_PATH
                        plasmidfinder results in tabular format for plasmidfinder
  --plasmid_result_tabular_hid PLASMID_RESULT_TABULAR_HID
                        plasmidfinder results hid in Galaxy for plasmidfinder
  --genome_hit_path GENOME_HIT_PATH
                        fasta file with hits in genome, does not work with multiple inputs for plasmidfinder
  --genome_hit_hid GENOME_HIT_HID
                        Historic ID for genome hit file from galaxy for plasmidfinder
  --plasmid_hit_path PLASMID_HIT_PATH
                        fasta file with plasmid sequences, does not work with multiple inputs for plasmidfinder
  --plasmid_hit_hid PLASMID_HIT_HID
                        Historic ID for plasmid sequence hit file from galaxy for plasmidfinder

Polypolish

$ tooldistillator polypolish --help
usage: tooldistillator.py polypolish <options>

Extract information from output(s) of polypolish (contig.fasta)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to polypolish contigs file from Galaxy for polypolish
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        polypolish version number for polypolish
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for polypolish

Prodigal

$ tooldistillator prodigal --help
usage: tooldistillator.py prodigal <options>

Extract information from output(s) of prodigal (output.fnn)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to prodigal contigs file from Galaxy for prodigal
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        prodigal version number for prodigal
  --protein_translation_file_path PROTEIN_TRANSLATION_FILE_PATH
                        Proteins from all the sequences fasta file for prodigal
  --protein_translation_file_hid PROTEIN_TRANSLATION_FILE_HID
                        Historic ID to proteins fasta file from Galaxy for prodigal
  --potential_gene_start_file_path POTENTIAL_GENE_START_FILE_PATH
                        Potential genes start file for prodigal
  --potential_gene_start_file_hid POTENTIAL_GENE_START_FILE_HID
                        Historic ID to potential genes start file from Galaxy for prodigal
  --gbk_genes_coordinate_file_path GBK_GENES_COORDINATE_FILE_PATH
                        GBK genes coordinate file for prodigal
  --gbk_genes_coordinate_file_hid GBK_GENES_COORDINATE_FILE_HID
                        Historic ID to GBK genes coordinate file from Galaxy for prodigal
  --gff_genes_coordinate_file_path GFF_GENES_COORDINATE_FILE_PATH
                        GFF3 genes coordinate file for prodigal
  --gff_genes_coordinate_file_hid GFF_GENES_COORDINATE_FILE_HID
                        Historic ID to GFF3 genes coordinate file from Galaxy for prodigal
  --sco_genes_coordinate_file_path SCO_GENES_COORDINATE_FILE_PATH
                        SCO genes coordinate file for prodigal
  --sco_genes_coordinate_file_hid SCO_GENES_COORDINATE_FILE_HID
                        Historic ID to SCO genes coordinate file from Galaxy for prodigal

Quast

$ tooldistillator quast --help
usage: tooldistillator.py quast <options>

Extract information from output(s) of quast (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to quast file from Galaxy for quast
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for quast
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        Quast version number for quast
  --quast_html_path QUAST_HTML_PATH
                        Quast html report file for quast
  --quast_html_hid QUAST_HTML_HID
                        Historic ID to quast html file from Galaxy for quast

Recentrifuge

$ tooldistillator recentrifuge --help
usage: tooldistillator.py recentrifuge <options>

Extract information from output(s) of recentrifuge (data.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             historic ID to recentrifuge data file provided by Galaxy for recentrifuge
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        recentrifuge version for recentrifuge
  --reference_database_version REFERENCE_DATABASE_VERSION
                        ncbi taxonomy DB version for recentrifuge
  --rcf_stat_path RCF_STAT_PATH
                        recentrifuge statistic file for recentrifuge
  --rcf_stat_hid RCF_STAT_HID
                        historic ID for recentrifuge statistic file provided by Galaxy for recentrifuge
  --rcf_html_path RCF_HTML_PATH
                        recentrifuge html report file for recentrifuge
  --rcf_html_hid RCF_HTML_HID
                        historic ID for recentrifuge html report file provided by Galaxy for recentrifuge

RefseqMasher

$ tooldistillator refseqmasher --help
usage: tooldistillator.py refseqmasher <options>

Extract information from output(s) of refseqmasher (OUTPUT.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to refseq result from Galaxy for refseqmasher
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for refseqmasher
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        refseqmasher version number for refseqmasher

Shovill

$ tooldistillator shovill --help
usage: tooldistillator.py shovill <options>

Extract information from output(s) of shovill (contig.fasta)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID to shovill contigs file from Galaxy for shovill
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        shovill version number for shovill
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for shovill
  --contig_graph_path CONTIG_GRAPH_PATH
                        Assembly graph file for shovill
  --contig_graph_hid CONTIG_GRAPH_HID
                        Historic ID to assembly graph from Galaxy for shovill
  --bam_file_path BAM_FILE_PATH
                        Binary Alignment file from shovill for shovill
  --bam_file_hid BAM_FILE_HID
                        Historic ID to alignment file from Galaxy for shovill

StarAMR

$ tooldistillator staramr --help
usage: tooldistillator.py staramr <options>

Extract information from output(s) of staramr (resfinder.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID provided by Galaxy for resfinder file for staramr
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        tool version for staramr
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for staramr
  --mlst_file_path MLST_FILE_PATH
                        mlst output file from staramr for staramr
  --mlst_file_hid MLST_FILE_HID
                        Historic ID provided by Galaxy for mlst file for staramr
  --plasmidfinder_file_path PLASMIDFINDER_FILE_PATH
                        plasmid output file from staramr for staramr
  --plasmidfinder_file_hid PLASMIDFINDER_FILE_HID
                        Historic ID provided by Galaxy for plasmid for staramr
  --pointfinder_file_path POINTFINDER_FILE_PATH
                        pointfinder output file from staramr for staramr
  --pointfinder_file_hid POINTFINDER_FILE_HID
                        Historic ID provided by Galaxy for pointfinder for staramr
  --setting_file_path SETTING_FILE_PATH
                        setting file from staramr analysis for staramr
  --setting_file_hid SETTING_FILE_HID
                        Historic ID provided by Galaxy for settings for staramr
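
Because the staramr subcommand accepts several optional companion files, it can be convenient to assemble the command line programmatically and skip the files that a given run did not produce. The sketch below is only an illustration: the helper `build_staramr_command`, the file names, and the version string are hypothetical, while the flag names are taken from the help text above.

```python
import shlex


def build_staramr_command(report, output, **options):
    """Build the argument list for `tooldistillator staramr`.

    `options` maps flag names from the help text (without the leading
    dashes) to values; entries whose value is None are skipped.
    """
    argv = ["tooldistillator", "staramr", report, "-o", output]
    for name, value in options.items():
        if value is None:
            continue  # this optional file or version was not provided
        argv += [f"--{name}", str(value)]
    return argv


# Hypothetical file names and version string, for illustration only.
cmd = build_staramr_command(
    "resfinder.tsv",
    "staramr_output.json",
    mlst_file_path="mlst.tsv",
    pointfinder_file_path=None,
    analysis_software_version="0.10.0",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once tooldistillator is installed. The same pattern applies to the other subcommands by changing the subcommand name and the flags.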

Sylph

$ tooldistillator sylph --help
usage: tooldistillator.py sylph <options>

Extract information from output(s) of sylph (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for sylph
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for sylph
  --hid HID             historic ID for sylph file from galaxy

Sylph-tax

$ tooldistillator sylphtax --help
usage: tooldistillator.py sylphtax <options>

Extract information from output(s) of sylphtax (merge_report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        version for sylphtax
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for sylphtax
  --hid HID             historic ID for sylphtax file from galaxy
  --taxonomic_profile_folder_path TAXONOMIC_PROFILE_FOLDER_PATH
                        taxonomic profile folder for sylph-tax
  --taxonomic_profile_folder_hid TAXONOMIC_PROFILE_FOLDER_HID
                        historic ID to taxonomic profile folder from Galaxy

Tabular_file

$ tooldistillator tabular_file --help
usage: tooldistillator.py tabular_file <options>

Extract information from output(s) of tabular_file (report.tsv)

positional arguments:
  report                Path to report(s)

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output location
  --hid HID             Historic ID provided by Galaxy for tabular file for tabular_file
  --analysis_software_name ANALYSIS_SOFTWARE_NAME
                        Tool name to the input file for tabular_file
  --reference_database_version REFERENCE_DATABASE_VERSION
                        DB version for tabular_file
  --analysis_software_version ANALYSIS_SOFTWARE_VERSION
                        Software version to the input file for tabular_file

Galaxy

ToolDistillator is also available as a Galaxy tool on the ToolShed.

Contributing

If you want to contribute, please read our Contributing guidelines.

Citation

Please cite the ABRomics project when using the tool.

Licence

GNU General Public License v3

Contact

You can contact the ABRomics team at abromics-support@groupes.france-bioinformatique.fr.
