mice

mice computes synteny blocks from genomes expressed as sequences of genomic elements. These elements can come from a genome graph (e.g., unitigs of a compacted de Bruijn graph), or from any other segmentation such as k-mers, genes, or MUMs/MEMs.

mice takes a GFF file as input. Each feature carries an ID attribute (a 1-based index) that names the element at that position, so the features of a genome or chromosome, read in order, spell out its path.
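To make the input format concrete, here is a minimal Python sketch of how such a GFF could be turned into one list of element IDs per sequence. It is an illustration, not mice's own parser: the function name `read_paths`, the feature type `unitig`, the ordering by start coordinate, and the choice to ignore the strand column are all assumptions made for this example.

```python
from collections import defaultdict

def read_paths(gff_lines):
    """Return {seqid: [element IDs in order of start coordinate]}.

    Hypothetical helper: assumes 9 tab-separated GFF columns and an
    integer ID=<n> attribute in column 9. Strand is ignored here.
    """
    features = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        seqid, start, attrs = cols[0], int(cols[3]), cols[8]
        # Attributes are ';'-separated key=value pairs.
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        features[seqid].append((start, int(fields["ID"])))
    # Sort each sequence's features by start and keep only the IDs.
    return {s: [eid for _, eid in sorted(f)] for s, f in features.items()}

# Made-up records for two sequences sharing element 2.
example = [
    "##gff-version 3",
    "chr1\t.\tunitig\t1\t40\t.\t+\t.\tID=1",
    "chr1\t.\tunitig\t41\t89\t.\t+\t.\tID=2",
    "chr1\t.\tunitig\t90\t130\t.\t+\t.\tID=1",
    "chr2\t.\tunitig\t1\t49\t.\t+\t.\tID=2",
    "chr2\t.\tunitig\t50\t99\t.\t+\t.\tID=3",
]
paths = read_paths(example)
print(paths)
```

In this toy input, element 1 appears twice on `chr1` and element 2 is shared by both sequences, which is the kind of repetition and sharing that synteny block detection works with.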

Installation

mice is written in Rust, so cargo is all you need to install it. From the repository root, run:

cargo install --path .

Alternatively, mice is available on bioconda (use conda or mamba):

mamba install -c bioconda mice 

Quick start

We provide five E. coli genomes as an example dataset.

  1. Use the provided graph
    A precomputed graph, example/graph.gff.gz, is included.
    Uncompress it (for example: gunzip -c example/graph.gff.gz > graph.gff) and skip to step 3.

  2. (Optional) Build the pangenome graph yourself

    Install ggcat:

    conda install -c conda-forge -c bioconda ggcat

    Build a compacted de Bruijn graph:

    ggcat build -k 31 -s 1 -l example/list.txt -o graph.gfa --gfa-v1

    Convert the graph to GFF:

    git clone https://github.com/lucaparmigiani/gfa2gff.git
    cd gfa2gff
    make
    cd ..
    ./gfa2gff/gfa2gff 31 graph.gfa $(ls -1 example/*.fna.gz) > graph.gff

  3. Run mice

    mice graph.gff

Usage

mice [OPTIONS] <GRAPH_INPUT>
  • <GRAPH_INPUT> – input graph file (GFF, or GFA with paths representing genomes)

Options

  • -o, --out-dir <DIR> Output directory (default: mice_output)

  • -r, --remove-dup <X> Remove an element if it occurs at least X times in any genome (0 = disable, default: 0)

  • -m, --min-size <bp> After first compression, drop unmerged elements shorter than <bp> base pairs, then recompress (default: 0)

  • -s, --no-group-by Treat every path as its own genome

  • -h, --help, -V, --version
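As an illustration of what --remove-dup describes, the following Python sketch drops every element that occurs at least X times in any single genome. It is a sketch under stated assumptions, not mice's implementation: the function name `remove_duplicated` is hypothetical, genomes are modelled as plain lists of element IDs, and a flagged element is assumed to be removed from all genomes, not only from the one where it is repeated.

```python
from collections import Counter

def remove_duplicated(paths, x):
    """Drop elements occurring at least x times in any one genome.

    Hypothetical helper: `paths` maps a genome name to its list of
    element IDs. x == 0 disables the filter, mirroring the documented
    default.
    """
    if x == 0:
        return {g: list(elems) for g, elems in paths.items()}
    banned = set()
    for elems in paths.values():
        counts = Counter(elems)
        # Flag elements repeated x or more times within this genome.
        banned.update(e for e, c in counts.items() if c >= x)
    # Assumption: a flagged element is removed from every genome.
    return {g: [e for e in elems if e not in banned]
            for g, elems in paths.items()}

toy = {"genomeA": [1, 2, 1], "genomeB": [2, 3]}
filtered = remove_duplicated(toy, 2)
print(filtered)
```

With X = 2, element 1 is flagged because it occurs twice in `genomeA`, while elements 2 and 3 are kept because neither is repeated within a single genome.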

Output

mice writes the following files to the output directory:

  • output.gff: block annotations (GFF)
  • paths.txt: genomes rewritten as synteny blocks
  • partitions.txt: the elements contained in each synteny block