Scaffold_builder: Combining de novo and reference-guided assembly with Scaffold_builder

Installation

Dependencies

Bioconda

Scaffold_builder can be installed with conda from the Bioconda channel:

# bioconda should handle all the dependencies
conda create -n scaffold_builder -c bioconda scaffold_builder
conda activate scaffold_builder  # on conda older than 4.4, use: source activate scaffold_builder

Webserver

# small query and reference files can be uploaded to the webserver
http://edwards.sdsu.edu/scaffold_builder/

Git

# clone scaffold_builder
git clone git@github.com:metageni/Scaffold_builder.git
# run scaffold_builder.py from the cloned directory (-p is required, see Usage)
cd Scaffold_builder
python2.7 scaffold_builder.py -q [QUERY] -r [REFERENCE] -p [PREFIX]

Usage

scaffold_builder.py -q query_contigs.fna -r reference_genome.fna -p output_prefix [-t] [-i] [-a] [-b] [-g]

-q FASTA file of contigs
    Required. Query contigs in FASTA format. These contigs may be the output of a de novo
    assembly program such as Newbler, Velvet or MIRA.

-r FASTA file containing reference genome
    Required. Reference genome in FASTA format. This should preferably be a completed genome
    sequence.

-p prefix for output files
    Required. All output files use this project name as their prefix.

-t length of terminus that will be aligned (default 300 nt)
    At any break between two contigs, scaffold_builder checks whether the termini
    of the adjacent contigs are homologous by aligning them with the Smith-Waterman
    algorithm, and combines the contigs if that is the case.

-i minimum identity for merging contigs (default 80%)
    If the termini are similar, scaffold_builder assumes that the contigs should
    have been combined by the assembly program, but that the similarity was probably
    below the assembly thresholds, or that the contigs were not merged due to
    ambiguous read mapping. The sequences are combined and, wherever non-identical
    nucleotides are aligned, the IUPAC code of their consensus is placed in the
    resulting sequence.

-a minimum length for ambiguously mapped contigs (default 95%)
    If a contig maps to more than one location on the reference genome, it will
    not be scaffolded because its location is ambiguous. This parameter defines
    how much of the length of a contig should be mapped in more than one location
    for it to be considered ambiguously mapped.

-b 0/1 dictates behavior for rearrangements (default 0)
    0: place end-to-end
    1: create new scaffold sequence
    If the mapping of the contigs onto the reference suggests that they overlap,
    but the contig termini are too dissimilar to join them, this option dictates
    whether scaffold_builder places the contigs end-to-end (default; deletions
    expected) or starts a new scaffold sequence (inversions expected).

-g maximum gap length allowed (default 5000 nt)
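
The merging step described under -t and -i can be illustrated with a minimal Python sketch. This is not the code used by scaffold_builder: it assumes the two termini have already been aligned without gaps (the tool itself uses a Smith-Waterman alignment), it handles only uppercase A, C, G and T, and the function names (percent_identity, consensus, merge_contigs) are hypothetical names chosen for this example.

```python
# Illustrative sketch only; not taken from scaffold_builder.py.
# IUPAC nucleotide codes, keyed by the set of bases each code represents.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def percent_identity(a, b):
    """Percentage of identical positions in two equal-length, ungapped termini."""
    if not a or len(a) != len(b):
        raise ValueError("termini must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def consensus(a, b):
    """Column-wise consensus; mismatching bases become an IUPAC ambiguity code."""
    # Raises KeyError for characters other than A, C, G, T (an assumption
    # of this sketch, which ignores N and lowercase input).
    return "".join(IUPAC[frozenset((x, y))] for x, y in zip(a, b))


def merge_contigs(left, right, overlap, min_identity=80.0):
    """Merge two contigs if their overlapping termini are similar enough.

    The last `overlap` bases of `left` are compared with the first `overlap`
    bases of `right`. Returns the merged sequence, or None when the identity
    is below `min_identity` (in percent, mirroring the -i default of 80%).
    """
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter contig length")
    end, start = left[-overlap:], right[:overlap]
    if percent_identity(end, start) < min_identity:
        return None
    return left[:-overlap] + consensus(end, start) + right[overlap:]


if __name__ == "__main__":
    # Termini ACGTAC and ACGTGC differ at one of six positions (83.3% identity),
    # so the contigs are merged and the A/G mismatch is written as R.
    print(merge_contigs("TTTTACGTAC", "ACGTGCGGGG", 6))  # TTTTACGTRCGGGG
```

With a stricter threshold, for example min_identity=90.0, the same pair is not merged and the function returns None, which corresponds to the case where the -b option decides how the contigs are placed.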

Citing

Scaffold_builder was written by Genivaldo G. Z. Silva. Feel free to contact me.

If you use Scaffold_builder, please cite it:

Silva GG, Dutilh BE, Matthews TD, Elkins K, Schmieder R, Dinsdale EA, Edwards RA.
Combining de novo and reference-guided assembly with Scaffold_builder,
Source Code for Biology and Medicine 2013.