
InDelFixer


Dr. Armin Töpfer, armintoepfer.com

A sensitive aligner for 454, Illumina and PacBio data, employing a full Smith-Waterman alignment against a reference.

This Java command-line application aligns next-generation sequencing (NGS) and third-generation reads to a set of reference sequences, using fast k-mer matching as a first step, and removes indels that cause frame shifts. Optionally, the analysis can be restricted to a specific region.

The alignment can be refined iteratively by aligning against the consensus sequence with wobbles.

The output is in SAM format.

Features

  • Fully multithreaded
  • Performs a full Smith-Waterman alignment
  • Multiple sets of affine gap costs can be used to find the optimal alignment for each read
  • Paired-end reads are properly paired with SAM flags
  • Accepts multiple reference genomes with wobbles
  • Iterative alignment against a consensus with wobbles to increase alignment quality

PREREQUISITES TO RUN:

RUN:

454/Roche:

java -jar InDelFixer.jar -i libCase102.sff -g referenceGenomes.fasta

However, I encourage converting the SFF file to FASTQ first:

sff2fastq input.sff -o input.fastq

sff2fastq can be installed with:

git clone https://github.com/indraniel/sff2fastq.git
cd sff2fastq
make

Fasta / PacBio ccs:

java -jar InDelFixer.jar -i libCase102.fasta -g referenceGenomes.fasta

For PacBio input, please use -noHashing, since the PacBio error rate is too high for reliable k-mer matching.
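To illustrate why k-mer seeding degrades as the error rate grows, the following sketch computes the probability that a k-mer taken from a read contains no sequencing error, assuming independent per-base errors. The k-mer length and the two error rates are illustrative assumptions, not InDelFixer's actual settings or measured values.

```python
def error_free_kmer_probability(error_rate: float, k: int) -> float:
    """Probability that all k bases of a k-mer are correct.

    Assumes every base is wrong independently with probability `error_rate`.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return (1.0 - error_rate) ** k


if __name__ == "__main__":
    K = 10  # hypothetical k-mer length, chosen only for illustration
    for label, rate in (("low error rate (1%)", 0.01), ("high error rate (15%)", 0.15)):
        p = error_free_kmer_probability(rate, K)
        print(f"{label}: P(error-free {K}-mer) = {p:.3f}")
```

Under these assumptions, most k-mers of a noisy read no longer match the reference exactly, which is why skipping the hashing step can be the safer choice.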

Illumina paired-end:

java -jar InDelFixer.jar -i libCase102_R1.fastq -ir libCase102_R2.fastq -g referenceGenomes.fasta

High quality alignment

With the -sensitive parameter, multiple sets of affine gap costs are tested for each read and the best alignment is kept.

Affine gap costs

Gap costs for the Smith-Waterman alignment can be set with:

-gop 3 (gap open)
-gex 1 (gap extend)

Defaults: gap open 10, gap extend 3. Tested with PacBio, Illumina and 454 data from HIV, HCV and HBV.
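As a rough illustration of how gap-open and gap-extend costs influence a local alignment, here is a minimal score-only Smith-Waterman sketch with affine gaps (the Gotoh recurrences). It is not InDelFixer's implementation: the match and mismatch scores are hypothetical, and it assumes that a gap of length L costs gop + (L - 1) * gex.

```python
def local_affine_score(read, ref, match=5, mismatch=-4, gop=10, gex=3):
    """Best local alignment score between read and ref with affine gap costs.

    Assumption: the first gap position costs `gop`, each further one `gex`.
    `match` and `mismatch` are hypothetical substitution scores.
    """
    neg_inf = float("-inf")
    n, m = len(read), len(ref)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap in the read (a reference base is skipped)
    # F: best score ending with a gap in the reference (a read base is skipped)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gop, E[i][j - 1] - gex)
            F[i][j] = max(H[i - 1][j] - gop, F[i - 1][j] - gex)
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    read, ref = "AAAACCCC", "AAAATTCCCC"  # the read lacks the two T bases
    for gop, gex in ((3, 1), (10, 3)):
        print(f"gop={gop} gex={gex} score={local_affine_score(read, ref, gop=gop, gex=gex)}")
```

In this toy case the two-base deletion costs 4 with -gop 3 -gex 1 and 13 with the default 10/3, so higher gap costs make the aligner less willing to open gaps.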

Iterative refinement

The alignment can be improved by aligning against the consensus sequence. The parameter -refine INT takes a positive integer as input and activates the iterative refinement. Refinement only works when aligning against a single reference genome.
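The sketch below shows one way a consensus with wobbles could be derived from aligned reads, assuming that "wobbles" refers to IUPAC ambiguity codes. The function name and the min_freq threshold are hypothetical; InDelFixer's own rule for calling a wobble may differ.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_with_wobbles(aligned_reads, min_freq=0.25):
    """Column-wise consensus over equal-length aligned reads ('-' marks a gap).

    Every base whose frequency in a column reaches `min_freq` (a hypothetical
    threshold) contributes to that column's ambiguity code.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        counts = Counter(b for b in column if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            consensus.append("N")  # no base observed in this column
            continue
        kept = frozenset(b for b, c in counts.items() if c / total >= min_freq)
        if not kept:  # threshold too strict: fall back to the most common base
            kept = frozenset(counts.most_common(1)[0][0])
        consensus.append(IUPAC[kept])
    return "".join(consensus)


if __name__ == "__main__":
    print(consensus_with_wobbles(["ACGT", "ACGT", "ATGT", "ATGA"]))
```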

Remove conserved deletions:

During the iterative alignment, conserved deletions can be removed with -rmDel.

Remove frame-shift causing deletions

With the -fix parameter, frame-shift causing deletions are replaced with the consensus sequence.
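A deletion shifts the reading frame when its length is not a multiple of three. The following sketch only detects such deletions in a SAM CIGAR string; it does not perform the replacement that -fix does. The function name and the 1-based ref_start argument are assumptions made for illustration.

```python
import re

CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def frameshift_deletions(cigar, ref_start=1):
    """Return (reference_position, length) for each frame-shifting deletion.

    `ref_start` is the reference position of the first aligned base
    (the POS field of a SAM record).
    """
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    position = ref_start
    hits = []
    for count, op in CIGAR_OP.findall(cigar):
        count = int(count)
        if op == "D" and count % 3 != 0:
            hits.append((position, count))
        if op in "MDN=X":  # operations that consume reference bases
            position += count
    return hits


if __name__ == "__main__":
    # The 3-base deletion keeps the frame; the 2- and 1-base deletions do not.
    print(frameshift_deletions("10M2D5M3D4M1D6M", ref_start=100))
```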

Line breaks

If a single FASTQ entry spans more than four lines because the sequence and quality strings contain line breaks, use -flat.
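For reference, this is roughly what flattening a wrapped FASTQ file involves. The sketch is an independent illustration, not the code behind -flat: it joins sequence lines up to the "+" separator and then reads quality lines until the quality string is as long as the sequence, which matters because a quality line may itself start with "@".

```python
def flatten_fastq(lines):
    """Parse FASTQ lines that may be wrapped; return (header, sequence, quality) tuples."""
    it = (line.rstrip("\r\n") for line in lines if line.strip())
    records = []
    for header in it:
        if not header.startswith("@"):
            raise ValueError(f"expected a header line starting with '@', got {header!r}")
        seq_parts = []
        for line in it:
            if line.startswith("+"):  # the separator ends the sequence block
                break
            seq_parts.append(line)
        sequence = "".join(seq_parts)
        quality = ""
        # Read by length rather than by prefix: '@' is a valid quality character.
        while len(quality) < len(sequence):
            chunk = next(it, None)
            if chunk is None:
                raise ValueError(f"truncated quality string for {header!r}")
            quality += chunk
        records.append((header, sequence, quality))
    return records


if __name__ == "__main__":
    wrapped = ["@r1", "ACGT", "ACG", "+", "IIII", "@II", "@r2", "TT", "+r2", "II"]
    for header, sequence, quality in flatten_fastq(wrapped):
        print(header, sequence, "+", quality, sep="\n")
```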

Extract region:

In addition, only a specific region can be extracted with -r begin-end, for example a certain gene:

java -jar InDelFixer.jar -i libCase102.sff -g referenceGenomes.fasta -r 342-944

FILTER

  -l      INT    : Minimal read length prior to alignment (default 0)
  -la     INT    : Minimal read length after alignment (default 0)
  -ins    DOUBLE : The maximum fraction of insertions allowed [range 0.0 - 1.0] (default 1.0)
  -del    DOUBLE : The maximum fraction of deletions allowed [range 0.0 - 1.0] (default 1.0)
  -sub    DOUBLE : The maximum fraction of substitutions allowed [range 0.0 - 1.0] (default 1.0)
  -maxDel INT    : The maximum number of consecutive deletions allowed (default no filtering)
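To make the indel filters more concrete, the sketch below applies comparable thresholds to a SAM CIGAR string. It is a guess at the general idea, not InDelFixer's logic: the function and parameter names are hypothetical, and the choice of the aligned read length as denominator is an assumption. Substitutions are left out because a plain CIGAR string does not record them.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def passes_indel_filters(cigar, max_ins=1.0, max_del=1.0, max_consecutive_del=None):
    """Check insertion/deletion fractions and the longest deletion of one alignment.

    Assumption: fractions are relative to the number of aligned read bases
    (M, I, = and X operations; clipped bases are ignored).
    """
    ops = [(int(count), op) for count, op in CIGAR_OP.findall(cigar)]
    aligned_read_len = sum(count for count, op in ops if op in "MI=X")
    if aligned_read_len == 0:
        return False  # nothing aligned, nothing to keep
    inserted = sum(count for count, op in ops if op == "I")
    deleted = sum(count for count, op in ops if op == "D")
    longest_del = max((count for count, op in ops if op == "D"), default=0)
    if inserted / aligned_read_len > max_ins:
        return False
    if deleted / aligned_read_len > max_del:
        return False
    if max_consecutive_del is not None and longest_del > max_consecutive_del:
        return False
    return True


if __name__ == "__main__":
    print(passes_indel_filters("50M2I48M", max_ins=0.05))
    print(passes_indel_filters("40M12D60M", max_consecutive_del=10))
```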

Help:

Further help can be shown by running without additional parameters: java -jar InDelFixer.jar

BAM output:

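To convert reads.sam into BAM format, please install samtools and run:

samtools view -bS reads.sam > out.bam
samtools sort out.bam reads
samtools index reads.bam
rm out.bam

The sort call above uses the older samtools 0.1.x syntax, where the last argument is an output prefix. Recent samtools 1.x releases expect the output file via -o instead:

samtools sort -o reads.bam out.bam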

COMPILE (only for dev):

Install Maven 3

cd InDelFixer
mvn -DartifactId=samtools -DgroupId=net.sf -Dversion=1.9.6 -Dpackaging=jar -Dfile=src/main/resources/jars/sam-1.96.jar -DgeneratePom=false install:install-file
mvn clean package
java -jar target/InDelFixer.jar

CONTACT:

Armin Töpfer
armin.toepfer (at) gmail.com
http://www.armintoepfer.com

Contributions

Armin Töpfer
David Seifert
Alexander Artyomenko

LICENSE:

GNU GPLv3 http://www.gnu.org/licenses/gpl-3.0
