AnnoSINE_v2 is a SINE annotation tool for plant and animal genomes. It is designed to efficiently generate high-quality, non-redundant SINE libraries for genome annotation. As a new version of AnnoSINE, it follows the same workflow as AnnoSINE (shown below).
Prerequisites
To use AnnoSINE_v2, you need to install the tools listed below.
positional arguments:
mode [1 | 2 | 3]
Choose the running mode of the program.
1--Homology-based method;
2--Structure-based method;
3--Hybrid of homology-based and structure-based method.
input_filename input genome assembly path
output_filename output files path
optional arguments:
-h, --help show this help message and exit
-e, --hmmer_evalue Expectation value threshold for saving hits of homology search (default: 1e-10)
-v, --blast_evalue Expectation value threshold for sequence alignment search (default: 1e-10)
-l, --length_factor Threshold of the local alignment length relative to the BLAST query length (default: 0.3)
-c, --copy_number_factor Threshold of the copy number that determines the SINE boundary (default: 0.15)
-s, --shift Maximum threshold of the boundary shift (default: 80)
-g, --gap Maximum threshold of the truncated gap (default: 10)
-minc, --copy_number Minimum threshold of the copy number for each element (default: 20)
-numa, --num_alignments Value of the --num_alignments option for BLAST alignments (default: 50000)
-maxb, --base_copy_number Maximum threshold of copy number for the first and last base (default: 1)
-a, --animal If set to 1, HMMER searches for SINEs using the animal HMM files from Dfam. If set to 2, HMMER searches for SINEs using both the plant and animal HMM files. (default: 0)
-b, --boundary Output SINE seed boundaries based on TSD or MSA (default: msa)
-f, --figure Output the SINE seed MSA figures and copy number profiles (y/n). Please note that this step may take a long time to process. (default: n)
-temd, --temp_dir The temporary directory used by the paf2blast6 script. If not set, the /tmp folder is used automatically.
-auto, --automatically_continue If set to 1, the program skips finished steps and continues unfinished steps for a previously processed output directory. (default: 0)
-r, --non_redundant Annotate SINEs in the whole genome based on the non-redundant library (y/n) (default: y)
-t, --threads Threads for each tool in AnnoSINE (default: 36)
-irf, --irf_path Path to the irf program (default: '')
-rpm, --RepeatMasker_enable If set to 0, RepeatMasker is not run (Step 8 in the code). (default: 1)
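As a rough illustration of how the positional and optional arguments above fit together, the Python sketch below assembles a command line as an argument list. The flags come from the help text above; the executable name `AnnoSINE_v2`, the helper `build_command`, and the input path `genome.fasta` are assumptions made for illustration, so adjust them to match your installation. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_command(mode, genome, out_dir, options=None, executable="AnnoSINE_v2"):
    """Assemble an argument list: optional flags first, then the three positionals.

    `executable` is an assumed entry-point name, not confirmed by the help text.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1 (homology), 2 (structure), or 3 (hybrid)")
    cmd = [executable]
    # Each option is passed as "flag value", e.g. "-t 8".
    for flag, value in (options or {}).items():
        cmd += [flag, str(value)]
    # Positional arguments: mode, input_filename, output_filename.
    cmd += [str(mode), genome, out_dir]
    return cmd


# Hypothetical example: hybrid mode, 8 threads, MSA-based boundaries.
cmd = build_command(3, "genome.fasta", "Output_Files", {"-t": 8, "-b": "msa"})
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.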
Inputs
Genome sequence (FASTA format).
Outputs
Redundant SINE library: $ Step7_cluster_output.fasta
Non-redundant SINE library with serial number: $Seed_SINE.fa.
Whole-genome SINE annotation: $Input_genome.fasta.out. This file contains high-similarity SINE annotations.
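To get a quick overview of a SINE library such as Seed_SINE.fa, a small FASTA reader is usually enough. The sketch below is a generic, dependency-free helper (the function name `fasta_lengths` and the demo records are hypothetical and not part of AnnoSINE_v2); it returns the length of every sequence keyed by its identifier.

```python
import tempfile
from pathlib import Path


def fasta_lengths(path):
    """Return {record_id: sequence_length} for a FASTA file."""
    lengths = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # The identifier is the first whitespace-delimited token of the header.
                parts = line[1:].split()
                current = parts[0] if parts else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths


# Demo on a tiny made-up library; replace the path with your own Seed_SINE.fa.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fa"
    demo.write_text(">seq_1 example\nACGTACGT\nACG\n>seq_2\nGGCC\n")
    lengths = fasta_lengths(demo)
    print(len(lengths), "sequences;", lengths)
```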
Intermediate Files
SINE candidate information predicted by homology search: $ ../Family_Seq/Family_Name/Family_Name.out. (requires mode 1 or 3)
SINE candidate sequences predicted by structure search: $ ../Input_Files/Input_genome-matches.fasta. (requires mode 2 or 3)
Extended candidate sequences for TSD search: $ Step1_extend_tsd_input.fa
TSD identification outputs: $ Step2_tsd.txt
MSA extended input sequences flanked with TSD: $ Step2_extend_blast_input.fa
MSA output: $ Step3_blast_output.out
Intermediate sequences with MSA quality examination: $ Step3_blast_process_output.fa
SINE candidate sequences after MSA quality examination: $ Step4_rna_input.fasta
Outputs of the BLAST search of SINE candidates against the RNA database: $ Step4_rna_output.out
Classified SINE candidates after RNA examination: $ Step4_rna_output.fasta
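When a run stops partway, it can help to see which of the step files listed above were actually written. The sketch below checks for them in a given directory. It assumes the Step1 to Step4 files sit directly inside the output directory, which the list above suggests but does not state explicitly; the helper name `step_status` is hypothetical.

```python
import tempfile
from pathlib import Path

# File names taken from the intermediate-file list above, in pipeline order.
STEP_FILES = [
    "Step1_extend_tsd_input.fa",
    "Step2_tsd.txt",
    "Step2_extend_blast_input.fa",
    "Step3_blast_output.out",
    "Step3_blast_process_output.fa",
    "Step4_rna_input.fasta",
    "Step4_rna_output.out",
    "Step4_rna_output.fasta",
]


def step_status(out_dir):
    """Return (file_name, is_present_and_non_empty) for each expected step file."""
    out = Path(out_dir)
    status = []
    for name in STEP_FILES:
        path = out / name
        status.append((name, path.is_file() and path.stat().st_size > 0))
    return status


# Demo on a temporary directory in which only the first file exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / STEP_FILES[0]).write_text(">candidate\nACGT\n")
    for name, ok in step_status(tmp):
        print("found  " if ok else "missing", name)
```

The first missing or empty file gives a hint about the step at which the pipeline stopped.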
AnnoSINE_v2
SINE Annotation Tool for Plant/Animal Genomes
Table of Contents
Introduction
Installation
Installation via GitHub.
Installation via Bioconda.
or
Note that some commands are replaced if you install AnnoSINE_v2 using Bioconda/pip (see below).
Usage
If the program stops at a certain step or produces no output, the filtering cutoffs may be too strict. You can try the command below:
Argument
Testing
You can test AnnoSINE_v2 with one chromosome of Arabidopsis thaliana (it takes about 6 minutes).
The results of running AnnoSINE_v2 on the test data are saved in Output_Files.
Citations