Introduction
LTR_retriever is a command line program (in Perl) for accurate identification of LTR retrotransposons (LTR-RTs) from outputs of LTRharvest, LTR_FINDER, MGEScan 3.0.0, LTR_STRUC, and LtrDetector. It also generates a non-redundant LTR-RT library for genome annotation.
By default, the program generates a whole-genome LTR-RT annotation and the LTR Assembly Index (LAI) for evaluating the assembly continuity of the input genome. Users can also run LAI separately (see Usage).
Installation
LTR_retriever is installation-free but requires dependencies: TRF, BLAST+, BLAST or CD-HIT, HMMER, RepeatMasker, and TEsorter. You may specify the path to these programs in the command line (run LTR_retriever -h for details) or install them in the following ways:
Quick installation using conda
Direct installation using the yml file:
Alternatively, you may use the conda recipe, but because of the large number of dependencies, the conda solve may take hours. Unfortunately, the conda recipe currently cannot be installed properly with mamba.
Step by step installation using conda
You may use conda to quickly install all dependencies, after which LTR_retriever is ready to run:
Standard installation
You can also provide the fixed paths to the following dependent programs.
Simply modify the ‘paths’ file in the LTR_retriever directory:
vi /your_path_to/LTR_retriever/paths
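Before the first run, it may help to confirm which dependencies are already reachable on PATH. The sketch below is an illustration and not part of LTR_retriever. The executable names (trf, blastn, makeblastdb, cd-hit-est, hmmsearch, RepeatMasker, TEsorter) are assumptions about how each package is typically installed and may differ on your system.

```shell
#!/bin/sh
# Report which of the given executables cannot be found on PATH.
# Prints one missing name per line; prints nothing if all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed executable names for the dependencies; adjust to your installation.
deps="trf blastn makeblastdb cd-hit-est hmmsearch RepeatMasker TEsorter"

# $deps is intentionally unquoted so that it splits into separate names.
missing=$(missing_tools $deps)
if [ -n "$missing" ]; then
  echo "Not found on PATH (set them in the 'paths' file or on the command line):"
  echo "$missing"
else
  echo "All listed dependencies were found on PATH."
fi
```

Anything reported as missing can still be supplied through the ‘paths’ file or the command line, so a non-empty list is a prompt to check, not necessarily an error.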
Inputs
Two types of inputs are required for LTR_retriever:
Genomic sequence
LTR-RT candidates
LTR_retriever takes multiple LTR-RT candidate inputs, including the screen output of LTRharvest and the screen output of LTR_FINDER. For outputs of other LTR identification programs, you may convert them to LTRharvest-like format and feed them to LTR_retriever (with -inharvest). Users need to obtain the input file(s) from the aforementioned programs before running LTR_retriever. Either a single input source or a combination of multiple inputs is acceptable. For more details and examples, please see the manual.
It’s sufficient and recommended to use LTRharvest and LTR_FINDER results for LTR_retriever. However, if you want to analyze results from LTR_STRUC, MGEScan 3.0.0, and LtrDetector, you can use the following scripts to convert their outputs to the LTRharvest format, then feed LTR_retriever with -inharvest. You may concatenate multiple LTRharvest format inputs into one file. For instructions, run:
Click to download executables for LTR_FINDER_parallel and LTRharvest.
Outputs
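The output of LTR_retriever includes:
Intact LTR-RTs with coordinate and structural information
    Summary tables (.pass.list)
    GFF3 format output (.pass.list.gff3)
LTR-RT library
    All non-redundant LTR-RTs (.LTRlib.fa)
    All non-TGCA LTR-RTs (.nmtf.LTRlib.fa)
    All LTR-RTs with redundancy (.LTRlib.redundant.fa)
Whole-genome LTR-RT annotation by the non-redundant library
    GFF format output (.out.gff)
    LTR family summary (.out.fam.size.list)
    LTR superfamily summary (.out.superfam.size.list)
    LTR distribution on each chromosome (.out.LTR.distribution.txt)
LTR Assembly Index (.out.LAI)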
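After a run, a quick check for the expected output files can catch an incomplete job. The following is a minimal sketch built on one assumption: that each output is named by appending its suffix (.pass.list, .LTRlib.fa, .out.LAI, and so on) to the genome file name. The function name missing_outputs and the file name genome.fa are hypothetical.

```shell
#!/bin/sh
# List expected LTR_retriever output files that are absent for a given prefix.
# Assumption: each output is named "<prefix><suffix>", e.g. genome.fa.pass.list.
missing_outputs() {
  prefix=$1
  for suffix in .pass.list .pass.list.gff3 .LTRlib.fa .nmtf.LTRlib.fa \
                .LTRlib.redundant.fa .out.gff .out.fam.size.list \
                .out.superfam.size.list .out.LTR.distribution.txt .out.LAI; do
    [ -e "${prefix}${suffix}" ] || printf '%s\n' "${prefix}${suffix}"
  done
}

# Example: "genome.fa" is a hypothetical genome file name.
missing_outputs genome.fa
```

An empty result means every listed file exists; it does not say anything about whether the contents are complete.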
Usage
Best practice: It’s highly recommended to use short and simple sequence names. For example, use letters, numbers, and _ to generate unique names shorter than 15 characters. If there are long sequence names, LTR_retriever will try to convert them for you, but this is not always successful.
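To find problem names before running anything, the naming advice can be checked with a few lines of awk. This sketch reads the length limit as 15 characters and assumes that only the first whitespace-delimited word of each header line counts as the sequence name. The function name check_fasta_names is hypothetical.

```shell
#!/bin/sh
# Print FASTA sequence names that are 15 characters or longer, or that contain
# anything other than letters, numbers, and "_". Only the first
# whitespace-delimited word of each header line is treated as the name.
check_fasta_names() {
  awk '/^>/ {
         name = substr($1, 2)
         if (length(name) >= 15 || name !~ /^[A-Za-z0-9_]+$/) print name
       }' "$1"
}

# Small demonstration on a throwaway FASTA file.
demo=$(mktemp)
printf '>Chr1\nACGT\n>scaffold_0001|len=12345\nACGT\n>a_very_long_sequence_name_01\nACGT\n' > "$demo"
check_fasta_names "$demo"
rm -f "$demo"
```

In the demonstration, Chr1 passes, while the other two names are printed because they are too long (and one also contains characters outside the recommended set).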
To obtain raw input files with LTRharvest and LTR_FINDER_parallel:
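One possible set of commands is sketched below. The file name genome.fa, the thread count, and all parameter values are assumptions to be tuned for your genome, and the LTR_FINDER_parallel output name (.finder.combine.scn) is assumed as well. The guard skips everything if the tools or the genome file are not available.

```shell
#!/bin/sh
# Sketch: generate LTR-RT candidates with LTRharvest and LTR_FINDER_parallel,
# then merge them into one LTRharvest-format file.
genome="genome.fa"                         # hypothetical genome FASTA
threads=10                                 # assumed thread count
harvest_scn="${genome}.harvest.scn"
finder_scn="${genome}.finder.combine.scn"  # assumed LTR_FINDER_parallel output name
raw_scn="${genome}.rawLTR.scn"

if command -v gt >/dev/null 2>&1 \
   && command -v LTR_FINDER_parallel >/dev/null 2>&1 \
   && [ -s "$genome" ]; then
  # Build the enhanced suffix array index that LTRharvest needs.
  gt suffixerator -db "$genome" -indexname "$genome" -tis -suf -lcp -des -ssp -sds -dna
  # Predict candidates; the screen (stdout) output is what LTR_retriever reads.
  gt ltrharvest -index "$genome" -minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 \
    -motif TGCA -motifmis 1 -similar 85 -vic 10 -seed 20 -seqids yes > "$harvest_scn"
  # Run LTR_FINDER in parallel and request LTRharvest-format output.
  LTR_FINDER_parallel -seq "$genome" -threads "$threads" -harvest_out -size 1000000 -time 300
  # Concatenate both candidate sets into a single input file.
  cat "$harvest_scn" "$finder_scn" > "$raw_scn"
else
  echo "gt, LTR_FINDER_parallel, or $genome not available; nothing was run." >&2
fi
```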
To run LTR_retriever:
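A minimal invocation might look like the following. It assumes a genome file named genome.fa and a merged candidate file named genome.fa.rawLTR.scn, both hypothetical, and it only runs when LTR_retriever and both inputs are present.

```shell
#!/bin/sh
# Sketch: run LTR_retriever on a genome and its LTRharvest-format candidates.
genome="genome.fa"               # hypothetical genome FASTA
raw_scn="${genome}.rawLTR.scn"   # hypothetical merged candidate file
threads=10                       # assumed thread count

if command -v LTR_retriever >/dev/null 2>&1 \
   && [ -s "$genome" ] && [ -s "$raw_scn" ]; then
  LTR_retriever -genome "$genome" -inharvest "$raw_scn" -threads "$threads"
else
  echo "LTR_retriever or its input files are not available; nothing was run." >&2
fi
```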
To run LAI:
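LAI can be run on its own once an LTR_retriever run has finished. The sketch below assumes that the intact LTR-RT list is genome.fa.pass.list and that the whole-genome annotation is genome.fa.out; both file names are assumptions derived from the output suffixes, so check them against your own results.

```shell
#!/bin/sh
# Sketch: compute the LTR Assembly Index from existing LTR_retriever results.
genome="genome.fa"              # hypothetical genome FASTA
intact="${genome}.pass.list"    # assumed intact LTR-RT list
all_out="${genome}.out"         # assumed whole-genome LTR annotation

if command -v LAI >/dev/null 2>&1 \
   && [ -s "$genome" ] && [ -s "$intact" ] && [ -s "$all_out" ]; then
  LAI -genome "$genome" -intact "$intact" -all "$all_out"
else
  echo "LAI or its input files are not available; nothing was run." >&2
fi
```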
For more details about the usage and parameter settings, please see the help pages by running LTR_retriever -h or LAI -h.
Or refer to the manual document.
For questions and issues, please see: https://github.com/oushujun/LTR_retriever/issues
Citations
If you find LTR_retriever useful, please cite:
Ou S. and Jiang N. (2018). LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 176(2): 1410-1422. (open access)
If you find LAI useful, please cite:
Ou S., Chen J. and Jiang N. (2018). Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. gky730. (open access)