目录

TeXP

TeXP is a pipeline to evaluate the transcription level of transposable elements in short read RNA-seq data

About

TeXP is a pipeline for quantifying abundances of Transposable Elements transcripts from RNA-Seq data. TeXP is based on the assumption that RNA-seq reads overlapping Transposable Elements is a composition of pervasive transcription signal and autonomous transcription of Transposable Elements.

https://www.biorxiv.org/content/10.1101/648667v1.full

How to quickly run TeXP

docker run -it fnavarro/texp:latest /bin/bash

Download a fastq file from a RNA-seq experiment, for example, MCF-7 from the ENCODE project

  wget -c -t0 "https://www.encodeproject.org/files/ENCFF000HFF/@@download/ENCFF000HFF.fastq.gz" -O file.fastq.gz

Run TeXP

  ./TeXP.sh -f file.fastq.gz -t 1 -o process/example/ -n quick_texp_run

The output files will be generated at:

  ls process/example/quick_texp_run
    *.L1HS_hg38.count (Naive counts) 
    *.L1HS_hg38.count.corrected (Corrected counts)
    *.L1HS_hg38.count.rpkm (Naive RPKM)
    *.L1HS_hg38.count.rpkm.corrected (Corrected RPKM)
    *.L1HS_hg38.count.signal_proportions 

TIPS: If fastq files are stored locally you can use

docker run -it -v ~/Desktop/:/texp fnavarro/texp:latest /bin/bash

To mount “~/Desktop” at your docker container

Requirements

  • Bowtie2 (2.3+)
  • Bedtools (2.26+)
  • Fastx-toolkit (0.0.14+)
  • perl (5.24+)
  • python (2.7)
  • R (3.3+)
  • Penalized package (0.49+)
  • samtools (1.3+)
  • wgsim (a12da33 on Oct 17, 2011)

Download TeXP

gt; git clone https://github.com/gersteinlab/texp.git

Edit TeXP.sh and Update INSTALL_DIR variable to the path where TeXP was cloned

Installing TeXP dependencies

apt-get update

  • Install binaries dependencies

apt-get install -y
bedtools=2.26.0+dfsg-3
bowtie2=2.3.0-2
fastx-toolkit=0.0.14-3
gawk=1:4.1.4+dfsg-1
git
perl=5.24.1-3+deb9u1
python=2.7.13-2
r-base=3.3.3-1
r-base-dev=3.3.3-1
samtools=1.3.1-3
wget

  • Install Wgsim

mkdir -p /src; \ cd /src ;
git clone https://github.com/lh3/wgsim.git;
cd wgsim;
gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm;
mv wgsim /usr/bin/;
cd /;

  • Download Libraries

Fix path (/data/library) to the a proper location at your computation enviroment

mkdir -p /data/library/rep_annotation;
cd /data/library/rep_annotation;
wget -c -t0 “http://files2.gersteinlab.org/public-docs/2019/08.14/rep_annotation.hg38.tar.bz2" -O rep_annotation.hg38.tar.bz2;
tar xjvf rep_annotation.hg38.tar.bz2;
rm -Rf rep_annotation.hg38.tar.bz2

mkdir -p /data/library/bowtie2;
cd /data/library/bowtie2;
wget -c -t0 “http://files2.gersteinlab.org/public-docs/2019/08.14/bowtie2.hg38.tar.bz2" -O bowtie2.hg38.tar.bz2;
tar xjvf bowtie2.hg38.tar.bz2;
rm -Rf bowtie2.hg38.tar.bz2

  • Install R packages dependencies

echo ‘install.packages(c(“penalized”), repos=”http://cloud.r-project.org", dependencies=TRUE)’ > /tmp/packages.R
&& Rscript /tmp/packages.R

TeXP config

A few paramaters must be setup so TeXP can properly work outside a docker enviroment; Parameters are set on opts.mk and the user MUST properly set it up.

  • LIBRARY_PATH: Absolute path pointing to TeXP library, general this is the path you downloaded TeXP
  • EXT_LIBRARY_PATH: Absolute path containing the bowtie2 reference index and Transposable element annotation bed file, downloaded as instructed above
  • EXE_DIR: If binaries are found in a single path, EXE_DIR can be used to generalize binary location. For example, if bowtie2, bedtools, etc are located at /usr/bin/, you should set EXE_DIR := /usr
  • Dependencies installed in different paths should be defined manually, for example, if wgsim is installed at the home folder, the user must set:
    • WGSIM_BIN := ~/wgsim/bin/wgsim
  • Finally the user must set to CONFIGURED := TRUE

Docker image

Alternatively, docker images containing all dependencies and libraries can be used. The TeXP docker image also is pre-configured to work outside the box. Check https://hub.docker.com/r/fnavarro/texp/ for futher instructions: docker pull fnavarro/texp


Running TeXP

gt; ./TeXP.sh -f [FILE_NAME] -t [INT] -o [OUTPUT_PATH] n [SAMPLE_ID]

-f: Input file (fastq,fastq.gz,sra)

-t: Number of threads

-o: Output path (i.e. ./ or ./processed)

-n: Sample name (i.e. SAMPLE01)


FAQ - Frequently Asked Questions

  1. Does TeXP work for paired end data?

TeXP has been implemented to run one fastq file at a time. Overall, we empirically find that if the RNA-seq library is good, P1 and P2 should yield very similar estimates. Therefore, if using paired-end RNA-seq data, we recommend calculating the mean between both pairs.

  1. Does TeXP work for unstranded data?

Yes!

  1. Can I use other aligners

On (figure 15) [https://journals.plos.org/ploscompbiol/article/file?type=supplementary&id=info:doi/10.1371/journal.pcbi.1007293.s015] we show that aligners do not drastically change TeXP estimations, therefore, while you could use other aligners, we suggest using bowtie2 since all TeXP parameterization has been done on bowtie2

关于

RNA-seq 分析流程,用于估计短读长数据中转座元件转录本的表达丰度,并区分普遍转录与自主转录信号。

40.3 MB
邀请码
    Gitlink(确实开源)
  • 加入我们
  • 官网邮箱:gitlink@ccf.org.cn
  • QQ群
  • QQ群
  • 公众号
  • 公众号

版权所有:中国计算机学会技术支持:开源发展技术委员会
京ICP备13000930号-9 京公网安备 11010802032778号