circRNAFull: an R package for reconstruction of full length circRNA sequence using chimeric alignment information
Requirements
R (>= 3.0.0)
Biostrings
seqinr
stringi
Rsamtools
Installation
From github
To install the package from github first you need to install the package “devtools” using the following command:
install.packages("devtools", dep=T)
The package “circRNAFull” depends on two bioconductor packages “Biostrings” and “Rsamtools” which cannot be installed automatically while installing “circRNAFull” using “devtools”. So, you need to install these two packages manually using the following way:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("Biostrings")
BiocManager::install("Rsamtools")
Finally, install “circRNAFull” by the following command:
devtools::install_github("tofazzalh/circRNAFull", dep = T)
Start analysis by loading the package with the following command:
library("circRNAFull")
Extracting transcript name and spanning reads for circRNAs
Description
This function makes a text file containing circle ids, transcript name and the spanning redas for each circRNAs.
circ_out is the output from the circRNA prediction tool CIRCexplorer.
chimeric_out is the junction file obtained from the chimeric alignment produced by STAR aligner.
output is the name of the output folder.
Value
The circle ids, transcript name and the spanning reads will be produced in ‘output‘ folder.
Example
#Loading an example output from the circRNA prediction tool CIRCexplorer
data(CIRCexplorer_output)
circ_out<-CIRCexplorer_output
#Loading an example junction file obtained from the. chimeric alignment produced by STAR
data(Chimeric.out.junction)
chimeric_out<-Chimeric.out.junction
#A temporary directory is created as an output folder.
output<-tempdir()
#Extracting circle ids, transcript name and spanning reads of circRNAs.
#The output will be written in 'output' directory
extract_circle_ids(circ_out, chimeric_out, output)
Extracting individual alignment file for each circRNA from chimeric alignment bam file
Description
This function generates individual alignment file for each circRNA from the chimeric alignment produced by STAR.
circle_id is a data frame containg circle ids, transcript name and spanning reads of circRNAs obtained from function
extract_circle_ids().
bamfile is a chimeric alignment bam file produced by STAR read as a data frame.
outfolder is the name of the output directory.
Value
Individual alignment files for each circRNAs will be produced in outfolder.
Example
#loading an example circle_id file
data(circle_ID)
circle_id<-circle_ID
#Please upload your chimeric alignment bam file.Suppose you have the chimeric
#alignemnt bam file 'Chimeric.out.bam' in you working directory.
#Then run:
bam <- scanBam("Chimeric.out.bam")
bamfile <- as.data.frame(bam)
#Creating an output directory
outfolder<-tempdir()
#Extracting individual alignment file for each circRNAs.
#The individual alignment file will be generated in the 'outfolder' directory.
extract_reads_from_bam(circle_id, bamfile, outfolder)
Extracting exon for the circRNAs
Description
This function extracts exons for the circRNAs from the CIGAR value of the chimeric alignment obtained from STAR.
bed_file is a data frame containing the exon coordinates obtained from reference genome annotation.
folder_name is the location of the individual alignments obtained from function extract_reads_from_bam().
circle_ids is a data frame containing the circle id, transcript name and spanning reads obtained from function extract_circle_ids().
output_folder is the output folder name.
Value
A file exon_count.txt containing exons for each circRNA will be produced in output_folder.
Examples
#Loading an example bed_file
data(exon_coordinate)
bed_file<-exon_coordinate
#Example of folder_name containing the individual alignments
folder_name<-system.file('extdata', package = 'circRNAFull')
#Loading an example circle_ids file
data(circle_ID)
circle_ids<-circle_ID
#creating a temporary output_folder
output_folder<-tempdir()
#Extracting exons for all circRNAs
Extract_exon(bed_file, folder_name, circle_ids, output_folder)
seq_name is a vector containing the name of chromosomes of the reference genome.
sequence is a vector containing the sequences of the reference genome.
exon_count_file is the exon count file after deleting skip exons.
output is the output folder name.
Value
The reconstructed circRNA sequences circRNAseq.fasta will be produced in folder output.
Examples
#Please download the reference genome.
#Suppose you have downloaded the reference genome 'hg38.fa' in you working directory.
#Then run:
ref_genome<-"hg38.fa"
fastaFile <- readDNAStringSet(ref_genome)
seq_name = sub('\\ .*', '', names(fastaFile))
sequence = paste(fastaFile)
#Loading an exon count file after deleting skip exon
data(exon_count_after_skipping_exon)
exon_count_file<-exon_count_after_skipping_exon
#Creating an output directory
output<-tempdir()
#Reconstructing the circRNA sequences.
#The circRNA sequences will be generated in the 'output' directory.
extract_sequence(seq_name, sequence, exon_count, output)
circRNAFull: an R package for reconstruction of full length circRNA sequence using chimeric alignment information
Requirements
Installation
From github
To install the package from github first you need to install the package “devtools” using the following command:
The package “circRNAFull” depends on two bioconductor packages “Biostrings” and “Rsamtools” which cannot be installed automatically while installing “circRNAFull” using “devtools”. So, you need to install these two packages manually using the following way:
Finally, install “circRNAFull” by the following command:
Start analysis by loading the package with the following command:
Extracting transcript name and spanning reads for circRNAs
Description
This function makes a text file containing circle ids, transcript name and the spanning redas for each circRNAs.
Usage
Arguments
circ_outis the output from the circRNA prediction tool CIRCexplorer.chimeric_outis the junction file obtained from the chimeric alignment produced by STAR aligner.outputis the name of the output folder.Value
The circle ids, transcript name and the spanning reads will be produced in ‘output‘ folder.
Example
Extracting individual alignment file for each circRNA from chimeric alignment bam file
Description
This function generates individual alignment file for each circRNA from the chimeric alignment produced by STAR.
Usage
Arguments
circle_idis a data frame containg circle ids, transcript name and spanning reads of circRNAs obtained from functionextract_circle_ids().bamfileis a chimeric alignment bam file produced by STAR read as a data frame.outfolderis the name of the output directory.Value
Individual alignment files for each circRNAs will be produced in outfolder.
Example
Extracting exon for the circRNAs
Description
This function extracts exons for the circRNAs from the CIGAR value of the chimeric alignment obtained from STAR.
Usage
Arguments
bed_fileis a data frame containing the exon coordinates obtained from reference genome annotation.folder_nameis the location of the individual alignments obtained from functionextract_reads_from_bam().circle_idsis a data frame containing the circle id, transcript name and spanning reads obtained from functionextract_circle_ids().output_folderis the output folder name.Value
A file exon_count.txt containing exons for each circRNA will be produced in output_folder.
Examples
Detecting skip exon
Description
This function detects skip exon.
Usage
Arguments
circle_reads_folderis the location of the individual alignments obtained from functionextract_reads_from_bam().exon_count_fileis a data frame containing exons for circRNAs obtained from functionExtract_exon().output_folderis the name of the output directory.Value
A file detect_skip_exon.txt containing the skip exons will be produced in output_folder.
Examples
Extracting exon after deleting skip exon
Description
This function extracts the remaining exons after deleting skip exons.
Usage
Arguments
exon_count_fileis a data frame containing exons for circRNAs obtained from functionExtract_exon().skip_exon_fileis a data frame containing skip exons for circRNAs obtained from functionskip_exon_detection().output_folderis the name of the output directory.Value
A text file extract_exon_after_skipping_exon.txt containing the exons after deleting skip exon will be produced in output_folder.
Examples
Reconstruction of full length circRNA sequences
Description
This function reconstructs the full length circRNA sequences.
Usage
Arguments
seq_nameis a vector containing the name of chromosomes of the reference genome.sequenceis a vector containing the sequences of the reference genome.exon_count_fileis the exon count file after deleting skip exons.outputis the output folder name.Value
The reconstructed circRNA sequences circRNAseq.fasta will be produced in folder output.
Examples