universalmotif
This package allows for importing most common motif types into R for use by
functions provided by other Bioconductor motif-related packages. Motifs can be
exported into most major motif formats from various classes as defined by other
Bioconductor packages. Furthermore, this package allows for easy manipulation
of motifs, such as creation, trimming, shuffling, P-value calculations,
filtering, type conversion, reverse complementation, alphabet switching, random
motif site generation, and comparison. It also includes functions
for working with sequences, such as motif scanning and enrichment, as well
as sequence creation and shuffling functions. Finally, this package implements
higher-order motifs, allowing for more accurate sequence scanning and motif
enrichment.
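Installation
Bioconductor release version
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("universalmotif")
GitHub development version
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("bjmt/universalmotif")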
Note: building the vignettes when installing from source is not recommended, unless you don’t mind waiting an hour for the necessary dependencies to install.
Error when installing from source
If you are trying to install the package from source and are getting compiler errors similar to these issues [1, 2, 3], then update your C++ compiler. The errors come from incompatibilities between older compilers and the C++11 lambda functions in the RcppThread package, which the universalmotif package uses.
Citation
If the universalmotif package has been useful for your research, please cite the following article:
Tremblay BJM (2024). universalmotif: An R package for biological motif analysis. Journal of Open Source Software 9, 7012. DOI:10.21105/joss.07012
Brief overview
All of the functions within the universalmotif package are fairly well documented. You can access the documentation from within R, by reading the Bioconductor PDF, or by browsing the rdrr.io website (the latter is not always up to date). Additionally, several vignettes come with the package, which you can access from within R or on the Bioconductor website.
You can also look through the slides of my Bioc2021 presentation, which goes over some basics of motif representations, scanning, and motif comparison.
A few key functions are also explored below.
The universalmotif motif class and import/export utilities
The universalmotif class is used to store the motif matrix itself, as well as other basic information such as alphabet, background frequencies, strand, and various other metadata slots. There are a number of ways of getting universalmotif class motifs:
Manual motif creation with create_motif() using one of several possible input types:
Consensus sequence
Sequence sites
Numeric matrix
No input: generate random motifs of any length
universalmotif class motifs are highly interoperable with other motif formats:
Import/export from/to several supported formats:
CIS-BP: read_cisbp()
HOMER: read_homer(), write_homer()
JASPAR: read_jaspar(), write_jaspar()
MEME: read_meme(), write_meme()
TRANSFAC: read_transfac(), write_transfac()
UNIPROBE: read_uniprobe()
Generic matrices: read_matrix(), write_matrix()
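Conversion from/to several compatible Bioconductor package motif classes using convert_motifs() (some formats cannot go both ways; see the documentation for details):
TFBSTools: PFMatrix, PWMatrix, ICMatrix, PFMatrixList, PWMatrixList, ICMatrixList, TFFMFirst
MotifDb: MotifList
seqLogo: pwm
motifStack: pcm, pfm
PWMEnrich: PWM
motifRG: Motif
Biostrings: PWM
rGADEM: motif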
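As a rough illustration of the import/export and conversion utilities listed above, the sketch below writes a motif to a temporary MEME file, reads it back, and converts it to a Biostrings PWM. The motif, the temporary file path, and the "Biostrings-PWM" class string are choices made for this example; check ?convert_motifs for the class strings accepted by your installed version.

```r
library(universalmotif)

# Example motif built from a consensus string (illustrative only)
m <- create_motif("TATAWAW", name = "example", nsites = 20)

# Export to MEME format in a temporary file, then import it again
meme.file <- tempfile(fileext = ".meme")
write_meme(m, meme.file)
m.reread <- read_meme(meme.file)

# Convert to another package's class.
# Assumption: "Biostrings-PWM" is an accepted target class string.
pwm <- convert_motifs(m, "Biostrings-PWM")
```

The other read_*() and write_*() functions are expected to follow the same pattern of a motif (or list of motifs) plus a file path.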
library(universalmotif)
create_motif()
#>
#> Motif name: motif
#> Alphabet: DNA
#> Type: PPM
#> Strands: +-
#> Total IC: 11.46
#> Consensus: YGTGMMMRGA
#>
#> Y G T G M M M R G A
#> A 0.17 0 0.00 0.04 0.58 0.62 0.29 0.47 0.08 0.77
#> C 0.36 0 0.01 0.00 0.41 0.36 0.68 0.16 0.05 0.00
#> G 0.00 1 0.03 0.95 0.00 0.00 0.04 0.28 0.86 0.23
#> T 0.47 0 0.96 0.02 0.00 0.03 0.00 0.09 0.00 0.00
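The call above uses no input and so returns a random motif. The other input types accepted by create_motif() might be used along the following lines; the consensus string, sites, and matrix values are invented for illustration.

```r
library(universalmotif)

# 1. From a consensus sequence (IUPAC ambiguity letters are allowed)
m.consensus <- create_motif("TATAWAW", name = "from consensus")

# 2. From a set of sequence sites of equal length
sites <- c("TATAAAA", "TATATAT", "TATAAAT", "TATATAA")
m.sites <- create_motif(sites, alphabet = "DNA", name = "from sites")

# 3. From a numeric matrix (rows = letters, columns = positions);
#    every column sums to 1, so the values are position probabilities
mat <- matrix(c(0.7, 0.1, 0.0,
                0.1, 0.1, 0.0,
                0.1, 0.1, 1.0,
                0.1, 0.7, 0.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))
m.matrix <- create_motif(mat, alphabet = "DNA", name = "from matrix")

# 4. No sequence input: a random motif of the requested length
m.random <- create_motif(8, name = "random")
```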
See ?universalmotif for a list of available metadata slots. Most slots can be accessed using square brackets, e.g. MotifObject["motif"] accesses the raw numeric matrix. You can also dump the contents of all user-facing motif slots at once into a list, e.g. MotifObject[].
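A short sketch of slot access with square brackets, using a throwaway motif in place of MotifObject (the motif itself is made up for this example):

```r
library(universalmotif)

m <- create_motif("TATAWAW", name = "example")

# Access single slots by name
motif.name <- m["name"]     # the motif name given above
motif.matrix <- m["motif"]  # the raw numeric matrix

# Dump all user-facing slots into a list
all.slots <- m[]
```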
Sequence creation, shuffling and background calculation
An important aspect of motif scanning and enrichment is to compare the results with those from a set of random or background sequences. For this, two functions are provided:
create_sequences(): create sequences of any alphabet, with optional desired background frequencies
shuffle_sequences(): shuffle a set of sequences, preserving any size k-let
Additionally, if you are interested in the detailed k-mer content of your sequences, you can use get_bkg(). It can be used to calculate sequence background for any size k-mer, and for any sequence alphabet. Results can be shown for individual sequences or merged together. There is also an option to calculate these results in any size windows (with any size overlap between windows) across the sequences.
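The three sequence functions could be combined roughly as follows. The sequence count, length, background frequencies, and seed are arbitrary values picked for this sketch.

```r
library(universalmotif)

# Create 10 random DNA sequences of length 200 with a custom background
seqs <- create_sequences(alphabet = "DNA", seqnum = 10, seqlen = 200,
                         freqs = c(A = 0.3, C = 0.2, G = 0.2, T = 0.3),
                         rng.seed = 1)

# Shuffle each sequence while preserving its dinucleotide (k = 2) content
shuffled <- shuffle_sequences(seqs, k = 2, rng.seed = 1)

# Background for k = 1 and k = 2, merged across all sequences
bkg <- get_bkg(seqs, k = 1:2)
```

Shuffled sequences of this kind can then serve as the background set when scanning or testing for enrichment.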
Sequence scanning and higher order motifs
The universalmotif package provides the scan_sequences() function to quickly scan a set of input sequences for motif hits. Additionally, the add_multifreq() function can be used to generate higher order motifs, which can then be used to scan sequences with higher accuracy. By default scan_sequences() calculates a threshold cutoff from a P-value, though this can be changed to a manual logodds threshold.
Note the differences between the matching sequences of regular scanning versus higher order scanning.
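A sketch of such a comparison is given here, assuming the ArabidopsisPromoters example sequences bundled with the package. The motif, the training sites, and the 0.9 logodds threshold are illustrative choices rather than recommended settings.

```r
library(universalmotif)
data(ArabidopsisPromoters)

# A small motif plus the sites used to learn its 2-letter (k = 2) frequencies
motif.k2 <- create_motif("CWWWWCC", nsites = 6)
sites.k2 <- Biostrings::DNAStringSet(rep(c("CAAAACC", "CTTTTCC"), 3))
motif.k2 <- add_multifreq(motif.k2, sites.k2)

# Regular scanning: positions are scored independently
hits.k1 <- scan_sequences(motif.k2, ArabidopsisPromoters, RC = TRUE,
                          threshold = 0.9, threshold.type = "logodds")

# Higher order scanning: use the 2-letter information added above
hits.k2 <- scan_sequences(motif.k2, ArabidopsisPromoters, RC = TRUE,
                          use.freq = 2,
                          threshold = 0.9, threshold.type = "logodds")
```

Comparing the match column of hits.k1 and hits.k2 should show which sites survive once dependencies between neighbouring positions are taken into account.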
Motif comparison, merging and viewing
A commonly performed task after de novo motif discovery is to check how closely the discovered motifs resemble known motifs. This can be performed using the highly customizable compare_motifs() with one of several available metrics. Different motifs can also be merged with merge_motifs(). In addition to motif visualization, view_motifs() can also be used to examine the top-scoring alignment chosen by compare_motifs() and merge_motifs().
library(universalmotif)
new.motif <- create_motif("CGCGAAAAAA", name = "New motif")
old.motif <- create_motif("TATATTTTTT", name = "Old motif")
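Using very strict alignment parameters, such as no overhangs: [plot of the view_motifs() alignment]
After relaxing the alignment parameters: [plot of the view_motifs() alignment]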
By default compare_motifs() returns a numeric matrix, meaning the output from comparisons between large numbers of motifs can be easily used to generate heatmaps or dendrograms.
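The comparison, merging, and viewing steps for the two motifs defined above might look like the sketch below. The choice of the PCC metric and the min.overlap values are assumptions made for illustration, not the settings behind the original plots.

```r
library(universalmotif)

new.motif <- create_motif("CGCGAAAAAA", name = "New motif")
old.motif <- create_motif("TATATTTTTT", name = "Old motif")
motifs <- list(new.motif, old.motif)

# All-against-all comparison; returns a numeric score matrix
scores <- compare_motifs(motifs, method = "PCC")

# Merge the two motifs using their top-scoring alignment
merged <- merge_motifs(motifs, method = "PCC")

# Alignment plots under strict and relaxed overlap requirements.
# Assumption: a min.overlap equal to the motif width (10) rules out overhangs.
p.strict <- view_motifs(motifs, min.overlap = 10)
p.relaxed <- view_motifs(motifs, min.overlap = 5)
```

For larger motif sets, the score matrix can be passed to standard clustering or heatmap functions.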