Both cblaster and clinker can now be used without installation on the CAGECAT webserver.
Citation
If you found this tool useful, please cite:
Cameron L M Gilchrist, Thomas J Booth, Bram van Wersch, Liana van Grieken, Marnix H Medema, Yit-Heng Chooi, cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters, Bioinformatics Advances, 2021, vbab016, https://doi.org/10.1093/bioadv/vbab016
cblaster makes use of the following tools:
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, 7–17 (2014).
cblaster
Outline
cblaster is a tool for finding clusters of co-located homologous sequences in BLAST searches. Given a collection of protein sequences, cblaster can search sequence databases remotely (via the NCBI BLAST API) or locally (via DIAMOND). Search results are parsed and filtered based on user thresholds for identity, coverage and e-value. The genomic coordinates of the remaining hits are obtained from the NCBI’s Identical Protein Groups (IPG) database (or from a local database in local searches). Finally, cblaster scans the hits for instances of co-location and generates visualisations.
Installation
cblaster can be installed via pip, or by cloning the repository and installing it from source.
Additionally, we provide executables for Windows and Mac which can be downloaded from here.
Once installed, make sure you configure cblaster with your email address.
You can find example search files, along with generated output, in the examples folder of the repository.
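Because a local search fails when neither diamond nor diamond-aligner can be found on the system (see Dependencies), it can be worth checking for DIAMOND right after installation. The following is a minimal Python sketch using only the standard library's shutil.which; it is not part of cblaster, and the helper name find_diamond is hypothetical.

```python
import shutil

# Executable names taken from the Dependencies section: "diamond", or
# "diamond-aligner" when DIAMOND was installed via apt.
DIAMOND_NAMES = ("diamond", "diamond-aligner")


def find_diamond(names=DIAMOND_NAMES):
    """Return the path of the first DIAMOND executable found on $PATH, else None.

    Hypothetical helper for illustration; cblaster performs its own check.
    """
    for name in names:
        path = shutil.which(name)  # searches the directories listed in $PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_diamond()
    if found is None:
        print("DIAMOND not found on $PATH: local searches will fail.")
    else:
        print(f"Local searches can use DIAMOND at {found}")
```

Remote searches go through the NCBI BLAST API rather than DIAMOND, so a None result only matters if you plan to search locally.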
Dependencies
cblaster is tested on Python 3.6, and its only external Python dependency is the requests module (used for interaction with NCBI APIs). If you want to perform local searches, you should have diamond installed and available on your system $PATH. cblaster will throw an error if a local search is started but it cannot find diamond or diamond-aligner (its alias when installed via apt) on the system.
Usage
cblaster accepts FASTA files and collections of valid NCBI sequence identifiers (GIs, accession numbers) as input. A remote search can be performed with a single command. For example, the burnettramic acids gene cluster, bua, can be searched remotely against the NCBI’s nr database.
A query sequence absence/presence matrix can be generated using the --binary argument. cblaster can also generate fully interactive visualisations of the binary table; to view an example, click here.
For further usage examples and API documentation, please refer to the documentation.
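The Outline describes what happens after a search: hits are filtered on identity, coverage and e-value, their genomic coordinates are looked up, and cblaster then scans for co-located hits, which the --binary option summarises as a query absence/presence matrix. The Python sketch below mimics those steps on toy data. It is not cblaster's implementation: the Hit record, the functions filter_hits, find_clusters and binary_table, their default values, and the rule that a cluster is a run of hits on one scaffold separated by at most gap base pairs and containing at least min_unique distinct queries are all assumptions made for illustration. It also assumes coordinates are already attached to each hit, a job cblaster delegates to the IPG database or a local database.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """A search hit with genomic coordinates attached (hypothetical layout)."""
    query: str       # query protein that produced the hit
    subject: str     # matched protein
    identity: float  # percent identity
    coverage: float  # percent query coverage
    evalue: float
    scaffold: str    # contig/scaffold carrying the subject gene
    start: int       # gene start (bp)
    end: int         # gene end (bp)


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep hits passing all three thresholds (illustrative default values)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def find_clusters(hits, gap=20000, min_unique=2):
    """Group hits into runs on one scaffold where each hit starts within `gap` bp
    of the furthest end seen so far; keep runs with >= `min_unique` distinct queries."""
    # groupby only merges adjacent items, so sort by scaffold, then position.
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.start))
    clusters = []
    for _, group in groupby(ordered, key=lambda h: h.scaffold):
        current, current_end = [], 0
        for hit in group:
            if current and hit.start - current_end > gap:
                clusters.append(current)  # gap too large: close the current run
                current = []
            current_end = max(current_end, hit.end) if current else hit.end
            current.append(hit)
        if current:
            clusters.append(current)
    return [c for c in clusters if len({h.query for h in c}) >= min_unique]


def binary_table(clusters, queries):
    """Absence/presence matrix: one row per cluster (an assumption), one column per query."""
    rows = []
    for cluster in clusters:
        present = {h.query for h in cluster}
        rows.append([int(q in present) for q in queries])
    return rows


if __name__ == "__main__":
    # Toy hits; every name and number here is made up.
    hits = [
        Hit("Q1", "subj1", 85.0, 98.0, 1e-50, "scaf1", 1000, 4000),
        Hit("Q2", "subj2", 60.0, 90.0, 1e-20, "scaf1", 5000, 6000),
        Hit("Q3", "subj3", 20.0, 90.0, 1e-3, "scaf1", 7000, 8000),     # fails identity
        Hit("Q1", "subj4", 70.0, 95.0, 1e-30, "scaf2", 100, 3000),
        Hit("Q2", "subj5", 65.0, 80.0, 1e-25, "scaf2", 90000, 91000),  # far from subj4
    ]
    clusters = find_clusters(filter_hits(hits))
    print(binary_table(clusters, ["Q1", "Q2", "Q3"]))  # [[1, 1, 0]]
```

Here the scaf2 hits are split because subj5 starts 87,000 bp after subj4 ends, leaving two single-query runs that fail min_unique. cblaster's actual clustering criteria, defaults and binary table layout may differ; the documentation is the authority on those.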
Useful downstream applications
Here are some useful tools and scripts which use cblaster output for various tasks: