blue-crab is a tool for converting ONT’s POD5 format to the community-maintained SLOW5/BLOW5 format. Maybe one day ONT will see the light and realise that column-based file formats are a bad idea for row-based reading. Till then, Crab go snap snap!
Happy converting!
WARNING
While we test as much as we can and do our very best to ensure 100% data parity, we have no control over what ONT will do to pod5.
Given their history of ad-hoc changes, there are bound to be cases in the future where a change breaks the conversion.
You may use commands like slow5tools quickcheck and index to verify the integrity of the created S/BLOW5 files.
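As a rough sketch of such a check, the helper below runs slow5tools quickcheck over every BLOW5 file in a directory and reports any failures. The function name check_blow5_dir and the output directory are illustrative assumptions, not part of blue-crab or slow5tools; only the slow5tools quickcheck command comes from the text above.

```shell
# check_blow5_dir is a hypothetical helper, not shipped with blue-crab.
# It runs `slow5tools quickcheck` on every .blow5 file in a directory.
check_blow5_dir() {
    blow5_dir="$1"
    if ! command -v slow5tools >/dev/null 2>&1; then
        echo "slow5tools not found on PATH" >&2
        return 127
    fi
    failed=0
    for blow5_file in "$blow5_dir"/*.blow5; do
        # Skip the unexpanded glob when the directory has no .blow5 files
        [ -e "$blow5_file" ] || continue
        if slow5tools quickcheck "$blow5_file"; then
            echo "OK: $blow5_file"
        else
            echo "FAILED: $blow5_file" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (assumes the converted files live in ./blow5_out):
# check_blow5_dir ./blow5_out
```

The function returns non-zero if any file fails the check, so it can gate later steps in a conversion script.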
Quickstart
The easiest way to set up blue-crab is to use the precompiled binaries. Choose the commands for your OS.
VERSION=v0.4.0
# For x86_64-linux
wget "https://github.com/Psy-Fer/blue-crab/releases/download/$VERSION/blue-crab-$VERSION-x86_64-linux-binaries.tar.gz" && tar xvf blue-crab-$VERSION-x86_64-linux-binaries.tar.gz && cd blue-crab-$VERSION/
# For x86_64-macos
curl -L "https://github.com/Psy-Fer/blue-crab/releases/download/$VERSION/blue-crab-$VERSION-x86_64-macos-binaries.tar.gz" -o blue-crab-$VERSION-x86_64-macos-binaries.tar.gz && tar xvf blue-crab-$VERSION-x86_64-macos-binaries.tar.gz && cd blue-crab-$VERSION/
# For arm64-macos
curl -L "https://github.com/Psy-Fer/blue-crab/releases/download/$VERSION/blue-crab-$VERSION-arm64-macos-binaries.tar.gz" -o blue-crab-$VERSION-arm64-macos-binaries.tar.gz && tar xvf blue-crab-$VERSION-arm64-macos-binaries.tar.gz && cd blue-crab-$VERSION/
./blue-crab --help
Run export PATH=[path_to_blue-crab-$VERSION]:$PATH to execute blue-crab from any location. Alternatively, use a Python environment or conda.
Setup
blue-crab requires Python 3.8 or higher (a limitation of ONT’s pod5 library). Using a virtual environment is recommended.
Install the zlib development libraries (and optionally the zstd development libraries).
The commands to install the zlib development libraries on some popular distributions:
On Debian/Ubuntu : sudo apt-get install zlib1g-dev
On Fedora/CentOS : sudo dnf install zlib-devel # or: sudo yum install zlib-devel
On OS X : brew install zlib
SLOW5 files compressed with zstd offer smaller file sizes and better performance than the default zlib. However, unlike zstd, the zlib runtime library is available by default on almost all distributions, so files compressed with zlib are more ‘portable’. Enabling the optional zstd support requires zstd 1.3 or higher development libraries installed on your system:
On Debian/Ubuntu : sudo apt-get install libzstd1-dev # libzstd-dev on newer distributions if libzstd1-dev is unavailable
On Fedora/CentOS : sudo yum install libzstd-devel
On OS X : brew install zstd
Next, pick one of the two options below: install from pip or install from source.
Create a virtual environment using Python 3.8+ and install blue-crab from pip
python3 -m venv ./blue-crab-venv
source ./blue-crab-venv/bin/activate
python3 -m pip install --upgrade pip
# only if you want zstd support and have installed zstd development libraries for zstd build
export PYSLOW5_ZSTD=1
pip install blue-crab
blue-crab --help
Create a virtual environment using Python 3.8+ and install blue-crab from source
# clone the repo
git clone https://github.com/Psy-Fer/blue-crab && cd blue-crab
# create venv
python3 -m venv ./blue-crab-venv
source ./blue-crab-venv/bin/activate
python3 -m pip install --upgrade pip
# only if you want zstd support and have installed zstd development libraries for zstd build
export PYSLOW5_ZSTD=1
# install blue-crab
python3 -m pip install .
blue-crab --help
You can check your Python version by invoking python3 --version. If your native python3 meets the >=3.8 requirement, you can use that; otherwise, use a specific version installed with deadsnakes as shown below. If you install with deadsnakes, you will need to call that specific Python, such as python3.8 or python3.9, in all the setup commands until you create a virtual environment with venv. Once the environment is activated, you can just use python3. To install a specific version of Python, the deadsnakes ppa is a good place to start:
# This is an example for installing python3.8
# you can then call that specific python version
# > python3.8 -m pip --version
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt install python3.8 python3.8-dev python3.8-venv
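The version check described above can be scripted. The sketch below compares the major and minor parts of a version string against the 3.8 minimum; version_at_least is an illustrative helper name, not something provided by blue-crab.

```shell
# version_at_least HAVE WANT: succeeds when HAVE >= WANT, comparing
# only the major and minor parts (e.g. 3.11.3 against 3.8).
# This is an illustrative helper, not part of blue-crab.
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
    else
        [ "$have_minor" -ge "$want_minor" ]
    fi
}

# Report on the native python3, if there is one
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
    if version_at_least "$py_version" 3.8; then
        echo "python3 $py_version is new enough for blue-crab"
    else
        echo "python3 $py_version is too old; install 3.8 or higher"
    fi
fi
```

The comparison is numeric rather than textual, so 3.11 is correctly treated as newer than 3.8.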
Optional: wrapper script and adding to PATH
Suppose the virtual environment you created is named blue-crab-venv and resides directly in the root of the cloned blue-crab git repository. In that case, you can use the wrapper script available under /path/to/repository/scripts/blue-crab to execute blue-crab conveniently. This script automatically sources the virtual environment, executes blue-crab with the parameters you specified and finally deactivates the virtual environment. If you add /path/to/repository/scripts/ to your PATH environment variable, you can simply use blue-crab from anywhere.
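To illustrate the activate, run, deactivate pattern that the wrapper follows, here is a minimal sketch. It is not the script shipped in the repository; run_in_venv is a hypothetical name and the example path is a placeholder.

```shell
# run_in_venv VENV_DIR COMMAND [ARGS...]
# Hypothetical stand-in for the repository's wrapper script: activate the
# virtual environment, run the command, then deactivate again.
run_in_venv() {
    venv_dir="$1"
    shift
    if [ ! -f "$venv_dir/bin/activate" ]; then
        echo "no virtual environment found at $venv_dir" >&2
        return 1
    fi
    (
        # The subshell keeps the activation from leaking into the caller
        . "$venv_dir/bin/activate"
        status=0
        "$@" || status=$?
        deactivate
        exit "$status"
    )
}

# Example (assumes the venv sits in the root of the cloned repository):
# run_in_venv /path/to/repository/blue-crab-venv blue-crab --help
```

The exit status of the wrapped command is passed back to the caller, so the helper can be used inside scripts that check for errors.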
Optional: real-time POD5 to BLOW5 conversion
A script for performing real-time POD5 to BLOW5 conversion during sequencing is provided here along with instructions.
Usage
Please visit the manual page for all the commands and options. Some examples are given below:
Note that the default compression is zlib, to maximise compatibility. SLOW5 files compressed with zstd offer smaller file sizes and better performance than the default zlib. If you installed blue-crab with zstd support, you can create zstd-compressed BLOW5 files as:
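The example commands themselves are not reproduced in this text, so the following is only a sketch of what a conversion call might look like. It assumes a p2s subcommand with -o for the output file and -c for the compression method; treat these as assumptions and confirm them against the manual page or blue-crab p2s --help. The pod5_to_blow5 helper and the DRY_RUN variable are hypothetical, and the file names are placeholders.

```shell
# pod5_to_blow5 INPUT.pod5 OUTPUT.blow5 [COMPRESSION]
# Hypothetical helper. The p2s subcommand and the -o / -c options are
# assumptions; confirm them with `blue-crab p2s --help`.
pod5_to_blow5() {
    input_pod5="$1"
    output_blow5="$2"
    compression="${3:-zlib}"   # zlib is the default described above
    set -- blue-crab p2s "$input_pod5" -o "$output_blow5" -c "$compression"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it
        echo "$*"
    else
        "$@"
    fi
}

# Show the commands without running them
DRY_RUN=1
pod5_to_blow5 example.pod5 example.blow5
pod5_to_blow5 example.pod5 example_zstd.blow5 zstd
```

Unset DRY_RUN to run the conversions for real once blue-crab is on your PATH.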
Example comparison
The following table compares an original 5 kHz pod5 file from the public zymo dataset (link below), containing 10k reads. The pod5 file uses its default VBZ compression, which is a mix of zstd and svb-zd for the signal.
The blow5 files are conversions made using blue-crab and timed with /usr/bin/time -v <cmd>. The conversions were carried out on an XPS 15 laptop with a modern SSD, using Python 3.11.3. All blow5 files have signal compression set to svb-zd.
The table shows pod5-vbz is slightly smaller than both blow5-zstd and blow5-zlib. We prefer to default to blow5-zlib as it is more portable, since zlib comes with most systems (as discussed above). However, if you want the best compression and faster conversion times, blow5-zstd is the clear winner for blow5.
| method     | size (MB) | time (s) |
|------------|-----------|----------|
| pod5-vbz   | 679       | -        |
| blow5-zstd | 681       | 3.91     |
| blow5-zlib | 689       | 7.86     |
| -          | -         | -        |
| blow5-xxx  | 666       | -        |
I have included an example, blow5-xxx, to show that files can be made even smaller than pod5; this work is under active development. However, those compression techniques are not currently available in blue-crab.
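To put the figures from the comparison in relative terms, the short sketch below computes the size difference of each BLOW5 file against the pod5 file and the ratio of the two conversion times. The helper names pct_diff and time_ratio are illustrative; the numbers are the ones reported above.

```shell
# Percentage difference of SIZE relative to BASE, to two decimal places
pct_diff() {
    awk -v size="$1" -v base="$2" 'BEGIN { printf "%.2f\n", (size - base) / base * 100 }'
}

# Ratio of two timings, to two decimal places
time_ratio() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

echo "blow5-zstd vs pod5-vbz: $(pct_diff 681 679)%"
echo "blow5-zlib vs pod5-vbz: $(pct_diff 689 679)%"
echo "blow5-xxx  vs pod5-vbz: $(pct_diff 666 679)%"
echo "zlib / zstd conversion time: $(time_ratio 7.86 3.91)x"
```

On these numbers, both available BLOW5 variants are within about 1.5% of the pod5 file size, and the zstd conversion takes roughly half the time of the zlib one.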
Notes
POD5 has had a number of backward compatibility-breaking changes so far. This version of blue-crab is only tested on the most recent pod5 files. blue-crab simply relies on ONT’s POD5 API for reading and writing POD5 files, thus leaving to ONT the burden of maintaining a library that can handle all the variants of POD5 and of cleaning up the mess they create. We will not invest time in handling all these idiosyncrasies in POD5, unlike what we did for the hundreds of different FAST5 formats when developing slow5tools. If your POD5 files are v0.1.5 or lower, you may check this old readme out.
While this repository is under the MIT license, data files under test/data of this repository are under the CC0 public waiver.
Gamaarachchi, H., Samarakoon, H., Jenner, S.P. et al. Fast nanopore sequencing data analysis with SLOW5. Nat Biotechnol 40, 1026-1029 (2022). https://doi.org/10.1038/s41587-021-01147-4
@article{gamaarachchi2022fast,
title={Fast nanopore sequencing data analysis with SLOW5},
author={Gamaarachchi, Hasindu and Samarakoon, Hiruna and Jenner, Sasha P and Ferguson, James M and Amos, Timothy G and Hammond, Jillian M and Saadat, Hassaan and Smith, Martin A and Parameswaran, Sri and Deveson, Ira W},
journal={Nature Biotechnology},
volume={40},
pages={1026--1029},
year={2022},
publisher={Nature Publishing Group}
}
blue-crab
SLOW5 specification: https://hasindu2008.github.io/slow5specs
slow5tools: https://github.com/hasindu2008/slow5tools
pyslow5: https://hasindu2008.github.io/slow5lib/pyslow5_api/pyslow5.html
Acknowledgement
George Bouras for providing some example bacterial pod5 files. Rasmus Kirkegaard for this public zymo pod5 dataset. George from ONT for help in understanding pod5 stuff.
Citation