DSRC is a toolkit for efficient, high-performance compression of sequencing reads stored in FASTQ format. Its main features are:
Effective multithreaded compression of FASTQ files.
Full support for Illumina, ABI SOLiD, and 454/Ion Torrent dataset formats with non-standard (AGCTN) IUPAC base values.
Support for lossy compression of quality values using the Illumina binning scheme.
Support for lossy compression of IDs, keeping only the key fields selected by the user.
Pipe support for easy integration with existing pipelines.
Python and C++ libraries for integrating DSRC archives into your own applications.
Availability for Linux, Mac OSX and Windows 64-bit operating systems.
Open-source C++ code under the GNU GPL 2 license.
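To give a feel for what lossy quality compression with a binning scheme means, here is a minimal Python sketch of the 8-level quality binning published by Illumina. It is an illustration only: the names `ILLUMINA_BINS`, `bin_quality` and `bin_quality_string` are hypothetical, the default offset of 33 is an assumption, and DSRC's own implementation may differ in detail.

```python
# Illustrative sketch of Illumina's published 8-level quality binning.
# Not taken from the DSRC sources; names and the default offset are assumptions.

# (lowest score, highest score, representative score) for each bounded bin.
ILLUMINA_BINS = [
    (2, 9, 6),
    (10, 19, 15),
    (20, 24, 22),
    (25, 29, 27),
    (30, 34, 33),
    (35, 39, 37),
]


def bin_quality(q: int) -> int:
    """Map a single Phred quality score to its bin representative."""
    if q >= 40:
        return 40
    for low, high, representative in ILLUMINA_BINS:
        if low <= q <= high:
            return representative
    # Scores below 2 (the no-call range) are left unchanged in this sketch.
    return q


def bin_quality_string(qual: str, offset: int = 33) -> str:
    """Apply the binning to an ASCII-encoded FASTQ quality string."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)
```

Because many distinct scores collapse onto a few representatives, the binned quality stream has a much smaller alphabet, which is what makes it easier to compress.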
Building
Build prerequisites
Linux
The DSRC binaries and C++ library can be compiled in two ways, depending on the multithreading library used; a separate makefile is provided for each. The first uses the boost::threads library, which must be present on the build system. The second uses C++11 threading and requires a g++ compiler with C++11 support (version >= 4.8).
By default, binaries and libraries are compiled with g++, but compiling with Clang or Intel icpc should also succeed.
Mac OSX
On Mac OSX, the Clang compiler is used with C++11 support, so make sure Clang version >= 3.3 is installed.
Windows
To compile DSRC under Windows, Microsoft Visual Studio 2010 or 2012 is required. The DSRC binaries and C++ library can be compiled in two ways, depending on the multithreading library used; a separate Visual Studio solution file is provided for each. With VS2010, the boost::threads library provides multithreading support, so make sure boost::threads is installed and the boost library paths are properly configured in Visual Studio. With VS2012, the C++11 standard implementation provides threading support.
Compiling DSRC with MinGW-32-x64 using the provided makefiles should also work.
Python library
To build the DSRC Python library, the development version of the boost::python library and the boost::build tool bjam must be present on the system. Then, in the Jamroot configuration file in the py directory, specify the local boost installation directory:
# To compile DSRC Python module please specify your boost installation directory below
#
use-project boost
: /absolute/path/to/boost/directory/ ;
The Python library is built with the default compilation toolset available on the build platform (selected automatically by bjam). To specify a different one, append
<toolset>name
to the compilation flags, as explained in the Jamroot file:
# Specify toolset according to your platform manually in case of compilation problems in form: '<toolset>gcc'
# Available toolsets:
# - Windows: msvc-*
# - Linux: gcc, clang
# - Mac OSX: darwin, gcc
: <variant>release <address-model>64 <link>shared <runtime-link>shared <debug-symbols>off <inlining>full <optimization>speed <warnings>on <cxxflags>"-O2 -m64 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DUSE_BOOST_THREAD" ;
Building on Linux
Binary
To compile DSRC using boost::threads with static linking, in the main directory type:
make bin
To compile DSRC using g++ >= 4.8 with the C++11 standard and dynamic linking:
make -f Makefile.c++11 bin
The resulting dsrc binary will be placed in the bin subdirectory.
C++ library
To compile the DSRC C++ library using boost::threads:
make lib
To compile the DSRC C++ library using g++ >= 4.8 with C++11:
make -f Makefile.c++11 lib
The resulting libdsrc.a library will be placed in the lib subdirectory.
Python library
To compile the DSRC Python library:
make pylib
The resulting pydsrc.so library will be available in the py subdirectory.
Building on Mac OSX
Binary
To compile the DSRC binary, in the main directory type:
make -f Makefile.osx bin
The resulting dsrc binary will be placed in the bin subdirectory.
C++ library
To compile the DSRC C++ library:
make -f Makefile.osx lib
The resulting libdsrc.a library will be placed in the lib subdirectory.
Python library
To compile the DSRC Python library:
make -f Makefile.osx pylib
The resulting pydsrc.so library will be available in the py subdirectory.
Building on Windows 64-bit
Binary
To compile DSRC with Visual Studio 2010 and boost::threads for multithreading support, use the dsrc20-vs2k10.sln solution file. To compile DSRC with Visual Studio 2012 and C++11 threads, use dsrc20-vs2k12.sln instead.
To compile the DSRC executable, select the Release|x64 configuration and build.
The resulting dsrc.exe executable will be placed in the bin subdirectory.
C++ library
To compile the DSRC library, select the Release Lib|x64 configuration and build.
The resulting dsrc.lib library will be placed in the lib subdirectory.
Python library
To compile the DSRC Python library, in the py subdirectory type:
bjam
The resulting pydsrc.pyd library will be available in the py subdirectory.
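The make invocations in the Linux and Mac OSX sections follow one pattern: an optional -f makefile followed by a target. As a minimal sketch, a build script could pick the documented command with a small helper like the one below. The helper name `make_command` and the platform and threading labels are hypothetical and not part of DSRC; Windows is left out because it builds through Visual Studio solutions and bjam.

```python
from typing import List, Optional

# Makefile for each (platform, threading backend) pair, as listed in the
# build sections. None means the default Makefile in the main directory.
# The keys "linux", "osx", "boost" and "c++11" are labels chosen for this sketch.
MAKEFILES = {
    ("linux", "boost"): None,
    ("linux", "c++11"): "Makefile.c++11",
    ("osx", "c++11"): "Makefile.osx",
}

TARGETS = ("bin", "lib", "pylib")


def make_command(platform: str, target: str, threading: str = "c++11") -> List[str]:
    """Return the make command (as an argument list) for a documented build."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    if platform == "linux" and target == "pylib":
        # Only `make pylib` is documented for the Linux Python library.
        makefile: Optional[str] = None
    else:
        try:
            makefile = MAKEFILES[(platform, threading)]
        except KeyError:
            raise ValueError(
                f"no documented build for {platform!r} with {threading!r}"
            ) from None
    command = ["make"]
    if makefile is not None:
        command += ["-f", makefile]
    command.append(target)
    return command
```

The returned list can be passed to `subprocess.run(..., check=True)` from the main DSRC directory.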
Usage
DSRC is run from the command prompt in one of two modes:
c: compression
d: decompression
Available options
Compression options
-d<n>: DNA compression mode, 0–3 (default: 0)
-q<n>: quality compression mode, 0–2 (default: 0)
-f<1,...>: keep only the listed field numbers from the ID string (default: keep all)
-b<n>: FASTQ input buffer size in MB (default: 8)
-m<n>: automated compression mode, 0–2 (one of three preset combinations of the other parameters)
-o<n>: quality offset, 0 for automatic selection (default: 0)
-l: use lossy quality mode with the Illumina binning scheme (default: false)
-c: calculate and check a CRC32 checksum per block, which makes compression about twice as slow (default: false)
Automated compression modes
-m0: fast mode, equivalent to -d0 -q0 -b8
-m1: medium mode, equivalent to -d2 -q2 -b64
-m2: best mode, equivalent to -d3 -q2 -b256
Options for both compression and decompression
-t<n>: number of processing threads (default: maximum available hardware threads)
-s: use stdin/stdout for reading/writing raw FASTQ data (stderr is used for info/warning messages)
Usage examples
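As a rough sketch of how the options above combine, the Python helper below assembles dsrc command lines without running them. It assumes the invocation form `dsrc <c|d> [options] <input> <output>`, which is an assumption inferred from the two modes and the file-based examples rather than a stated syntax. The names `PRESETS`, `expand_preset` and `dsrc_command` are hypothetical. Only the preset expansions and flags come from the option lists above. The stdin/stdout variants using -s are not covered, since their argument form is not given here.

```python
import shlex
from typing import List, Sequence

# Preset expansions exactly as listed under "Automated compression modes".
PRESETS = {
    0: ["-d0", "-q0", "-b8"],     # fast
    1: ["-d2", "-q2", "-b64"],    # medium
    2: ["-d3", "-q2", "-b256"],   # best
}


def expand_preset(mode: int) -> List[str]:
    """Return the explicit options that the -m<mode> preset stands for."""
    return list(PRESETS[mode])


def dsrc_command(mode: str, input_path: str, output_path: str,
                 options: Sequence[str] = (), binary: str = "dsrc") -> List[str]:
    """Build an argument list, assuming: dsrc <c|d> [options] <input> <output>."""
    if mode not in ("c", "d"):
        raise ValueError("mode must be 'c' (compression) or 'd' (decompression)")
    return [binary, mode, *options, input_path, output_path]


if __name__ == "__main__":
    examples = [
        dsrc_command("c", "SRR001471.fastq", "SRR001471.dsrc"),
        # Fast mode, CRC32 checking, 4 threads.
        dsrc_command("c", "SRR001471.fastq", "SRR001471.dsrc", ["-m0", "-c", "-t4"]),
        # DNA and quality level 2 with a 512 MB buffer.
        dsrc_command("c", "SRR001471.fastq", "SRR001471.dsrc", ["-d2", "-q2", "-b512"]),
        # Best mode, lossy quality, keep only ID fields 1-4.
        dsrc_command("c", "SRR001471.fastq", "SRR001471.dsrc", ["-m2", "-l", "-f1,2,3,4"]),
        dsrc_command("d", "SRR001471.dsrc", "SRR001471.out.fastq"),
    ]
    for argv in examples:
        print(shlex.join(argv))
```

Building an argument list rather than a single string keeps the result usable with `subprocess.run(argv, check=True)` without shell quoting concerns.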
Compress the SRR001471.fastq file, saving the DSRC archive to SRR001471.dsrc.
Compress a file in the fast mode with CRC32 checking, using 4 threads.
Compress a file using DNA and quality compression level 2 with a 512 MB buffer.
Compress a file in the best mode with lossy quality mode, preserving only fields 1–4 from the record IDs.
Compress in the best mode, reading raw FASTQ data from stdin.
Decompress the SRR001471.dsrc archive, saving the output FASTQ file to SRR001471.out.fastq.
Decompress an archive using 4 threads, streaming raw FASTQ data to stdout.
Citing
Roguski, L., Deorowicz, S. (2014) DSRC 2: Industry-oriented compression of FASTQ files, Bioinformatics, 30(15):2213–2215.
Deorowicz, S., Grabowski, Sz. (2011) Compression of DNA sequences in FASTQ format, Bioinformatics, 27(6):860–862.