The Succinct Data Structure Library (SDSL) is a powerful and flexible C++11
library implementing succinct data structures. In total, the library contains
the highlights of 40 research publications. Succinct data structures
can represent an object (such as a bitvector or a tree) in space close to the
information-theoretic lower bound of the object while supporting operations
of the original object efficiently. The theoretical time complexity of an
operation performed on the classical data structure and the equivalent
succinct data structure are (most of the time) identical.
Why SDSL?
Succinct data structures have very attractive theoretical properties. However,
in practice implementing succinct data structures is non-trivial as they are
often composed of complex operations on bitvectors. The SDSL Library provides
high quality, open source implementations of many succinct data structures
proposed in literature.
Specifically, the aim of the library is to provide basic and complex succinct
data structure which are
Easy and intuitive to use (like the STL, which provides classical data structures),
Faithful to the original theoretical results,
Capable of handling large inputs (yes, we support 64-bit),
Provide efficient construction of all implemented succinct data structures,
while at the same time enable good run-time performance.
In addition we provide additional functionality which can help you use succinct
data structure to their full potential.
Each data structure can easily be serialized and loaded to/from disk.
We provide functionality which helps you analyze the storage requirements of any
SDSL based data structure (see right)
We support features such as hugepages and tracking the memory usage of each
SDSL data structure.
Complex structures can be configured by template parameters and therefore
easily be composed. There exists one simple method which constructs
all complex structures.
We maintain an extensive collection of examples which help you use the different
features provided by the library.
All data structures are tested for correctness using a unit-testing framework.
A 64-bit operating system. Either Mac OS X or Linux are currently supported.
For increased performance the processor of the system should support fast bit operations available in SSE4.2
Installation
To download and install the library use the following commands.
git clone https://github.com/simongog/sdsl-lite.git
cd sdsl-lite
./install.sh
This installs the sdsl library into the include and lib directories in your
home directory. A different location prefix can be specified as a parameter of
the install.sh script:
./install.sh /usr/local/
To build a portable sdsl library without using SSE4.2 and AVX2 instructions, set BUILD_PORTABLE at build time, e.g. BUILD_PORTABLE=1 ./install.sh or mkdir build && BUILD_PORTABLE=1 cmake ...
These instructions are enabled by default if the processor of the build system supports them.
To remove the library from your system use the provided uninstall script:
To get you started with the library you can start by compiling the following
sample program which constructs a compressed suffix array (a FM-Index) over the
text mississippi!, counts the number of occurrences of pattern si and
stores the data structure, and a space usage visualization to the
files fm_index-file.sdsl and fm_index-file.sdsl.html:
Next we suggest you look at the comprehensive tutorial which describes
all major features of the library or look at some of the provided examples.
Test
Implementing succinct data structures can be tricky. To ensure that all data
structures behave as expected, we created a large collection of unit tests
which can be used to check the correctness of the library on your computer.
The test directory contains test code. We use googletest
framework and make to run the tests. See the README file in the
directory for details.
To simply run all unit tests after installing the library type
cd sdsl-lite/build
make test-sdsl
Note: Running the tests requires several sample files to be downloaded from the web
and can take up to 2 hours on slow machines.
Benchmarks
To ensure the library runs efficiently on your system we suggest you run our
benchmark suite. The benchmark suite recreates a
popular experimental study which you can
directly compare to the results of your benchmark run.
Bug Reporting
While we use an extensive set of unit tests and test coverage tools you might
still find bugs in the library. We encourage you to report any problems with
the library via the github issue tracking system
of the project.
If you are running experiments in an academic settings we suggest you use the
most recent released version
of the library. This allows others to reproduce your experiments exactly.
Licensing
The SDSL library is free software provided under the GNU General Public License
(GPLv3). For more information see the COPYING file in the library
directory.
We distribute this library freely to foster the use and development of advanced
data structure. If you use the library in an academic setting please cite the
following paper:
@inproceedings{gbmp2014sea,
title = {From Theory to Practice: Plug and Play with Succinct Data Structures},
author = {Gog, Simon and Beller, Timo and Moffat, Alistair and Petri, Matthias},
booktitle = {13th International Symposium on Experimental Algorithms, (SEA 2014)},
year = {2014},
pages = {326-337},
ee = {http://dx.doi.org/10.1007/978-3-319-07959-2_28}
}
This project is also supported by code contributions
from other researchers. E.g. Juha Kärkkäinen,
Dominik Kempa,
and Simon Puglisi contributed a compressed bitvector
implementation (hyb_vector).
This project further profited from excellent input of our students
Markus Brenner, Alexander Diehm, Christian Ocker, and Maike Zwerger. Stefan
Arnold helped us with tricky template questions. We are also grateful to
Diego Caro,
Travis Gagie,
Kalle Karhu,
Bruce Kuo,
Jan Kurrus,
Shanika Kuruppu,
Jouni Siren,
and Julio Vizcaino
for bug reports.
Contribute
Are you working on a new or improved implementation of a succinct data structure?
We encourage you to contribute your implementation to the SDSL library to make
your work accessible to the community within the existing library framework.
Feel free to contact any of the authors or create an issue on the
issue tracking system.
关于
该链接实际对应 sdsl-lite,一个实现 succinct data structures 的通用 C++11 算法库。
SDSL - Succinct Data Structure Library
What is it?
The Succinct Data Structure Library (SDSL) is a powerful and flexible C++11 library implementing succinct data structures. In total, the library contains the highlights of 40 research publications. Succinct data structures can represent an object (such as a bitvector or a tree) in space close to the information-theoretic lower bound of the object while supporting operations of the original object efficiently. The theoretical time complexity of an operation performed on the classical data structure and the equivalent succinct data structure are (most of the time) identical.
Why SDSL?
Succinct data structures have very attractive theoretical properties. However, in practice implementing succinct data structures is non-trivial as they are often composed of complex operations on bitvectors. The SDSL Library provides high quality, open source implementations of many succinct data structures proposed in literature.
Specifically, the aim of the library is to provide basic and complex succinct data structure which are
In addition we provide additional functionality which can help you use succinct data structure to their full potential.
The library contains many succinct data structures from the following categories:
For a complete overview including theoretical bounds see the cheat sheet or the wiki.
Documentation
We provide an extensive set of documentation describing all data structures and features provided by the library. Specifically we provide
Requirements
The SDSL library requires:
g++version 4.9 or higher orclangversion 3.2 or higher.SSE4.2Installation
To download and install the library use the following commands.
This installs the sdsl library into the
includeandlibdirectories in your home directory. A different location prefix can be specified as a parameter of theinstall.shscript:To build a portable sdsl library without using
SSE4.2andAVX2instructions, setBUILD_PORTABLEat build time, e.g.BUILD_PORTABLE=1 ./install.shormkdir build && BUILD_PORTABLE=1 cmake ... These instructions are enabled by default if the processor of the build system supports them.To remove the library from your system use the provided uninstall script:
There is also a Gentoo Ebuild for SDSL by Mathias Weller.
Getting Started
To get you started with the library you can start by compiling the following sample program which constructs a compressed suffix array (a FM-Index) over the text
mississippi!, counts the number of occurrences of patternsiand stores the data structure, and a space usage visualization to the filesfm_index-file.sdslandfm_index-file.sdsl.html:To compile the program using
g++run:Next we suggest you look at the comprehensive tutorial which describes all major features of the library or look at some of the provided examples.
Test
Implementing succinct data structures can be tricky. To ensure that all data structures behave as expected, we created a large collection of unit tests which can be used to check the correctness of the library on your computer. The test directory contains test code. We use googletest framework and make to run the tests. See the README file in the directory for details.
To simply run all unit tests after installing the library type
Note: Running the tests requires several sample files to be downloaded from the web and can take up to 2 hours on slow machines.
Benchmarks
To ensure the library runs efficiently on your system we suggest you run our benchmark suite. The benchmark suite recreates a popular experimental study which you can directly compare to the results of your benchmark run.
Bug Reporting
While we use an extensive set of unit tests and test coverage tools you might still find bugs in the library. We encourage you to report any problems with the library via the github issue tracking system of the project.
The Latest Version
The latest version can be found on the SDSL github project page https://github.com/simongog/sdsl-lite .
If you are running experiments in an academic settings we suggest you use the most recent released version of the library. This allows others to reproduce your experiments exactly.
Licensing
The SDSL library is free software provided under the GNU General Public License (GPLv3). For more information see the COPYING file in the library directory.
We distribute this library freely to foster the use and development of advanced data structure. If you use the library in an academic setting please cite the following paper:
A preliminary version is available here on arxiv.
External Resources used in SDSL
We have included the code of two excellent suffix array construction algorithms.
Additionally, we use the googletest framework to provide unit tests. Our visualizations are implemented using the d3js-library.
Authors
The main contributors to the library are:
This project is also supported by code contributions from other researchers. E.g. Juha Kärkkäinen, Dominik Kempa, and Simon Puglisi contributed a compressed bitvector implementation (hyb_vector). This project further profited from excellent input of our students Markus Brenner, Alexander Diehm, Christian Ocker, and Maike Zwerger. Stefan Arnold helped us with tricky template questions. We are also grateful to Diego Caro, Travis Gagie, Kalle Karhu, Bruce Kuo, Jan Kurrus, Shanika Kuruppu, Jouni Siren, and Julio Vizcaino for bug reports.
Contribute
Are you working on a new or improved implementation of a succinct data structure? We encourage you to contribute your implementation to the SDSL library to make your work accessible to the community within the existing library framework. Feel free to contact any of the authors or create an issue on the issue tracking system.