Add dominant-symbol sparse_num decode support (#800)
Summary: Pull Request resolved: https://github.com/facebook/openzl/pull/800
Add support for dominant-symbol sparse_num encoding and decoding while preserving the existing sparse_num encoder node as the zero-dominant entry point.

The wire format is unchanged for the common case: an empty codec header means the dominant symbol is zero. A non-empty codec header directly stores the dominant symbol as the shortest little-endian byte sequence that represents it. Missing high bytes are interpreted as zero, and the header size is limited to the value stream width. The values stream always contains literal values only; it is never prefixed by the dominant symbol.

The public encoder surface now has two nodes:
- ZL_NODE_SPARSE_NUM / !zl.sparse_num / nodes::SparseNum keeps the previous sparse-zero meaning and forces dominant symbol 0, so it skips auto-detection.
- ZL_NODE_SPARSE_NUM_AUTO / !zl.sparse_num_auto / nodes::SparseNumAuto auto-detects a dominant symbol from an input prefix unless one is explicitly provided via ZL_SPARSE_NUM_DOMINANT_VALUE_PID (132).

Both nodes emit the same ZL_StandardTransformID_sparse_num transform and share the same decoder and wire format.

The decoder validates the compact header, passes a uint64_t dominant value into the single decode kernel entry point, and preserves the zero-dominant D8V* fast path internally. UnitBench direct-kernel scenarios were updated for the unified decode signature.

Zero-dominant encode performance was rechecked after keeping the common path specialized. Latest full unitBench run: buck run fbcode//openzl/dev/benchmark/unitBench/scripts:sparse_num_bench -- --sample-mib 2 --duration-s 3.
| width | zeros | elements | raw | encoded | ratio | encode MiB/s | decode MiB/s |
|---|---|---|---|---|---|---|---|
| u8 | 75% | 2097100 | 2.00 MiB | 1023.97 KiB | 2.00 | 1118.3 | 1091.9 |
| u8 | 90% | 2097100 | 2.00 MiB | 409.59 KiB | 5.00 | 1247.2 | 2499.6 |
| u8 | 98% | 2097100 | 2.00 MiB | 81.92 KiB | 25.00 | 1320.2 | 12429.8 |
| u16 | 75% | 1048500 | 2.00 MiB | 767.94 KiB | 2.67 | 2222.1 | 2008.1 |
| u16 | 90% | 1048500 | 2.00 MiB | 307.18 KiB | 6.67 | 2397.0 | 5070.6 |
| u16 | 98% | 1048500 | 2.00 MiB | 61.44 KiB | 33.33 | 2564.9 | 23583.2 |
| u32 | 75% | 524200 | 2.00 MiB | 639.89 KiB | 3.20 | 4362.3 | 4284.7 |
| u32 | 90% | 524200 | 2.00 MiB | 255.96 KiB | 8.00 | 4970.6 | 9754.5 |
| u32 | 98% | 524200 | 2.00 MiB | 51.19 KiB | 40.00 | 5304.1 | 35965.2 |
| u64 | 75% | 262100 | 2.00 MiB | 575.90 KiB | 3.56 | 8462.4 | 7768.7 |
| u64 | 90% | 262100 | 2.00 MiB | 230.36 KiB | 8.89 | 9486.1 | 20851.6 |
| u64 | 98% | 262100 | 2.00 MiB | 46.07 KiB | 44.44 | 9840.9 | 48068.9 |

Reviewed By: terrelln
Differential Revision: D106528533
fbshipit-source-id: c27ecb2396e49e5a0cf89e1159d25839dbe7cef3
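The compact codec header and the prefix-based auto-detection described in the summary can be illustrated with a small C sketch. The function names (sparse_hdr_encode, sparse_hdr_decode, sparse_detect_dominant_u32) are hypothetical and are not part of the OpenZL API. The majority-vote heuristic is also an assumption: the commit does not say how SparseNumAuto chooses its candidate, and an explicitly set ZL_SPARSE_NUM_DOMINANT_VALUE_PID would bypass detection entirely. Decoding into a uint64_t mirrors the commit's note that the decoder hands a uint64_t dominant value to the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the OpenZL API): serialize the dominant symbol
 * as the shortest little-endian byte sequence that represents it.
 * A dominant symbol of 0 produces an empty header, matching the
 * zero-dominant common case. Returns the number of bytes written. */
static size_t sparse_hdr_encode(uint64_t dominant, uint8_t out[8])
{
    size_t n = 0;
    while (dominant != 0) {
        out[n++] = (uint8_t)(dominant & 0xFFu); /* low byte first */
        dominant >>= 8;
    }
    assert(n <= 8); /* a uint64_t never needs more than 8 bytes */
    return n;
}

/* Hypothetical helper: parse the header back into a uint64_t.
 * Missing high bytes are treated as zero. Headers longer than the element
 * width of the values stream (or widths above 8 bytes) are rejected with -1.
 * Whether the real decoder also rejects non-minimal headers (a trailing
 * 0x00 byte) is not stated; this sketch accepts them. */
static int sparse_hdr_decode(const uint8_t* hdr, size_t hdrSize,
                             size_t eltWidth, uint64_t* dominant)
{
    if (hdrSize > eltWidth || eltWidth > 8) {
        return -1;
    }
    uint64_t v = 0;
    for (size_t i = 0; i < hdrSize; ++i) {
        v |= (uint64_t)hdr[i] << (8 * i);
    }
    *dominant = v;
    return 0;
}

/* Hypothetical auto-detection for 32-bit elements: run a Boyer-Moore
 * majority vote over the first prefixLen elements, then confirm that the
 * candidate is a strict majority of that prefix. On success, writes the
 * candidate and returns 1; otherwise writes 0 (zero-dominant fallback)
 * and returns 0. The real node's heuristic may differ. */
static int sparse_detect_dominant_u32(const uint32_t* src, size_t n,
                                      size_t prefixLen, uint32_t* dominant)
{
    size_t m = n < prefixLen ? n : prefixLen;
    uint32_t cand = 0;
    size_t votes = 0;
    for (size_t i = 0; i < m; ++i) {
        if (votes == 0) {
            cand = src[i];
            votes = 1;
        } else if (src[i] == cand) {
            ++votes;
        } else {
            --votes;
        }
    }
    /* Second pass: the vote only yields a candidate, so count it. */
    size_t count = 0;
    for (size_t i = 0; i < m; ++i) {
        count += (src[i] == cand);
    }
    if (m > 0 && 2 * count > m) {
        *dominant = cand;
        return 1;
    }
    *dominant = 0;
    return 0;
}
```

Under this encoding, a dominant symbol of 0x0102 in a u16 stream yields the two header bytes 0x02 0x01. A dominant symbol of 0xFF needs only one byte, even in a u64 stream, and the zero-dominant case keeps the empty header.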
OpenZL
OpenZL delivers high compression ratios while preserving high speed, a level of performance that is out of reach for generic compressors. Check out the blog post and whitepaper for a breakdown of how it works.
OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format. Learn how it works →
OpenZL consists of a core library and tools to generate specialized compressors, all compatible with a single universal decompressor. It is designed for engineers who deal with large quantities of specialized datasets (AI workloads, for example) and require high speed for their processing pipelines.
See our docs for more information and our quickstart guide to get started with a guided tutorial.
Project Status
This project is under active development. The API, the compressed format, and the set of codecs and graphs included in OpenZL are all subject to (and will!) change as the project matures.
However, we intend to maintain some stability guarantees in the face of that evolution. In particular, payloads compressed with any release-tagged version of the library will remain decompressible by new releases of the library for at least the next several years. And new releases of the library will be able to generate frames compatible with at least the previous release.
(Commits on the dev branch offer no guarantees whatsoever. Use only release-tagged commits for any non-experimental deployments.)

Despite the big scary warnings above, we consider the core to have reached production-readiness, and OpenZL is used extensively in production at Meta.
Building OpenZL
Prerequisites
OpenZL requires a compiler that supports C11 and C++17. When building with cmake, cmake 3.20.2 or newer is required. There is ongoing work to relax these restrictions. As that happens, this section will be updated.

Build with make

The OpenZL library and essential tools can be built using make.

Build Options
The Makefile supports all standard build variables, such as CC, CFLAGS, CPPFLAGS, LDFLAGS, LDLIBS, etc.

It builds with multi-threading by default, auto-detecting the local number of cores; this can be overridden with the standard -j# flag (e.g., make -j8).

Build Types
Binary generation can be altered by explicitly requesting a build type, for example make BUILD_TYPE=DEV.

Build types are documented in make help, and their exact flags are detailed with make show-config. The usual ones are:

- BUILD_TYPE=DEV: debug build with asserts enabled and ASAN / UBSAN enabled
- BUILD_TYPE=OPT: optimized build with asserts disabled (default)

Build with cmake

OpenZL can be built using cmake. Details on setting CMake variables are below.
Build Modes
By default, we ship several predefined build modes, which can be set with the OPENZL_BUILD_MODE variable:

- none (default): CMake default build mode controlled by CMAKE_BUILD_TYPE
- dev: debug build with asserts enabled and ASAN / UBSAN enabled
- dev-nosan: debug build with asserts enabled
- opt: optimized build with asserts disabled
- opt-asan: optimized build with asserts disabled and ASAN / UBSAN enabled
- dbgo: optimized build with asserts enabled
- dbgo-asan: optimized build with asserts enabled and ASAN / UBSAN enabled

For ASAN / UBSAN, ensure that libasan and libubsan are installed on the machine.

Editor Integration
OpenZL ships with settings to configure VSCode to work with the CMake build system. To enable them, install two extensions:

- cmake-tools
- clangd (or any other C++ language server that works with compile_commands.json)

Important: For proper C++ language server support, you need to generate compile_commands.json. The preferred method is to use the CMake Tools extension command "CMake: Configure". If it doesn't work, or is too difficult to set up, you can use the manual setup:

When to regenerate:

- CMakeLists.txt

CMake Variables
- CMAKE_C_COMPILER: Set the C compiler for OpenZL & dependency builds
- CMAKE_CXX_COMPILER: Set the C++ compiler for OpenZL & dependency builds
- CMAKE_C_FLAGS: C flags for OpenZL & dependency builds
- CMAKE_CXX_FLAGS: C++ flags for OpenZL & dependency builds
- OPENZL_BUILD_TESTS=ON: pull in testing deps and build the unit/integration tests
- OPENZL_BUILD_BENCHMARKS=ON: pull in benchmarking deps and build the benchmark executable
- OPENZL_BUILD_MODE: Sets the build mode for OpenZL and dependencies
- OPENZL_SANITIZE_ADDRESS=ON: Enable ASAN & UBSAN for OpenZL (but not dependencies)
- OPENZL_COMMON_COMPILE_OPTIONS: Shared C/C++ compiler options for OpenZL only
- OPENZL_C_COMPILE_OPTIONS: C compiler options for OpenZL only
- OPENZL_CXX_COMPILE_OPTIONS: C++ compiler options for OpenZL only
- OPENZL_COMMON_COMPILE_DEFINITIONS: Shared C/C++ compiler definitions (-D) for OpenZL only
- OPENZL_C_COMPILE_DEFINITIONS: C compiler definitions (-D) for OpenZL only
- OPENZL_CXX_COMPILE_DEFINITIONS: C++ compiler definitions (-D) for OpenZL only
- OPENZL_COMMON_FLAGS: extra compiler flags used in all targets

Windows Build
OpenZL uses modern C11 features that may not be fully supported by MSVC. For Windows builds, we recommend using clang-cl for the best compatibility.
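A quick way to see whether a compiler is running in C11 mode at all is a tiny probe translation unit like the one below. It is a hypothetical sketch that does not ship with OpenZL. It only checks that the compiler advertises C11 and accepts two C11 features; passing it does not guarantee that OpenZL itself builds, since MSVC can report C11 and still hit C2099.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */

/* Hypothetical probe, not part of OpenZL: stop compilation if the
 * compiler does not advertise C11 via __STDC_VERSION__. Per Microsoft's
 * predefined-macro docs, MSVC defines this macro only under /std:c11 or
 * a newer /std option. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "C11 mode not detected; consider clang-cl or MinGW-w64"
#endif

/* Smoke-test two C11 features: static assertions and alignment control. */
static_assert(sizeof(unsigned long long) >= 8, "need 64-bit integers");

struct c11_probe {
    alignas(16) unsigned char bytes[16];
};

/* Returns the alignment the compiler actually applied (expected: 16). */
static int c11_probe_alignment(void)
{
    return (int)alignof(struct c11_probe);
}
```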
Quick Start (Windows)
Recommended: Use clang-cl for full C11 support.
Alternative: Use MinGW-w64 for GNU toolchain compatibility.
Limited Support: MSVC may produce C2099 errors due to limited C11 support.
Compiler Detection
Run our detection script to check available compilers and get recommendations:
For detailed Windows build instructions, troubleshooting, and installation guides, see build-scripts/cmake/WINDOWS_BUILD.md.
License
OpenZL is BSD licensed, as found in the LICENSE file.