auto-segment large numeric inputs (#601)
Summary: Pull Request resolved: https://github.com/facebook/openzl/pull/601
Add auto-segmentation to all numeric CLI profiles (le-u8/i8, le-u16/i16, le-u32/i32, le-u64/i64). Previously, these profiles compressed the entire input as a single monolithic block regardless of size.
The new serial-numeric segmenter:
- Accepts serial input (required since segmenters must be the top operation)
- Chunks by configurable byte size (default 16 MB), aligned to element width
- Forwards each chunk to the profile’s existing compression graph (interpretAsLE -> [zigzag] -> ACE(FieldLZ))
- Respects the --chunk-size-mb CLI flag
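The chunking rule in the list above can be illustrated with a minimal Python sketch. This is not the OpenZL segmenter itself. The helper name `chunk_sizes`, the reading of "16 MB" as 16 MiB, and the requirement that the input be a whole number of elements are all assumptions made for illustration. The sketch only shows how a byte budget can be rounded down to a multiple of the element width so that no element straddles two chunks.

```python
def chunk_sizes(total_bytes: int, elt_width: int,
                chunk_bytes: int = 16 * 1024 * 1024) -> list[int]:
    """Split total_bytes into chunk sizes that are multiples of elt_width.

    Hypothetical helper for illustration only; not part of OpenZL.
    """
    if elt_width <= 0 or chunk_bytes < elt_width:
        raise ValueError("chunk size must hold at least one element")
    if total_bytes % elt_width != 0:
        # Assumption: the input is a whole number of elements.
        raise ValueError("input is not a whole number of elements")

    # Round the byte budget down to a whole number of elements.
    aligned = (chunk_bytes // elt_width) * elt_width

    sizes = []
    remaining = total_bytes
    while remaining > 0:
        size = min(aligned, remaining)  # the last chunk may be shorter
        sizes.append(size)
        remaining -= size
    return sizes
```

For example, `chunk_sizes(104, 8, 30)` rounds the 30-byte budget down to 24 bytes (three 8-byte elements) and yields four 24-byte chunks followed by one 8-byte chunk.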
Additionally includes two ancillary fixes:
Fix zs_fuzzers mapped_srcs broken target reference (defs.bzl)

The `generic_lionhead_harness` in `zs_fuzzers` references the binary target produced by `cpp_lionhead_harness` via `mapped_srcs`. The binary’s name suffix depends on which fuzzer backend `cpp_lionhead_harness` selects internally. The selection is made by `get_bundle_build_rule()` in `configs.bzl`, which, when `lionhead.fuzzer` is unset and AFL is not disabled, defaults to AFL (producing an `_afl` suffix). But `zs_fuzzers` used `native.read_config("lionhead", "fuzzer") or "libfuzzer"`, which treats unset as `"libfuzzer"` and produces a `_bin` suffix. The two defaults disagreed.

History:
- D57974923 (eqv, May 2024): added `_bin` suffix to the TulipV2 harness BUCK file to accommodate the lionhead generic bundle cutover, which started creating binaries with `name + "_bin"` via `libfuzzer_generic_harness`.
- D65686705 (terrelln, Jan 2025): generalized the generator concept into the `zs_fuzzers` macro, carrying the `_bin` suffix forward.
- D94710058 (kevz8, Mar 2026): renamed harness targets (`name` → `name_NoGenerator`), preserving the `_bin` suffix as `_NoGenerator_bin`.
- D95816404 (kevz8, Mar 2026): added `native.read_config` to switch between `_afl` and `_bin`, fixing AFL mode but using `or "libfuzzer"` as the default, which disagrees with `get_bundle_build_rule()`, which defaults to AFL when both are available.

The bug was latent because `generic_lionhead_harness` bundles are only built when CI’s target determinator reaches them through the dependency graph. This diff modifies `zl_graphs.h`, a widely-included header, which triggers a full rebuild including the fuzz bundles, exposing the broken target reference.

Fix: mirror the `get_bundle_build_rule()` logic: default to AFL (`_afl`) unless `lionhead.fuzzer` is explicitly `"libfuzzer"`.

Lint fixes
- Remove unused `#include "tests/utils.h"` in `SegmentNumFromSerial.cpp`
- Suppress `facebook-avoid-non-const-global-variables` on `g_runtimeParamSuccessor` in `test_segmenter.cpp` (intentionally mutable: written at registration time before tests run)

Reviewed By: terrelln
Differential Revision: D99758123
fbshipit-source-id: 7cfef7fb473659fba9a565c86d88379966d52ce5
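The fuzzer-suffix fix described in the commit message above can be sketched in a few lines of Python, whose syntax is close to the Starlark used in `.bzl` files. This is an illustration under stated assumptions, not the actual contents of `defs.bzl`. The helper name `harness_suffix` is hypothetical, and `None` stands in for an unset `lionhead.fuzzer` config value.

```python
def harness_suffix(fuzzer_config):
    """Return the binary-name suffix for a `lionhead.fuzzer` config value.

    Hypothetical helper for illustration; fuzzer_config is None when unset.
    """
    # Before the fix, `fuzzer_config or "libfuzzer"` made an unset value
    # behave like libfuzzer, which selected the "_bin" suffix.
    # After the fix, only an explicit "libfuzzer" selects "_bin";
    # everything else, including unset, falls back to AFL.
    if fuzzer_config == "libfuzzer":
        return "_bin"
    return "_afl"
```

With this rule an unset value maps to `_afl`, which is the default the commit message attributes to `get_bundle_build_rule()`.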
OpenZL
OpenZL delivers high compression ratios while preserving high speed, a level of performance that is out of reach for generic compressors. Check out the blog post and whitepaper for a breakdown of how it works.
OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format. Learn how it works →
OpenZL consists of a core library and tools to generate specialized compressors, all compatible with a single universal decompressor. It is designed for engineers who deal with large quantities of specialized datasets (such as AI workloads) and require high speed for their processing pipelines.
See our docs for more information and our quickstart guide to get started with a guided tutorial.
Project Status
This project is under active development. The API, the compressed format, and the set of codecs and graphs included in OpenZL are all subject to (and will!) change as the project matures.
However, we intend to maintain some stability guarantees in the face of that evolution. In particular, payloads compressed with any release-tagged version of the library will remain decompressible by new releases of the library for at least the next several years. And new releases of the library will be able to generate frames compatible with at least the previous release.
(Commits on the `dev` branch offer no guarantees whatsoever. Use only release-tagged commits for any non-experimental deployments.)

Despite the big scary warnings above, we consider the core to have reached production-readiness, and OpenZL is used extensively in production at Meta.
Building OpenZL
Prerequisites
OpenZL requires a compiler that supports C11 and C++17. When building with `cmake`, `cmake 3.20.2` or newer is required. There is ongoing work to relax these restrictions. As that happens, this section will be updated.

Build with `make`

The OpenZL library and essential tools can be built using `make`:

Build Options

The `Makefile` supports all standard build variables, such as `CC`, `CFLAGS`, `CPPFLAGS`, `LDFLAGS`, `LDLIBS`, etc.

It builds with multi-threading by default, auto-detecting the local number of cores. This can be overridden using the standard `-j#` flag (ex: `make -j8`).

Build Types

Binary generation can be altered by explicitly requesting a build type:

Example:

Build types are documented in `make help`, and their exact flags are detailed with `make show-config`.

Usual ones are:
- `BUILD_TYPE=DEV`: debug build with asserts enabled and ASAN / UBSAN enabled
- `BUILD_TYPE=OPT`: optimized build with asserts disabled (default)

Build with `cmake`

OpenZL can be built using `cmake`. Basic usage is as follows:

Details on setting CMake variables are below.
Build Modes
By default, we ship several different predefined build modes which can be set with the `OPENZL_BUILD_MODE` variable:
- `none` (default): CMake default build mode controlled by `CMAKE_BUILD_TYPE`
- `dev`: debug build with asserts enabled and ASAN / UBSAN enabled
- `dev-nosan`: debug build with asserts enabled
- `opt`: optimized build with asserts disabled
- `opt-asan`: optimized build with asserts disabled and ASAN / UBSAN enabled
- `dbgo`: optimized build with asserts enabled
- `dbgo-asan`: optimized build with asserts enabled and ASAN / UBSAN enabled

For ASAN / UBSAN, ensure that `libasan` and `libubsan` are installed on the machine.

Editor Integration
OpenZL ships with settings to configure VSCode to work with the CMake build system. To enable it, install two extensions:
- `cmake-tools`
- `clangd` (or any other C++ language server that works with `compile_commands.json`)

Important: For proper C++ language server support, you need to generate `compile_commands.json`:

The preferred method is to use the CMake Tools extension command "`CMake: Configure`".

If it doesn’t work, or is too difficult to set up, you can use the manual setup:

When to regenerate:
- `CMakeLists.txt`

CMake Variables

- `CMAKE_C_COMPILER` = Set the C compiler for OpenZL & dependency builds
- `CMAKE_CXX_COMPILER` = Set the C++ compiler for OpenZL & dependency builds
- `CMAKE_C_FLAGS` = C flags for OpenZL & dependency builds
- `CMAKE_CXX_FLAGS` = C++ flags for OpenZL & dependency builds
- `OPENZL_BUILD_TESTS=ON` = pull in testing deps and build the unit/integration tests
- `OPENZL_BUILD_BENCHMARKS=ON` = pull in benchmarking deps and build the benchmark executable
- `OPENZL_BUILD_MODE` = Sets the build mode for OpenZL and dependencies
- `OPENZL_SANITIZE_ADDRESS=ON` = Enable ASAN & UBSAN for OpenZL (but not dependencies)
- `OPENZL_COMMON_COMPILE_OPTIONS` = Shared C/C++ compiler options for OpenZL only
- `OPENZL_C_COMPILE_OPTIONS` = C compiler options for OpenZL only
- `OPENZL_CXX_COMPILE_OPTIONS` = C++ compiler options for OpenZL only
- `OPENZL_COMMON_COMPILE_DEFINITIONS` = Shared C/C++ compiler definitions (`-D`) for OpenZL only
- `OPENZL_C_COMPILE_DEFINITIONS` = C compiler definitions (`-D`) for OpenZL only
- `OPENZL_CXX_COMPILE_DEFINITIONS` = C++ compiler definitions (`-D`) for OpenZL only
- `OPENZL_COMMON_FLAGS` = extra compiler flags used in all targets

Windows Build
OpenZL uses modern C11 features that may not be fully supported by MSVC. For Windows builds, we recommend using clang-cl for the best compatibility.
Quick Start (Windows)
- Recommended: Use `clang-cl` for full C11 support.
- Alternative: Use MinGW-w64 for GNU toolchain compatibility.
- Limited Support: MSVC may produce C2099 errors due to limited C11 support.
Compiler Detection
Run our detection script to check available compilers and get recommendations:
For detailed Windows build instructions, troubleshooting, and installation guides, see build-scripts/cmake/WINDOWS_BUILD.md.
License
OpenZL is BSD licensed, as found in the LICENSE file.