Yann Collet

Add dominant-symbol sparse_num decode support (#800)

Summary: Pull Request resolved: https://github.com/facebook/openzl/pull/800

Add support for dominant-symbol sparse_num encoding and decoding while preserving the existing sparse_num encoder node as the zero-dominant entry point.

The wire format remains empty-header zero-dominant for the common case. A non-empty codec header directly stores the dominant symbol as the shortest little-endian byte sequence that represents it, with missing high bytes interpreted as zero and header size limited to the value stream width. The values stream always contains literal values only; it is never prefixed by the dominant symbol.
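To make the header rule above concrete, here is a minimal C sketch of the "shortest little-endian byte sequence" encoding. The function name sketch_write_dominant_header and the fixed 8-byte buffer are assumptions made for illustration only; they are not OpenZL APIs, and the real encoder may be structured differently. The sketch also assumes the caller only passes a dominant symbol that fits the value stream width, so the header never exceeds that width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenZL API).
 * Writes `dominant` into `hdr` as the shortest little-endian byte sequence
 * that represents it and returns the number of bytes written (0..8).
 * A dominant symbol of 0 produces an empty header, which matches the
 * zero-dominant common case described above. */
static size_t sketch_write_dominant_header(uint8_t hdr[8], uint64_t dominant)
{
    size_t n = 0;
    assert(hdr != NULL);
    while (dominant != 0) {
        hdr[n++] = (uint8_t)(dominant & 0xFF); /* lowest byte first */
        dominant >>= 8;                        /* missing high bytes mean zero */
    }
    return n;
}
```

Under these assumptions, a dominant symbol of 0x2A takes one header byte, 0x0100 takes two (0x00 then 0x01), and 0 takes none.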

The public encoder surface now has two nodes. ZL_NODE_SPARSE_NUM / !zl.sparse_num / nodes::SparseNum keeps the previous sparse-zero meaning and forces dominant symbol 0, so it skips auto-detection. ZL_NODE_SPARSE_NUM_AUTO / !zl.sparse_num_auto / nodes::SparseNumAuto auto-detects a dominant symbol from an input prefix unless one is explicitly provided via ZL_SPARSE_NUM_DOMINANT_VALUE_PID (132). Both nodes emit the same ZL_StandardTransformID_sparse_num transform and share the same decoder/wire format.

The decoder validates the compact header, passes a uint64_t dominant value into the single decode kernel entry point, and preserves the zero-dominant D8V* fast path internally. UnitBench direct-kernel scenarios were updated for the unified decode signature.
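The decoder-side handling of the header might look roughly like the C sketch below. It reads the compact header back into a uint64_t and rejects a header that is larger than the value stream width, which is the only validation rule stated in this message. The name sketch_read_dominant_header, the return codes, and the restriction of width to 1, 2, 4 or 8 bytes are assumptions for illustration, not the actual OpenZL decoder interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenZL API).
 * Reads a compact header of `hdr_size` bytes for a value stream whose
 * elements are `width` bytes wide. On success, stores the dominant value
 * and returns 0. Returns -1 when the header exceeds the value width.
 * An empty header (hdr_size == 0) decodes to dominant value 0. */
static int sketch_read_dominant_header(const uint8_t *hdr, size_t hdr_size,
                                       size_t width, uint64_t *dominant)
{
    uint64_t v = 0;
    size_t i;
    assert(dominant != NULL);
    assert(width == 1 || width == 2 || width == 4 || width == 8);
    if (hdr_size > width) {
        return -1; /* header size is limited to the value stream width */
    }
    for (i = 0; i < hdr_size; i++) {
        v |= (uint64_t)hdr[i] << (8 * i); /* little-endian accumulation */
    }
    *dominant = v;
    return 0;
}
```

The resulting uint64_t is the kind of value that would then be handed to a single decode entry point, with 0 selecting the zero-dominant fast path.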

Zero-dominant encode performance was rechecked after keeping the common path specialized. Latest full unitBench run: buck run fbcode//openzl/dev/benchmark/unitBench/scripts:sparse_num_bench -- --sample-mib 2 --duration-s 3.

width  zeros  elements  raw       encoded      ratio  encode MiB/s  decode MiB/s
u8     75%    2097100   2.00 MiB  1023.97 KiB  2.00   1118.3        1091.9
u8     90%    2097100   2.00 MiB  409.59 KiB   5.00   1247.2        2499.6
u8     98%    2097100   2.00 MiB  81.92 KiB    25.00  1320.2        12429.8
u16    75%    1048500   2.00 MiB  767.94 KiB   2.67   2222.1        2008.1
u16    90%    1048500   2.00 MiB  307.18 KiB   6.67   2397.0        5070.6
u16    98%    1048500   2.00 MiB  61.44 KiB    33.33  2564.9        23583.2
u32    75%    524200    2.00 MiB  639.89 KiB   3.20   4362.3        4284.7
u32    90%    524200    2.00 MiB  255.96 KiB   8.00   4970.6        9754.5
u32    98%    524200    2.00 MiB  51.19 KiB    40.00  5304.1        35965.2
u64    75%    262100    2.00 MiB  575.90 KiB   3.56   8462.4        7768.7
u64    90%    262100    2.00 MiB  230.36 KiB   8.89   9486.1        20851.6
u64    98%    262100    2.00 MiB  46.07 KiB    44.44  9840.9        48068.9

Reviewed By: terrelln

Differential Revision: D106528533

fbshipit-source-id: c27ecb2396e49e5a0cf89e1159d25839dbe7cef3


OpenZL

OpenZL delivers high compression ratios while preserving high speed, a level of performance that is out of reach for generic compressors. Check out the blog post and whitepaper for a breakdown of how it works.

OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format. Learn how it works →

OpenZL consists of a core library and tools to generate specialized compressors, all compatible with a single universal decompressor. It is designed for engineers who deal with large quantities of specialized datasets (such as AI workloads) and require high speed for their processing pipelines.

See our docs for more information and our quickstart guide to get started with a guided tutorial.

Project Status

This project is under active development. The API, the compressed format, and the set of codecs and graphs included in OpenZL are all subject to change (and will change!) as the project matures.

However, we intend to maintain some stability guarantees in the face of that evolution. In particular, payloads compressed with any release-tagged version of the library will remain decompressible by new releases of the library for at least the next several years. And new releases of the library will be able to generate frames compatible with at least the previous release.

(Commits on the dev branch offer no guarantees whatsoever. Use only release-tagged commits for any non-experimental deployments.)

Despite the big scary warnings above, we consider the core to have reached production-readiness, and OpenZL is used extensively in production at Meta.

Building OpenZL

Prerequisites

OpenZL requires a compiler that supports C11 and C++17. When building with cmake, cmake 3.20.2 or newer is required. There is ongoing work to relax these restrictions. As that happens, this section will be updated.
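One quick way to check the C11 part of this requirement is to compile a tiny probe file. The sketch below is not part of OpenZL; the helper name sketch_c_standard_version is hypothetical. It fails at compile time when the compiler is not in C11 (or newer) mode, for example when built with cc -std=c11 -c on a file containing it. Note that some compilers only define __STDC_VERSION__ when a C standard mode is explicitly selected.

```c
#include <assert.h>

/* Stop the build when the compiler does not report C11 or newer. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "C11 or newer is required"
#endif

/* static_assert is provided by <assert.h> starting with C11,
 * so this line is itself a small C11 feature check. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* Hypothetical helper: returns the language version the compiler reports,
 * e.g. 201112L for C11 or 201710L for C17. */
static long sketch_c_standard_version(void)
{
    return (long)__STDC_VERSION__;
}
```

This only probes the C side; the C++17 requirement would need a separate check with the C++ compiler.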

Build with make

The OpenZL library and essential tools can be built using make:

make

Build Options

The Makefile supports all standard build variables, such as CC, CFLAGS, CPPFLAGS, LDFLAGS, LDLIBS, etc.

The build runs in parallel by default, auto-detecting the number of local cores. This can be overridden with the standard -j# flag (e.g., make -j8).

Build Types

Binary generation can be altered by explicitly requesting a build type:

Example:

make lib BUILD_TYPE=DEV

Build types are documented in make help, and their exact flags are detailed with make show-config.

The usual ones are:

  • BUILD_TYPE=DEV: debug build with asserts enabled and ASAN / UBSAN enabled
  • BUILD_TYPE=OPT: optimized build with asserts disabled (default)

Build with cmake

OpenZL can be built using cmake. Basic usage is as follows:

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release -DOPENZL_BUILD_TESTS=ON ..
make -j
make -j test

Details on setting CMake variables are below.

Build Modes

By default, we ship several different predefined build modes which can be set with the OPENZL_BUILD_MODE variable:

  • none (default): CMake default build mode controlled by CMAKE_BUILD_TYPE
  • dev: debug build with asserts enabled and ASAN / UBSAN enabled
  • dev-nosan: debug build with asserts enabled
  • opt: optimized build with asserts disabled
  • opt-asan: optimized build with asserts disabled and ASAN / UBSAN enabled
  • dbgo: optimized build with asserts enabled
  • dbgo-asan: optimized build with asserts enabled and ASAN / UBSAN enabled

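Caution: When switching between build modes, make sure to purge the CMake cache and re-configure the build. For instance: cmake --fresh -DOPENZL_BUILD_MODE=dev-nosan .. (the --fresh option requires CMake 3.24 or newer; with older versions, delete CMakeCache.txt from the build directory before re-configuring).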

For ASAN / UBSAN, ensure that libasan and libubsan are installed on the machine.

Editor Integration

OpenZL ships with settings that configure VSCode to work with the CMake build system. To enable them, install two extensions:

  1. cmake-tools
  2. clangd (or any other C++ language server that works with compile_commands.json)

Important: For proper C++ language server support, you need to generate compile_commands.json:

The preferred method is to use the CMake Tools extension command “CMake: Configure”.

If it doesn’t work, or is too difficult to set up, you can use the manual setup:

mkdir -p cmakebuild
cmake -B cmakebuild -DOPENZL_BUILD_TESTS=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
cp cmakebuild/compile_commands.json .

When to regenerate:

  • After cloning the repository (first-time setup)
  • When adding/removing source files
  • When modifying CMakeLists.txt

CMake Variables

  • CMAKE_C_COMPILER = Set the C compiler for OpenZL & dependency builds
  • CMAKE_CXX_COMPILER = Set the C++ compiler for OpenZL & dependency builds
  • CMAKE_C_FLAGS = C flags for OpenZL & dependency builds
  • CMAKE_CXX_FLAGS = C++ flags for OpenZL & dependency builds
  • OPENZL_BUILD_TESTS=ON = pull in testing deps and build the unit/integration tests
  • OPENZL_BUILD_BENCHMARKS=ON = pull in benchmarking deps and build the benchmark executable
  • OPENZL_BUILD_MODE = Sets the build mode for OpenZL and dependencies
  • OPENZL_SANITIZE_ADDRESS=ON = Enable ASAN & UBSAN for OpenZL (but not dependencies)
  • OPENZL_COMMON_COMPILE_OPTIONS = Shared C/C++ compiler options for OpenZL only
  • OPENZL_C_COMPILE_OPTIONS = C compiler options for OpenZL only
  • OPENZL_CXX_COMPILE_OPTIONS = C++ compiler options for OpenZL only
  • OPENZL_COMMON_COMPILE_DEFINITIONS = Shared C/C++ compiler definitions (-D) for OpenZL only
  • OPENZL_C_COMPILE_DEFINITIONS = C compiler definitions (-D) for OpenZL only
  • OPENZL_CXX_COMPILE_DEFINITIONS = C++ compiler definitions (-D) for OpenZL only
  • OPENZL_COMMON_FLAGS = extra compiler flags used in all targets

Windows Build

OpenZL uses modern C11 features that may not be fully supported by MSVC. For Windows builds, we recommend using clang-cl for the best compatibility.

Quick Start (Windows)

  1. Recommended: Use clang-cl for full C11 support

    cmake -S . -B build -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl
    cmake --build build --config Release
  2. Alternative: Use MinGW-w64 for GNU toolchain compatibility.

    cmake -S . -B build -G "MinGW Makefiles"
    cmake --build build --config Release
  3. Limited Support: MSVC may produce C2099 errors due to limited C11 support.

Compiler Detection

Run our detection script to check available compilers and get recommendations:

# PowerShell
./build-scripts/cmake/detect_windows_compiler.ps1

# Command Prompt
build-scripts\cmake\detect_windows_compiler.bat

For detailed Windows build instructions, troubleshooting, and installation guides, see build-scripts/cmake/WINDOWS_BUILD.md.

License

OpenZL is BSD licensed, as found in the LICENSE file.
