Paimon C++ is a high-performance C++ implementation of Apache Paimon. It aims to provide a native, extensible implementation that lets native engines access the Paimon data lake format with maximum efficiency.
What’s in the Paimon C++ library
Write: append table and primary key table write support with compaction.
Commit: append table commit support for simple append-only tables.
Scan: batch and stream scan for append tables and primary key tables without changelog.
Read: append table read, primary key table read with deletion vector, and primary key table
merge-on-read.
File systems: file system abstraction with built-in local and Jindo file system support.
File formats: file format abstraction with built-in ORC, Parquet, and Avro support.
Runtime utilities: memory pool and thread pool abstractions with default implementations.
AI-Oriented Features: supports RowTracking and DataEvolution mode and provides Global Index capabilities including bitmap index, B-tree index, DiskANN-based vector search with Lumina, and Lucene-based full-text search.
Compatibility: compatibility with Apache Paimon Java format and communication protocols,
including commit messages, data splits, and manifests.
Note: Linux x86_64 and macOS arm64 builds are currently verified.
Write And Commit Example
Writing is divided into two stages:
Write records: write records in distributed tasks and generate commit messages.
Commit/Abort: collect all commit messages and commit them on a single global node (the ‘Coordinator’, also called the ‘Driver’ or ‘Committer’). If the commit fails for any reason, abort the unsuccessful commit using its commit messages.
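The two stages above can be pictured with a small, self-contained C++ sketch. Every name in it (CommitMessage, ToyTable, WriteRecords, Commit, Abort, CommitOrAbort) is hypothetical and chosen for illustration; none of them is the Paimon C++ API. The sketch also assumes that aborting means deleting the uncommitted files listed in the commit messages, which is one plausible reading of the description rather than a statement about the library.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical types for illustration only; these are NOT the Paimon C++ API.
struct CommitMessage {
    int task_id;                         // distributed task that produced the message
    std::vector<std::string> new_files;  // files that task wrote but has not committed
};

struct ToyTable {
    std::set<std::string> files_on_disk;  // everything physically written
    std::vector<std::string> snapshot;    // files made visible by a successful commit
};

// Stage 1: a task writes its records and returns a commit message describing them.
CommitMessage WriteRecords(ToyTable& table, int task_id,
                           const std::vector<std::string>& records) {
    CommitMessage msg{task_id, {}};
    if (!records.empty()) {
        std::string file = "data-task-" + std::to_string(task_id);
        table.files_on_disk.insert(file);
        msg.new_files.push_back(file);
    }
    return msg;
}

// Stage 2a: the coordinator publishes all collected files at once.
// `simulate_failure` stands in for a real commit error.
bool Commit(ToyTable& table, const std::vector<CommitMessage>& messages,
            bool simulate_failure) {
    if (simulate_failure) return false;
    for (const auto& msg : messages)
        for (const auto& file : msg.new_files) table.snapshot.push_back(file);
    return true;
}

// Stage 2b (assumed semantics): remove the uncommitted files named in the messages.
void Abort(ToyTable& table, const std::vector<CommitMessage>& messages) {
    for (const auto& msg : messages)
        for (const auto& file : msg.new_files) table.files_on_disk.erase(file);
}

// Coordinator logic: try to commit, and abort via the same messages on failure.
bool CommitOrAbort(ToyTable& table, const std::vector<CommitMessage>& messages,
                   bool simulate_failure) {
    if (Commit(table, messages, simulate_failure)) return true;
    Abort(table, messages);
    return false;
}
```

The point of the sketch is that the commit messages are the only thing passed from the writer tasks to the coordinator, so they must carry enough information both to publish the new data and to clean it up.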
Paimon C++ can either build selected third-party dependencies from bundled
sources or use libraries that are already installed on the system. The default
mode is AUTO, which tries system packages first and falls back to bundled
sources when they are not found.
$ cmake -B build -DPAIMON_DEPENDENCY_SOURCE=AUTO
The supported dependency source values are:
AUTO: use a system package when available, otherwise build bundled sources.
BUNDLED: always build bundled sources.
SYSTEM: require system packages and fail if they are not found.
You can also override individual dependencies. The supported dependency set
includes Arrow/Parquet, ORC, Protobuf, Avro, RE2, fmt, RapidJSON, TBB, glog,
GoogleTest, and compression libraries. Arrow and ORC require project-specific
patches, so their supported source values are AUTO and BUNDLED; AUTO
resolves to bundled sources for them.
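The resolution rules described so far can be summarized as a small decision function. The C++ sketch below is only a model of the documented behavior: the real logic lives in the project's CMake files, and the names DependencySource, ResolvedSource, and Resolve are hypothetical. Rejecting SYSTEM for patched dependencies with an error is an assumption, since the text only says that SYSTEM is not a supported value for Arrow and ORC.

```cpp
#include <stdexcept>

// Hypothetical model of the documented dependency source values.
enum class DependencySource { kAuto, kBundled, kSystem };
enum class ResolvedSource { kBundled, kSystem };

// requires_patches is true for dependencies such as Arrow and ORC,
// which need project-specific patches.
ResolvedSource Resolve(DependencySource requested, bool system_package_found,
                       bool requires_patches) {
    if (requires_patches) {
        // Assumption: an unsupported SYSTEM request is reported as an error.
        if (requested == DependencySource::kSystem) {
            throw std::invalid_argument("SYSTEM is not supported for patched dependencies");
        }
        return ResolvedSource::kBundled;  // AUTO resolves to bundled sources here
    }
    switch (requested) {
        case DependencySource::kBundled:
            return ResolvedSource::kBundled;  // always build bundled sources
        case DependencySource::kSystem:
            if (!system_package_found) {
                throw std::runtime_error("system package not found");
            }
            return ResolvedSource::kSystem;
        case DependencySource::kAuto:
            // Prefer the system package, fall back to bundled sources.
            return system_package_found ? ResolvedSource::kSystem
                                        : ResolvedSource::kBundled;
    }
    throw std::logic_error("unknown dependency source");
}
```

For example, under this model an AUTO request for a dependency that is installed on the system resolves to the system package, while the same request for Arrow resolves to bundled sources regardless of what is installed.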
Package-manager-specific modes are intentionally out of scope for this first
dependency source interface. They can still be used through standard CMake
mechanisms such as CMAKE_PREFIX_PATH or CMAKE_TOOLCHAIN_FILE, while Paimon
keeps the dependency source values limited to AUTO, BUNDLED, and SYSTEM.
When Arrow_SOURCE is explicitly set to BUNDLED or left as AUTO, the
compression dependencies default to bundled sources unless individually
overridden. Mixing system and bundled copies of transitive dependencies can
cause ABI conflicts, so prefer keeping Arrow and its compression dependencies
from the same source unless you have a specific reason to override them.
When ORC_SOURCE is explicitly set to BUNDLED or left as AUTO,
Protobuf_SOURCE defaults to bundled sources unless individually overridden.
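The defaulting rule for transitive dependencies (compression libraries under Arrow, Protobuf under ORC) can be sketched in the same spirit. This is a hedged C++17 model, not the project's CMake code: DefaultForTransitive and its parameters are hypothetical names, and falling back to a global default when the parent is neither BUNDLED nor AUTO is an assumption that the text does not spell out.

```cpp
#include <optional>

// Hypothetical model of the documented dependency source values.
enum class DependencySource { kAuto, kBundled, kSystem };

// parent_source models Arrow_SOURCE (for compression libraries)
// or ORC_SOURCE (for Protobuf).
DependencySource DefaultForTransitive(
    std::optional<DependencySource> individual_override,
    DependencySource parent_source,
    DependencySource global_default) {
    // An individual override always wins.
    if (individual_override.has_value()) return *individual_override;
    // A bundled (or AUTO) parent pulls its transitive dependencies to bundled.
    if (parent_source == DependencySource::kBundled ||
        parent_source == DependencySource::kAuto) {
        return DependencySource::kBundled;
    }
    // Assumption: otherwise the global setting applies.
    return global_default;
}
```

Keeping a parent and its transitive dependencies on the same source by default is what avoids the ABI conflicts mentioned above; the override path exists for cases where mixing is deliberate.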
CMake prints a dependency resolution summary during configuration showing the
requested source, actual source, compatibility target, and search root for each
resolved dependency.
Contributing
Paimon-cpp is an active open-source project and we welcome people who want to contribute or share good ideas!
Before contributing, please read the Contributing Guide and the Code Style Guide. You are encouraged to check out our documentation.
If you have suggestions, feedback, want to report a bug or request a feature, please open an issue.
Pull requests are also very welcome!
We value respectful and open collaboration, and appreciate everyone who helps make paimon-cpp better. Thank you for your support!
Linting
Install the Python package pre-commit, then run pre-commit install once.
pip install pre-commit
pre-commit install
This sets up a Git pre-commit hook that runs on each commit and reports linting problems. To run all hooks on all files, use pre-commit run -a.
Dev Containers
We provide Dev Container configuration file templates.
To use a Dev Container as your development environment, follow the steps below, then select Dev Containers: Reopen in Container from VS Code’s Command Palette.
cd .devcontainer
cp Dockerfile.template Dockerfile
cp devcontainer.json.template devcontainer.json
If you make improvements that could benefit all developers, please update the template files and submit a pull request.
Scan and Read Example
The reading is divided into two stages:
Getting Started
Development
Clone the Repository
If you don’t have git-lfs installed, please install it first.
CMake
Third-party dependencies
Use PAIMON_PACKAGE_PREFIX to provide one common prefix for dependencies whose own <Package>_ROOT variable is not set.
License
Licensed under the Apache License, Version 2.0
Maintainership and Contributions
This project is maintained by a core team from the Storage Service team at Alibaba, including lxy-9602 (maintainer), lucasfang, lszskye, and zjw1111. We sincerely appreciate contributions from the community — your feedback and patches are welcome and highly valued. For any questions, feature proposals, or code reviews, please feel free to reach out to us directly.