Native Rust implementation of Apache Arrow and Apache Parquet
Welcome to the Rust implementation of Apache Arrow, a popular
in-memory columnar format, and Apache Parquet, a popular columnar file
format.
Community
We welcome participation from everyone and encourage you to join us, ask
questions, help others, and get involved. All participation in the Apache Arrow
project is governed by the Apache Software Foundation’s code of
conduct.
We use GitHub issues and pull requests for all technical discussions, reviews,
new features, bug fixes and release coordination. This ensures that all communication
is public and archived for future reference.
The dev@arrow.apache.org mailing list is the communication channel for the overall Apache Arrow community.
Instructions for signing up and links to the archives can be found on the Arrow Community page.
Some community members also use the Arrow Rust Discord Server and the official ASF Slack server for informal discussions and coordination.
This is a great place to meet other contributors and get guidance on where to contribute.
However, all technical designs should also be recorded and formalized in GitHub issues, so that they are accessible to everyone.
In Slack, find us in the #arrow-rust channel and feel free to ask for an invite via Discord, GitHub issues, or other means.
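There is more information in the contributing guide.
Repository Structure
This repository contains the following crates: arrow, arrow-flight, parquet
and parquet_derive. The API documentation for the current development version
can be found here.
Note: the object_store crate was previously part of this repository, but it
has been moved to the arrow-rs-object-store repository.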
Release Versioning and Schedule
The Arrow Rust project releases approximately monthly and follows Semantic
Versioning.
Given the available maintainer and testing bandwidth, the arrow crates (arrow,
arrow-flight, etc.) are released on the same schedule and with the same versions
as the parquet and parquet_derive crates.
New versions of these crates are released every month. We release new major
versions (with potentially breaking API changes) at most once a quarter, and
incremental minor versions in the intervening months. See ticket #5368 for more details.
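Under Semantic Versioning, the kind of release can be read directly from the version numbers. The following std-only sketch is illustrative only. The `ReleaseKind` enum and `classify` function are hypothetical helpers, not part of arrow-rs, and versions are modelled as plain (major, minor, patch) tuples.

```rust
/// Kind of release under Semantic Versioning (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum ReleaseKind {
    /// Major bump: may contain breaking API changes.
    Major,
    /// Minor bump: new functionality and deprecations, no breaking changes.
    Minor,
    /// Patch bump: backwards-compatible bug fixes only.
    Patch,
}

/// Classifies `next` relative to the previous release `prev`, both given as
/// (major, minor, patch). Returns `None` if `next` is not newer than `prev`.
pub fn classify(prev: (u64, u64, u64), next: (u64, u64, u64)) -> Option<ReleaseKind> {
    // Tuples compare lexicographically, which matches SemVer precedence
    // for plain numeric versions without pre-release tags.
    if next <= prev {
        return None;
    }
    if next.0 > prev.0 {
        Some(ReleaseKind::Major)
    } else if next.1 > prev.1 {
        Some(ReleaseKind::Minor)
    } else {
        Some(ReleaseKind::Patch)
    }
}
```

For example, going from 58.2.0 to 59.0.0 is classified as Major, so it may contain breaking API changes. Going from 58.1.0 to 58.2.0 is Minor.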
To keep our maintenance burden down, we do regularly scheduled releases (major
and minor) from the main branch. How we handle PRs with breaking API changes
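is described in the contributing guide.
Planned Release Schedule
The next planned releases are:
58.1.0 (minor)
58.2.0 (minor)
59.0.0 (major)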
Rust Version Compatibility Policy
arrow-rs and parquet are built and tested with stable Rust and keep a rolling MSRV (minimum supported Rust version). The MSRV can only be raised in a major release, and only when needed (for example, when a project dependency raises its MSRV or a newer Rust feature is particularly useful). Any newly selected MSRV will be at least 6 months old. Minor releases are guaranteed to keep the same MSRV as the major release they follow.
Note: if a Rust hotfix is released for the current MSRV, the MSRV will be updated to the specific Rust version that includes all applicable hotfixes. This rule takes precedence over the other policies above.
Guidelines for panic vs Result
In general, use panics for bad states that are unreachable, unrecoverable or harmful.
For bad states caused by invalid user input, however, we prefer to report the invalidity
gracefully as an error result instead of panicking. Invalid input should result
in an Error as soon as possible. It is OK for code paths after validation to assume
that validation has already occurred and to panic if it has not. See ticket #6737 for more nuances.
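As a rough illustration of this guideline, here is a minimal std-only sketch. The `Offsets` type, its methods and the `ValidationError` enum are hypothetical and not part of arrow-rs. They are loosely modelled on the fallible `try_new` and panicking `new` constructor pairs that many arrow-rs types provide. Invalid user input is rejected with an error at construction time. Later code paths rely on the validated invariant and panic only if it has somehow been broken.

```rust
use std::fmt;

/// Hypothetical error type standing in for a library's error enum.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    InvalidArgument(String),
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::InvalidArgument(msg) => write!(f, "invalid argument: {msg}"),
        }
    }
}

impl std::error::Error for ValidationError {}

/// Hypothetical offsets into a values buffer: slot `i` spans
/// `offsets[i]..offsets[i + 1]`.
#[derive(Debug)]
pub struct Offsets {
    offsets: Vec<usize>,
}

impl Offsets {
    /// Validates user-supplied input up front and reports problems as an
    /// error instead of panicking.
    pub fn try_new(offsets: Vec<usize>, values_len: usize) -> Result<Self, ValidationError> {
        let Some(&last) = offsets.last() else {
            return Err(ValidationError::InvalidArgument(
                "offsets must contain at least one element".to_string(),
            ));
        };
        if offsets.windows(2).any(|w| w[0] > w[1]) {
            return Err(ValidationError::InvalidArgument(
                "offsets must be non-decreasing".to_string(),
            ));
        }
        if last > values_len {
            return Err(ValidationError::InvalidArgument(format!(
                "last offset {last} exceeds values length {values_len}"
            )));
        }
        Ok(Self { offsets })
    }

    /// Panicking convenience constructor for inputs already known to be valid.
    pub fn new(offsets: Vec<usize>, values_len: usize) -> Self {
        Self::try_new(offsets, values_len).unwrap()
    }

    /// Length of slot `i`. This code path runs after validation, so it assumes
    /// the offsets are non-decreasing and panics if that invariant is broken.
    /// An out-of-range `i` also panics, via slice indexing.
    pub fn len_of(&self, i: usize) -> usize {
        self.offsets[i + 1]
            .checked_sub(self.offsets[i])
            .expect("offsets were validated to be non-decreasing")
    }
}
```

A caller building offsets from untrusted input would call `try_new` and handle the error. `new` is only appropriate when the input has already been validated elsewhere.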
Deprecation Guidelines
Minor releases may deprecate APIs, but not remove them. Deprecating an API lets
downstream Rust programs continue to compile while generating compiler warnings.
This gives downstream crates time to migrate before the API is removed.
To deprecate an API:
Mark the API as deprecated using #[deprecated] and specify the exact arrow-rs version in which it was deprecated
Concisely describe the preferred API to help the user transition
The version to record as deprecated (the since field) is the next version to be
released (please consult the list above). To mark the API as deprecated, use the
#[deprecated(since = "...", note = "...")] attribute.
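For illustration, here is a minimal sketch of the attribute applied to a function. The functions `old_sum` and `new_sum` are hypothetical. The version string "58.1.0" is only an example of the next version to be released. Use whatever version the next release actually is.

```rust
/// Hypothetical replacement API.
pub fn new_sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Hypothetical old API, kept so that downstream code still compiles.
/// `since` names the release in which the deprecation first ships, and
/// `note` concisely points users to the preferred API.
#[deprecated(since = "58.1.0", note = "Use `new_sum` instead")]
pub fn old_sum(values: &[i64]) -> i64 {
    // Delegate to the new implementation so both behave identically.
    new_sum(values)
}
```

Downstream code that still calls `old_sum` keeps compiling but gets a deprecation warning that points at `new_sum`. While migrating, it can silence the warning locally with `#[allow(deprecated)]`.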
In general, deprecated APIs will remain in the codebase for at least two major releases after
they were deprecated (typically between 6 and 9 months later). For example, an API
deprecated in 51.3.0 can be removed in 54.0.0 (or later). Deprecated APIs
may be removed earlier or later than these guidelines at the discretion of the
maintainers.
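The timing rule can be expressed as a small check. This is a sketch under stated assumptions, not a tool the project provides. `can_remove` is a hypothetical helper, and versions are plain (major, minor, patch) tuples. Removals are assumed to happen only in major (X.0.0) releases, since minor releases may not remove APIs. "At least two major releases after" is read as in the 51.3.0 example, where 52.0.0 and 53.0.0 still ship the API. Maintainer discretion is not modelled.

```rust
/// A release version as (major, minor, patch).
pub type Version = (u64, u64, u64);

/// Returns true if an API deprecated in `deprecated_in` may be removed in
/// `candidate` under the default guideline (hypothetical helper).
///
/// Assumptions (illustrative only):
/// - removals only happen in major releases (X.0.0);
/// - the API must survive the next two major releases after deprecation,
///   so the earliest eligible major version is `deprecated_major + 3`.
pub fn can_remove(deprecated_in: Version, candidate: Version) -> bool {
    let (deprecated_major, _, _) = deprecated_in;
    let (major, minor, patch) = candidate;
    let is_major_release = minor == 0 && patch == 0;
    is_major_release && major >= deprecated_major + 3
}
```

With the example from the text, `can_remove((51, 3, 0), (54, 0, 0))` is true and `can_remove((51, 3, 0), (53, 0, 0))` is false.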
Related Projects
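There are several related crates in different repositories, including
object_store, datafusion, ballista and parquet_opendal (which uses opendal for
parquet Arrow IO).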
Collectively, these crates support a wider array of functionality for analytic computations in Rust.
For example, you can write SQL queries or a DataFrame (using the
datafusion crate) to read a parquet file (using the parquet crate),
evaluate it in-memory using Arrow’s columnar format (using the arrow crate),
and send the results to another process (using the arrow-flight crate).
Generally speaking, the arrow crate offers functionality for using Arrow
arrays, and datafusion offers most operations typically found in SQL,
including joins and window functions.
You can find more details about each crate in their respective READMEs.