[Hotfix] Force ReachableFileCleanup when snapshots exist outside main ancestry (#4231)
[optimizer] Force reachable cleanup when snapshots exist outside main ancestry
Iceberg’s auto-selected IncrementalFileCleanup can silently truncate its ancestor walk when a parent snapshot is missing, deleting data files the current snapshot still references. Force the safe ReachableFileCleanup only when snapshots exist outside the main ancestry; healthy tables are unchanged.
Signed-off-by: Jiwon Park <jpark92@outlook.kr>
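The condition described in this commit message can be illustrated with a toy model. The sketch below is not the Iceberg or Amoro implementation: it represents snapshots as a plain map from snapshot id to parent id, and the class name `AncestryCheck`, its methods, and the returned strategy labels are hypothetical names chosen for illustration. It assumes that "outside the main ancestry" means "not reachable by following parent pointers from the current snapshot".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of snapshot ancestry; hypothetical, not the Iceberg or Amoro API. */
final class AncestryCheck {

  /**
   * Follows parent pointers from the current snapshot and collects every id reached.
   * The map goes from snapshot id to parent id (null for a root). The walk stops when
   * the parent is null, is missing from the map, or has already been visited.
   */
  static Set<Long> mainAncestry(Map<Long, Long> parents, Long currentId) {
    Set<Long> ancestry = new HashSet<>();
    Long id = currentId;
    while (id != null && parents.containsKey(id) && ancestry.add(id)) {
      id = parents.get(id);
    }
    return ancestry;
  }

  /** True when at least one known snapshot is not on the current snapshot's ancestry. */
  static boolean hasSnapshotsOutsideMainAncestry(Map<Long, Long> parents, Long currentId) {
    return !mainAncestry(parents, currentId).containsAll(parents.keySet());
  }

  /** Returns a hypothetical label: force the reachable strategy only for unhealthy tables. */
  static String chooseCleanup(Map<Long, Long> parents, Long currentId) {
    return hasSnapshotsOutsideMainAncestry(parents, currentId)
        ? "ReachableFileCleanup"
        : "auto-selected";
  }

  public static void main(String[] args) {
    // Healthy linear history: 1 <- 2 <- 3, current snapshot is 3.
    Map<Long, Long> healthy = new HashMap<>();
    healthy.put(1L, null);
    healthy.put(2L, 1L);
    healthy.put(3L, 2L);
    System.out.println(chooseCleanup(healthy, 3L)); // prints "auto-selected"

    // Snapshot 2 is missing, so the walk from 4 stops at 3 and never reaches 1.
    Map<Long, Long> broken = new HashMap<>();
    broken.put(1L, null);
    broken.put(3L, 2L);
    broken.put(4L, 3L);
    System.out.println(chooseCleanup(broken, 4L)); // prints "ReachableFileCleanup"
  }
}
```

In the second case the ancestor walk is cut short by the missing parent, which is the situation the commit message describes as unsafe for an incremental strategy; the healthy table keeps whatever strategy would have been selected anyway.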
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats. Working with compute engines including Flink, Spark, and Trino, Amoro brings pluggable and self-managed features for Lakehouse to provide an out-of-the-box data warehouse experience, and helps data platforms or products easily build infra-decoupled, stream-and-batch-fused, and lake-native architectures.
Learn more about Amoro at https://amoro.apache.org/. If you need any help, contact the developers and community on the mailing list.
Architecture
Here is the architecture diagram of Amoro:
Supported table formats
Amoro can manage tables of different table formats, similar to how MySQL/ClickHouse can choose different storage engines. Amoro meets diverse user needs by using different table formats. Currently, Amoro supports four table formats:
Supported engines
Iceberg format
Iceberg format tables use the engine integration method provided by the Iceberg community. For details, please refer to: Iceberg Docs.
Mixed format
Amoro supports multiple processing engines for Mixed format, as listed below:
Features
Modules
Amoro contains modules as below:
- amoro-common contains core abstractions and common implementation for other modules
- amoro-ams is the Amoro management service module
- amoro-web is the dashboard frontend for AMS
- amoro-optimizer provides the default optimizer implementation
- amoro-format-iceberg contains integration of Apache Iceberg format
- amoro-format-hudi contains integration of Apache Hudi format
- amoro-format-paimon contains integration of Apache Paimon format
- amoro-format-mixed provides the Mixed format implementation
- amoro-mixed-hive integrates with Apache Hive and implements Mixed Hive format
- amoro-mixed-flink provides Flink connectors for Mixed format tables (use amoro-flink-runtime for a shaded version)
- amoro-mixed-spark provides Spark connectors for Mixed format tables (use amoro-spark-runtime for a shaded version)
- amoro-mixed-trino provides Trino connectors for Mixed format tables
Building
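Amoro is built using Maven with JDK 8, 11 and 17 (JDK 17 is required for the amoro-format-mixed/amoro-mixed-trino module).
- Build all modules except amoro-mixed-trino: ./mvnw clean package
- Build and skip tests: ./mvnw clean package -DskipTests
- Build and skip the dashboard build: ./mvnw clean package -Pskip-dashboard-build
- Build without extended disk storage: ./mvnw clean package -DskipTests -Pno-extented-disk-storage
- Build with the Aliyun OSS SDK: ./mvnw clean package -DskipTests -Paliyun-oss-sdk
- Build with Hadoop 2 dependencies: ./mvnw clean package -DskipTests -Phadoop2
- Specify the Flink version for the Flink optimizer: ./mvnw clean package -DskipTests -Dflink-optimizer.flink-version=1.20.0
- For a Flink version below 1.15, also add the -Pflink-optimizer-pre-1.15 parameter: ./mvnw clean package -DskipTests -Pflink-optimizer-pre-1.15 -Dflink-optimizer.flink-version=1.14.6
- Specify the Spark version: ./mvnw clean package -DskipTests -Dspark.version=3.5.7
- Build the amoro-mixed-trino module under JDK 17: ./mvnw clean package -DskipTests -Pformat-mixed-format-trino,build-mixed-format-trino -pl 'amoro-format-mixed/amoro-mixed-trino' -am
- Build with the toolchain and build-mixed-format-trino profiles: ./mvnw clean package -DskipTests -Ptoolchain,build-mixed-format-trino. You also need to configure toolchains.xml in the ${user.home}/.m2/ directory.
- Build with support for all formats: ./mvnw clean package -Psupport-all-formats
- Build with Paimon format support: ./mvnw clean package -Psupport-paimon-format
- Build with Hudi format support: ./mvnw clean package -Psupport-hudi-format
Quickstart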
Visit https://amoro.apache.org/quick-start/ to quickly explore what Amoro can do.
Join Community
If you are interested in Lakehouse and data lake formats, you are welcome to join our community. We welcome any organizations, teams, and individuals to grow together, and sincerely hope to help users make better use of data lake formats through open source.
Slack
You can join the Amoro community on Slack. The Amoro channel is in the ASF Slack workspace.
Send an email to dev@amoro.apache.org to apply for an ASF Slack invitation, then join the Amoro channel.
WeChat
Join the Amoro WeChat Group: Add “kllnn999” as a friend and request to join the group.
Contributors
This project exists thanks to all the people who contribute.
Made with contrib.rocks.
Star History