Use shredded vectors for casts from primitive <> variant (#21182)
https://github.com/duckdb/duckdb/pull/20912 introduced VectorType::SHREDDED, which allows variant vectors to maintain their shredded-ness during execution. This PR builds on that by supporting shredded vectors when casting to and from primitive types.
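The following sketch illustrates what shredded-ness means. It contrasts a row-wise variant column, where each value carries its own type tag, with a shredded column that is stored as a single typed array. This is a toy model under stated assumptions. VariantValue and TryShredAsBigint are hypothetical names, the set of value types is arbitrary, and DuckDB's internal VARIANT layout is more involved than this.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical row-wise ("unshredded") representation: every row carries its
// own type tag next to the payload. Not DuckDB's internal layout.
using VariantValue = std::variant<int64_t, double, std::string>;

// Toy shredding: if every row holds a BIGINT (int64_t), store the column as one
// typed, contiguous array. Real shredding can be partial; this sketch is
// all-or-nothing for simplicity.
std::optional<std::vector<int64_t>> TryShredAsBigint(const std::vector<VariantValue> &rows) {
    std::vector<int64_t> shredded;
    shredded.reserve(rows.size());
    for (const auto &row : rows) {
        const auto *value = std::get_if<int64_t>(&row);
        if (value == nullptr) {
            return std::nullopt; // mixed types: keep the generic representation
        }
        shredded.push_back(*value);
    }
    return shredded;
}
```

In this model, once a column is in the typed form, an operator can read the int64_t array directly instead of inspecting a tag on every row. A vector type like SHREDDED exists to preserve exactly this property as vectors move through execution.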
- When casting from a primitive type to variant, we can always emit a shredded vector, because the result is known to have a specific schema (e.g. BIGINT). This lets us skip the conversion code entirely and directly reference the input vector.
- When casting from variant to a primitive type, we need to check whether (1) the vector is of type VectorType::SHREDDED, and (2) the vector is shredded as that primitive type. If both conditions hold, we can directly reference the shredded component of the variant vector.

The following query showcases the potential performance gains:
select sum(l_orderkey::VARIANT::BIGINT) FROM lineitem;
| Version | Time (s) |
|---------|----------|
| v1.5.0  | 1.2      |
| This PR | 0.08     |

That is roughly a 15x speedup (1.2 s / 0.08 s). Running the same query without the casts takes 0.04 s, so there is still some room for improvement. The main remaining overhead seems to come from the vector caches, which are relatively complex for variants.
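The two cast paths described above can be sketched with a self-contained toy model. All names here are hypothetical: VectorType::GENERIC, VariantVector, CastBigintToVariant, CastVariantToBigint, and the DOUBLE-shredded fallback. Only the idea of a SHREDDED vector type and the two checks come from the PR text. Column data is held in std::shared_ptr so that "directly referencing" a vector appears as shared ownership rather than a copy.

```cpp
#include <cmath>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy model; it does not mirror DuckDB's Vector API.
enum class VectorType { GENERIC, SHREDDED };

using VariantValue = std::variant<int64_t, double, std::string>;
using BigintColumn = std::shared_ptr<const std::vector<int64_t>>;
using DoubleColumn = std::shared_ptr<const std::vector<double>>;
// The shredded component is a typed column; its alternative encodes the type.
using ShreddedColumn = std::variant<BigintColumn, DoubleColumn>;

struct VariantVector {
    VectorType vector_type = VectorType::GENERIC;
    std::vector<VariantValue> generic; // used when vector_type == GENERIC
    ShreddedColumn shredded;           // used when vector_type == SHREDDED
};

// Per-value slow path. Rounds doubles to nearest; range checks are omitted.
int64_t ValueToBigint(const VariantValue &value) {
    if (const auto *i = std::get_if<int64_t>(&value)) {
        return *i;
    }
    if (const auto *d = std::get_if<double>(&value)) {
        return static_cast<int64_t>(std::llround(*d));
    }
    throw std::invalid_argument("string -> BIGINT is not handled in this sketch");
}

// BIGINT -> VARIANT: the result schema is known, so always emit a shredded
// vector that references the input column (no per-row work, no copy).
VariantVector CastBigintToVariant(const BigintColumn &input) {
    VariantVector result;
    result.vector_type = VectorType::SHREDDED;
    result.shredded = input; // shares ownership of the same buffer
    return result;
}

// VARIANT -> BIGINT: fast path only if (1) the vector is SHREDDED and
// (2) it is shredded as BIGINT; otherwise convert value by value.
BigintColumn CastVariantToBigint(const VariantVector &input) {
    auto out = std::make_shared<std::vector<int64_t>>();
    if (input.vector_type == VectorType::SHREDDED) {
        if (const auto *bigint = std::get_if<BigintColumn>(&input.shredded)) {
            return *bigint; // fast path: reference the shredded component
        }
        // Shredded, but as DOUBLE: convert the typed component.
        const auto &doubles = *std::get<DoubleColumn>(input.shredded);
        out->reserve(doubles.size());
        for (double d : doubles) {
            out->push_back(ValueToBigint(d));
        }
        return out;
    }
    // Unshredded: fall back to per-row conversion of the tagged values.
    out->reserve(input.generic.size());
    for (const auto &value : input.generic) {
        out->push_back(ValueToBigint(value));
    }
    return out;
}
```

In this model, casting a BIGINT column to VARIANT and straight back hits the fast path and returns the original buffer. That mirrors, in simplified form, why an expression like l_orderkey::VARIANT::BIGINT can become cheap.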
DuckDB
DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and several extensions designed to make SQL easier to use.
DuckDB is available as a standalone CLI application and has clients for Python, R, Java, Wasm, etc., with deep integrations with packages such as pandas and dplyr.
For more information on using DuckDB, please refer to the DuckDB documentation.
Installation
If you want to install DuckDB, please see our installation page for instructions.
Data Import
For CSV files and Parquet files, data import is as simple as referencing the file in the FROM clause, for example:
SELECT * FROM 'myfile.csv';
SELECT * FROM 'myfile.parquet';
Refer to our Data Import section for more information.
SQL Reference
The documentation contains a SQL introduction and reference.
Development
For development, DuckDB requires CMake, Python 3 and a C++11-compliant compiler. In the root directory, run make to compile the sources. For development, use make debug to build a non-optimized debug version. You should run make unit and make allunit to verify that your version works properly after making changes. To test performance, you can run BUILD_BENCHMARK=1 BUILD_TPCH=1 make and then perform several standard benchmarks from the root directory by executing ./build/release/benchmark/benchmark_runner. The details of benchmarks are in our Benchmark Guide.
Please also refer to our Build Guide and Contribution Guide.
Support
See the Support Options page and the dedicated endoflife.date page.