ClickHouse Hadoop

Integrates ClickHouse natively with Hive, connecting Hadoop’s massive data storage and deep processing power with the high performance of ClickHouse. Currently, only writing from Hive to ClickHouse is supported.

Build the Project

mvn package -Phadoop26 -DskipTests

Run the test cases

The test cases require a clickhouse-server running on localhost.
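
As a minimal sketch, the following helpers check that a server answers on localhost before starting the tests. They rest on two assumptions this project does not state: that the server exposes the ClickHouse HTTP interface on its default port 8123, and that the tests run with `mvn test` under the same `hadoop26` profile used for the build. The function names are hypothetical.

```shell
#!/bin/sh
# Assumption: ClickHouse HTTP interface on the default port 8123.
CK_HOST="${CK_HOST:-localhost}"
CK_HTTP_PORT="${CK_HTTP_PORT:-8123}"

# Build the URL of the ClickHouse health-check endpoint.
ck_ping_url() {
    printf 'http://%s:%s/ping\n' "$1" "$2"
}

# Succeeds only if the server answers the ping request with "Ok."
ck_is_up() {
    [ "$(curl -fsS "$(ck_ping_url "$1" "$2")" 2>/dev/null)" = "Ok." ]
}

# Assumption: the tests run under the same Maven profile as the build.
run_tests() {
    if ck_is_up "$CK_HOST" "$CK_HTTP_PORT"; then
        mvn test -Phadoop26
    else
        echo "clickhouse-server is not reachable at $CK_HOST:$CK_HTTP_PORT" >&2
        return 1
    fi
}
```

Source the script and call `run_tests` from the project root. Set `CK_HTTP_PORT` if the server listens on a different HTTP port.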

Usage

Create ClickHouse Table

CREATE TABLE hive_test
(
    c1 String,
    c2 Float64,
    c3 String
)
ENGINE = MergeTree()
PARTITION BY c3
ORDER BY c1

Create Hive External Table

Before starting the Hive CLI, set the environment variable HIVE_AUX_JARS_PATH:

export HIVE_AUX_JARS_PATH=<path-to-your-project>/target/clickhouse-hadoop-<version>.jar

Then start the Hive CLI and create the Hive external table:

CREATE EXTERNAL TABLE default.ck_test(
   c1 string,
   c2 double,
   c3 string
)
STORED BY 'data.bytedance.net.ck.hive.ClickHouseStorageHandler'
TBLPROPERTIES('clickhouse.conn.urls'='jdbc:clickhouse://<host1>:<port1>,jdbc:clickhouse://<host2>:<port2>',
'clickhouse.table.name'='hive_test');
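
The `clickhouse.conn.urls` property takes a comma-separated list of JDBC URLs, one per ClickHouse server. When the list of servers is long, the value can be assembled with a small shell helper. This is only a sketch: `ck_conn_urls` is a hypothetical function, not part of this project, and the hosts in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: join host:port pairs into a comma-separated
# list of JDBC URLs suitable for 'clickhouse.conn.urls'.
ck_conn_urls() {
    urls=""
    for endpoint in "$@"; do
        # Prepend a comma only when the list already has an entry.
        urls="${urls:+$urls,}jdbc:clickhouse://$endpoint"
    done
    printf '%s\n' "$urls"
}
```

For example, `ck_conn_urls ck1.example.com:8123 ck2.example.com:8123` prints `jdbc:clickhouse://ck1.example.com:8123,jdbc:clickhouse://ck2.example.com:8123`, which can be pasted into the TBLPROPERTIES clause.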

Data Ingestion

In the Hive CLI:

INSERT INTO default.ck_test
SELECT c1, c2, c3 FROM default.source_table WHERE part = 'part_val';
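
To confirm that the rows arrived, one option is to count them per partition on the ClickHouse side. The sketch below assumes `clickhouse-client` is installed and can reach the server. The helper names are hypothetical, and the query groups by `c3` because that is the partition key of the `hive_test` table defined earlier.

```shell
#!/bin/sh
# Build a per-partition row-count query for the given ClickHouse table.
ck_count_query() {
    printf 'SELECT c3, count() AS row_count FROM %s GROUP BY c3 ORDER BY c3\n' "$1"
}

# Assumption: clickhouse-client is installed and can reach the given host.
# Usage: ck_count_rows <host> <table>
ck_count_rows() {
    clickhouse-client --host "$1" --query "$(ck_count_query "$2")"
}
```

Running `ck_count_rows localhost hive_test` after the insert lists each `c3` value with its row count, which can be compared against the same aggregation over the Hive source table.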