Apache Quickstep (Incubating)
THIS PODLING HAS BEEN RETIRED. PLEASE SEE RETIRED.md FOR DETAILS.
What is Quickstep?
Apache Quickstep is a high-performance database engine designed to exploit the full potential of the hardware packed into modern computing boxes (servers and laptops). The initial version (available now!) targets single-node in-memory environments. If your data spills over the memory limit, Quickstep will still work, so you don't have to obsessively worry about the in-memory part. Also, if your working set fits in memory, Quickstep will transparently and automatically figure that out and cache that hot set to deliver in-memory performance.
Distributed execution is the next big feature for Quickstep.
Quickstep began life in 2011 as a research project at the University of Wisconsin and entered incubation at the Apache Software Foundation in April, 2016.
Why Quickstep?
Did you know that the hardware in your laptop would have been spread across a small cluster just a decade ago? (PS: Hopefully you are not using a very old laptop!) Look at a high-end server box: it packs the compute and storage power of what was a full rack about 5 years ago! And, the way hardware technology is going, that box is going to become even more powerful in the future. In fact, it is likely that the computing power in each box is going to grow faster than other hardware components (e.g., networking) in data centers. So, if you care about performance and/or total operating costs, paying attention to single-box performance is likely to be super important in the long run.
In other words, there is a small data center in each individual compute box today! Quickstep aims to let you fully exploit the potential of the data center hidden in each box. We call this the scaling-in approach, and it complements a scaling-out approach. But without scaling-in, you are overpaying (by a lot!) when you run your data service.
What are the key ingredients?
Modern computing boxes contain a large number of computing cores and a large main memory. Quickstep allows you to fully exploit these hardware resources using novel data processing, data storage, and query processing methods that include:
A unique decoupling of data-flow from control-flow for query execution that allows for unlimited intra- and inter-query parallelism, thus using all the processing cores effectively.
A template meta-programming framework that provides fast vectorized query execution, thus using each processor cycle very efficiently.
A hybrid data storage architecture that includes both columnar and row-store layouts. Yes, this may surprise some of you, but sometimes a row-store beats a column-store!
And, it is open source!
Giving it a spin
Check out the code: git clone https://git-wip-us.apache.org/repos/asf/incubator-quickstep.git quickstep
Then, go to the code directory: cd quickstep
Initialize the dependencies: git submodule init
Check out the dependencies: git submodule update
Download additional third-party dependencies and apply patches: cd third_party && ./download_and_patch_prerequisites.sh && cd ../
Go into the build directory: cd build
Create the Makefile: cmake -D CMAKE_BUILD_TYPE=Release ..
Build: make -j4. Note that you may replace the 4 with the number of cores on your machine.
Start Quickstep: ./quickstep_cli_shell --initialize_db=true. You can now fire SQL queries. To quit, type quit; into the shell. Your data is stored in the directory qsstor. The next time you start Quickstep, you can omit the --initialize_db flag (as the database has already been initialized) and simply start Quickstep as: ./quickstep_cli_shell.
There are also a number of optional flags that you can specify; to see the full list, type: ./quickstep_cli_shell --help
Next, let us load some data and fire some queries. A few points to note:
The SQL surface of Quickstep is small (it will grow over time). The traditional SQL CREATE TABLE and SELECT statements work. The supported data types are INTEGER, FLOAT, DOUBLE, VARCHAR, CHAR, DATE, and DATETIME. Quickstep does not support NULLs or keys (yet).
Let us create two tables by typing the following SQL command into the Quickstep shell (which you opened in the step above):
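One way to create the two tables, sketched as a shell snippet that writes the statements to a file. The table names City and Weather, the column names, and the column types are assumptions chosen to fit the queries described in the following steps; only supported types are used, and no keys are declared. Feeding the file to quickstep_cli_shell through standard input is also an assumption; if it does not work with your build, paste the statements at the Quickstep prompt instead.

```shell
#!/usr/bin/env bash
# Hypothetical schema: table names, column names, and types are illustrative.
# Only types Quickstep supports are used, with no keys or NULL constraints.
cat > create_tables.sql <<'SQL'
CREATE TABLE City (cid INTEGER, name VARCHAR(80), state CHAR(2));
CREATE TABLE Weather (cid INTEGER, recordDate DATE, highTemperature FLOAT, lowTemperature FLOAT);
SQL

# Assumption: the CLI reads statements from stdin. From the build directory:
#   ./quickstep_cli_shell < create_tables.sql
```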
Next, let us insert some tuples in these two tables.
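A sketch of the inserts, assuming a hypothetical schema with City(cid, name, state) and Weather(cid, recordDate, highTemperature, lowTemperature). The rows are made-up sample values with one California city, so the first query below has something to return. Quoting dates as 'YYYY-MM-DD' strings assumes Quickstep converts such literals to DATE values.

```shell
#!/usr/bin/env bash
# Made-up sample rows for the hypothetical City/Weather schema.
# No column list is given, so values must follow the table's column order.
cat > insert_rows.sql <<'SQL'
INSERT INTO City VALUES (1, 'Madison', 'WI');
INSERT INTO City VALUES (2, 'San Jose', 'CA');
INSERT INTO Weather VALUES (1, '2015-11-01', 50.0, 30.0);
INSERT INTO Weather VALUES (1, '2015-11-02', 51.0, 32.0);
INSERT INTO Weather VALUES (2, '2015-11-01', 60.0, 50.0);
SQL

# Assumption: the CLI reads statements from stdin:
#   ./quickstep_cli_shell < insert_rows.sql
```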
We can now issue SQL queries such as:
a. Find all weather records for California:
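Query (a) sketched as a join, assuming a hypothetical state column on City and a cid column shared by City and Weather:

```shell
#!/usr/bin/env bash
# Query (a): match Weather rows to City rows on cid, keep only state 'CA'.
# Column names are hypothetical, not a confirmed schema.
cat > query_a.sql <<'SQL'
SELECT * FROM Weather W, City C WHERE C.cid = W.cid AND C.state = 'CA';
SQL

# Assumption: the CLI reads statements from stdin:
#   ./quickstep_cli_shell < query_a.sql
```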
b. Find the min and max temperature for each city, printing the cid:
c. Find the min and max temperature for each city using a nested query, and printing the city name:
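Queries (b) and (c), assuming hypothetical Weather columns cid, lowTemperature, and highTemperature and a City table that also has cid and name. In (c), the per-city aggregates are computed in a derived table T and then joined back to City, so SELECT * shows each city's name next to its minimum and maximum.

```shell
#!/usr/bin/env bash
# (b) aggregate per cid; (c) same aggregate in a nested query joined to City.
# Column names are hypothetical, not a confirmed schema.
cat > query_bc.sql <<'SQL'
SELECT cid, MIN(lowTemperature), MAX(highTemperature) FROM Weather GROUP BY cid;
SELECT * FROM City C, (SELECT cid, MIN(lowTemperature), MAX(highTemperature) FROM Weather GROUP BY cid) AS T WHERE C.cid = T.cid;
SQL

# Assumption: the CLI reads statements from stdin:
#   ./quickstep_cli_shell < query_bc.sql
```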
Quickstep also supports a COPY TABLE command. If you want to try it, type the following in a separate terminal shell (not the Quickstep shell) to create a data file:
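A sketch of such a data file: three made-up Weather rows, one per line, with '|' between fields. The field order (cid, date, high, low), the values, and the path /tmp/weather_extra.tbl are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical path; fields are cid|recordDate|highTemperature|lowTemperature.
data_file=/tmp/weather_extra.tbl
printf '%s\n' \
  '1|2015-11-03|49.0|29.0' \
  '1|2015-11-04|48.0|28.0' \
  '1|2015-11-05|47.0|27.0' > "$data_file"
```

Using printf with one argument per row writes each row on its own line, so the file stays correct even if a value contains characters that echo might interpret.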
Then, load this new data by typing the following SQL in the Quickstep shell:
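The load step, sketched as a shell snippet that writes the COPY statement to a file. The WITH (DELIMITER '|') clause assumes Quickstep's COPY lets you name the field separator to match a '|'-delimited file, and /tmp/weather_extra.tbl is a placeholder path (DATA_FILE is a hypothetical override variable of this sketch).

```shell
#!/usr/bin/env bash
# DATA_FILE is a hypothetical path to a '|'-delimited file of Weather rows.
DATA_FILE=${DATA_FILE:-/tmp/weather_extra.tbl}

# Unquoted heredoc delimiter: the shell expands $DATA_FILE; the single
# quotes around it are kept literally as the SQL string delimiters.
cat > copy_weather.sql <<SQL
COPY Weather FROM '$DATA_FILE' WITH (DELIMITER '|');
SQL

# Assumption: the CLI reads statements from stdin:
#   ./quickstep_cli_shell < copy_weather.sql
```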
Now, you have loaded three more tuples into the Weather table, and you can fire the SQL queries above again against this modified database.
Remember, to quit Quickstep, type quit; into the Quickstep shell.
Additional pointers
Licensing
Quickstep is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.
Disclaimer
Apache Quickstep is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.