Today I’m happy to announce the availability of the Drill 1.2 release. This release addresses 217 JIRAs on top of the 1.1 release. Highlights include:
Relational Database Support
Drill now includes a JDBC storage plugin for querying relational databases (RDBMSs). Users can run SQL queries that join data between non-relational datastores (eg, MongoDB, HBase, HDFS, S3) and relational databases (eg, MySQL, Oracle). For example, a single query can join log files in HDFS with a users table in MySQL. Drill automatically pushes execution (projections, filters, partial joins, etc.) down into the RDBMS whenever possible.
New Window Functions
The 1.2 release adds additional window functions: NTILE
, FIRST_VALUE
, LAST_VALUE
, LEAD
and LAG
. Drill now supports 15 different window functions:
- Value Functions:
FIRST_VALUE
,LAST_VALUE
,LEAD
,LAG
- Aggregate Functions:
AVG
,COUNT
,MAX
,MIN
,SUM
- Ranking Functions:
CUME_DIST
,DENSE_RANK
,NTILE
,PERCENT_RANK
,RANK
,ROW_NUMBER
In addition to supporting new window functions, Drill 1.2 adds support for multiple window functions in a single query. A query can contain multiple window functions that slice up the data in different ways by means of different OVER
clauses, but they all act on the same collection of rows.
Parquet Metadata Caching
When running a query against a directory tree with Parquet files, Drill scans the directory and reads the footers of the files during the planning phase. This allows Drill to prune partitions and optimize query execution for data locality. However, this process can be time consuming for directory trees with thousands of files. Drill 1.2 includes a new feature that caches the metadata information so that subsequent queries don’t need to scan all the files. The cache is automatically maintained based on the directory timestamps.
Performance Improvements on HBase and Hive Tables
Drill 1.2 introduces a faster read path for HBase and Hive tables. When querying Hive tables backed by Parquet files, Drill now uses a high-performance Parquet reader rather than the Hive SerDe.
DROP TABLE
for Files and Directories
Drill 1.2 allows users to drop file- and directory-based tables with a SQL command (DROP TABLE
).
Enhanced MongoDB Integration
Drill 1.2 supports extended JSON types, addressing previous issues with queries on MongoDB collections.
Many More Fixes
Drill 1.2 includes hundreds of other fixes and enhancements.
Download the Drill 1.2 release now and let us know your thoughts.
Drill On! Jacques Nadeau