Drill 1.2 Released

Today I’m happy to announce the availability of the Drill 1.2 release. This release addresses 217 JIRAs on top of the 1.1 release. Highlights include:

Relational Database Support

Drill now includes a JDBC storage plugin for querying relational databases (RDBMSs). Users can run SQL queries that join data between non-relational datastores (e.g., MongoDB, HBase, HDFS, S3) and relational databases (e.g., MySQL, Oracle). For example, a single query can join log files in HDFS with a users table in MySQL. Drill automatically pushes execution (projections, filters, partial joins, etc.) down into the RDBMS whenever possible.
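To make the idea of pushdown concrete, here is a minimal sketch in Python. It is not Drill code: SQLite stands in for the relational database, a list of dictionaries stands in for the log files, and the table, columns, and data are all made up for illustration.

```python
import sqlite3

# Stand-ins (assumptions): SQLite plays the relational database,
# and a plain list plays the log files stored elsewhere.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ana", "BR"), (2, "bob", "US"), (3, "chen", "CN")],
)
logs = [
    {"user_id": 1, "path": "/home"},
    {"user_id": 2, "path": "/cart"},
    {"user_id": 2, "path": "/pay"},
]

# Pushdown: the projection (id, name) and the filter (country = 'US') run
# inside the database, so only matching rows and columns leave it.
pushed = db.execute(
    "SELECT id, name FROM users WHERE country = ?", ("US",)
).fetchall()
names = dict(pushed)

# The join against the non-relational side happens outside the database.
joined = [(names[e["user_id"]], e["path"]) for e in logs if e["user_id"] in names]
print(joined)
```

Without pushdown, every row of the users table would have to be fetched and filtered outside the database before the join.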

New Window Functions

The 1.2 release adds five window functions: NTILE, FIRST_VALUE, LAST_VALUE, LEAD, and LAG. Drill now supports 15 window functions:

Value Functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG
Aggregate Functions: AVG, COUNT, MAX, MIN, SUM
Ranking Functions: CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, RANK, ROW_NUMBER

In addition to supporting new window functions, Drill 1.2 adds support for multiple window functions in a single query. A query can contain multiple window functions that slice up the data in different ways by means of different OVER clauses, but they all act on the same collection of rows.
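As a rough illustration of two window functions with different OVER clauses acting on the same rows, the sketch below runs such a query from Python. SQLite (3.25 or later) stands in for Drill so the example is self-contained, and the orders table and its data are hypothetical.

```python
import sqlite3

# SQLite stands in for Drill here (assumption); the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("east", 1, 10), ("east", 2, 30), ("west", 1, 20), ("west", 2, 5)],
)

# Two window functions, two different OVER clauses, one set of rows:
# LAG looks back one day within each region; RANK orders all rows by amount.
query = """
SELECT region, day, amount,
       LAG(amount) OVER (PARTITION BY region ORDER BY day) AS prev_amount,
       RANK() OVER (ORDER BY amount DESC) AS overall_rank
FROM orders
ORDER BY region, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row keeps its original columns and gains one value per window function, which is what distinguishes window functions from a GROUP BY aggregation.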

Parquet Metadata Caching

When running a query against a directory tree with Parquet files, Drill scans the directory and reads the footers of the files during the planning phase. This allows Drill to prune partitions and optimize query execution for data locality. However, this process can be time-consuming for directory trees with thousands of files. Drill 1.2 includes a new feature that caches the metadata so that subsequent queries don’t need to scan all the files. The cache is automatically maintained based on the directory timestamps.
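The general technique of a timestamp-validated metadata cache can be sketched as follows. This is an illustrative Python model, not Drill's implementation: the FooterCache class, the fake_footer reader, and the file names are all hypothetical, and the cache here covers a single directory level only.

```python
import os
import tempfile


class FooterCache:
    """Hypothetical cache: one entry per directory, valid while its mtime is unchanged."""

    def __init__(self, read_footer):
        self._read_footer = read_footer  # the expensive per-file read
        self._entries = {}               # directory -> (mtime_ns, {file name: metadata})

    def metadata(self, directory):
        mtime = os.stat(directory).st_mtime_ns
        cached = self._entries.get(directory)
        if cached is not None and cached[0] == mtime:
            return cached[1]             # timestamp unchanged: reuse cached metadata
        # Timestamp changed (or first visit): re-read every file and store the result.
        meta = {
            name: self._read_footer(os.path.join(directory, name))
            for name in sorted(os.listdir(directory))
        }
        self._entries[directory] = (mtime, meta)
        return meta


reads = []  # records every simulated footer read


def fake_footer(path):
    # Stand-in for parsing a Parquet footer.
    reads.append(path)
    return {"size": os.path.getsize(path)}


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.parquet"), "wb").close()
    cache = FooterCache(fake_footer)
    cache.metadata(d)
    cache.metadata(d)  # served from the cache; no footer is re-read
    reads_after_two_queries = len(reads)

print(reads_after_two_queries)
```

The second lookup costs one stat call on the directory instead of one read per file, which is where the planning-time saving comes from on large directory trees.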

Performance Improvements on HBase and Hive Tables

Drill 1.2 introduces a faster read path for HBase and Hive tables. When querying Hive tables backed by Parquet files, Drill now uses a high-performance Parquet reader rather than the Hive SerDe.

`DROP TABLE` for Files and Directories

Drill 1.2 allows users to drop file- and directory-based tables with a SQL command (DROP TABLE).

Enhanced MongoDB Integration

Drill 1.2 supports extended JSON types, addressing previous issues with queries on MongoDB collections.

Many More Fixes

Drill 1.2 includes hundreds of other fixes and enhancements.

Download the Drill 1.2 release now and let us know your thoughts.

Drill On!

Jacques Nadeau