Today I’m happy to announce the availability of the Drill 1.3 release. This release addresses 58 JIRAs on top of the 1.2 release. Highlights include:

Enhanced Amazon S3 Support

Drill 1.3 utilizes a new library, called s3a, for reading data from S3. The s3a library includes improvements over the previous s3n library, such as higher performance and the ability to read large files (over 5GB).

In addition to the new s3a library, Drill 1.3 makes it easier to set up your AWS credentials. Simply edit the file conf/core-site.xml in the Drill install directory. For more information, check out the step-by-step instructions in the documentation.
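As a rough sketch of what that edit might look like: the property names below are the standard Hadoop s3a credential settings, and the values are placeholders, not real keys. Treat this as an illustration and follow the documentation for the exact steps in your setup.

```xml
<!-- conf/core-site.xml in the Drill install directory -->
<!-- Assumption: fs.s3a.access.key and fs.s3a.secret.key are the Hadoop s3a
     credential properties; replace the placeholder values with your own. -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
```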

Heterogeneous Types

Drill 1.3 includes support for mixed-type columns, often found in systems like MongoDB and file formats like JSON. For example, Drill can now query columns that evolve from one data type to another over time.

Drill 1.3 provides a collection of functions that enable you to test the data type of a value. For example, if you have a column that has both lists (arrays) and numbers, you can use the following query to add 1 to each value, taking the first element whenever the value is a list:

SELECT 1 + CASE WHEN is_list(a) THEN a[0] ELSE a END FROM table;
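Before writing a CASE expression like this, it can help to see which types a column actually contains. The sketch below rests on two assumptions that are not stated in this post: that mixed-type support is switched on through the `exec.enable_union_type` session option, and that a `typeof` function is available in your Drill version. The file path is hypothetical.

```sql
-- Assumption: union (mixed-type) support is gated behind this session option.
ALTER SESSION SET `exec.enable_union_type` = true;

-- Hypothetical file; column a holds both lists and numbers.
-- Assumption: typeof() reports the data type of each value.
SELECT a, typeof(a) AS a_type
FROM dfs.`/path/to/mixed.json`
LIMIT 10;
```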

Text File Headers

Drill is now able to parse the header row in a text file (CSV, TSV, etc.). Prior to Drill 1.3, data had to be accessed through the columns array:

SELECT columns[0], columns[1] FROM dfs.`/path/to/users.csv`

With Drill 1.3, you can use the actual column names in the CSV file:

SELECT name, address FROM dfs.`/path/to/users.csv`

Enabling header parsing is as simple as setting the extractHeader parameter in the storage plugin configuration for the desired file extensions. For more information, check out the documentation.
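As a hedged illustration, the excerpt below shows how the csv entry in the formats section of a file-based storage plugin configuration might look with header parsing turned on. Only extractHeader comes from this post; the other fields are assumed to match a typical text format entry, and the rest of the plugin configuration is omitted.

```json
"formats": {
  "csv": {
    "type": "text",
    "extensions": ["csv"],
    "delimiter": ",",
    "extractHeader": true
  }
}
```

With a setting like this in place, files with the listed extensions are read with their first row treated as column names, so the columns array is no longer needed for them.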

Sequence Files

Drill now supports sequence files, a format commonly used in the Hadoop ecosystem. A sequence file contains a series of keys and values, and querying it with Drill is as easy as querying any other self-describing format:

SELECT *
FROM dfs.tmp.`simple.seq`
LIMIT 1;
+--------------+---------------+
|  binary_key  | binary_value  |
+--------------+---------------+
| [B@70828f46  | [B@b8c765f    |
+--------------+---------------+

Drill’s CONVERT_FROM function makes it easy to decode the binary values:

SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8')
FROM dfs.tmp.`simple.seq`
LIMIT 1;
+-----------+-------------+
|  EXPR$0   |   EXPR$1    |
+-----------+-------------+
| key0      |   value0    |
+-----------+-------------+

Many More Fixes

Drill 1.3 includes many other improvements, including enhancements related to querying Hive tables, MongoDB collections and Avro files. Check out the complete list of fixes and enhancements for more information.

Download the Drill 1.3 release now and let us know your thoughts.

Drill On!
Jacques Nadeau