Apache Drill 1.4 (available here) includes bug fixes and enhancements from 32 JIRAs.
Here’s a list of highlights from this newest version of Drill:
Select With Options
You can now write queries that override storage plugin configuration options inline. For instance, to query the file CO.dat as text, the following can be used:
SELECT * FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text'));
If a version of CO.dat with a header is available, the first line of the file can be parsed as column names by passing an extractHeader => true argument. We can also use a pipe symbol, ‘|’, as the field delimiter by passing fieldDelimiter:
SELECT * FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text', fieldDelimiter => '|', extractHeader => true));
Additionally, lineDelimiter can be used to indicate a delimiter for new lines, such as the double pipe symbol, ‘||’, in this example:
SELECT * FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text', lineDelimiter => '||', fieldDelimiter => '|'));
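When extractHeader is not set, Drill exposes each row of a text file as a single array named columns, so individual fields are selected by index. A minimal sketch, assuming CO.dat has at least two pipe-delimited fields; the aliases first_field and second_field are hypothetical names chosen for illustration:

```sql
-- Without extractHeader, each row arrives as an array called `columns`.
-- first_field and second_field are hypothetical aliases.
SELECT columns[0] AS first_field,
       columns[1] AS second_field
FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text', fieldDelimiter => '|'))
LIMIT 10;
```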
Improved Behavior For CSV Header Parsing
When header parsing is enabled, queries against CSV files no longer raise an exception if a referenced column does not exist. Instead, Drill now returns null values for that column.
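To illustrate, the sketch below reuses the table-function syntax from the previous section. It assumes CO.dat has a pipe-delimited header row, and no_such_column is a hypothetical name assumed to be absent from that header:

```sql
-- no_such_column is hypothetical and assumed not to appear in the header.
-- With header parsing enabled, the query is expected to return null
-- for this column instead of failing.
SELECT no_such_column
FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text', fieldDelimiter => '|', extractHeader => true))
LIMIT 5;
```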
JSON Formatting
Drill pretty-prints the JSON it writes by default. For more compact results, this behavior can now be changed by setting the option store.json.writer.uglify to true, as in:
ALTER SESSION SET `store.json.writer.uglify` = true;
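The option affects JSON that Drill writes, for example through CREATE TABLE AS. A minimal sketch, assuming a writable dfs.tmp workspace is configured; the table name co_compact is hypothetical:

```sql
-- Write CTAS output as JSON rather than the default format.
ALTER SESSION SET `store.format` = 'json';
-- Emit compact JSON instead of pretty-printed JSON.
ALTER SESSION SET `store.json.writer.uglify` = true;

-- co_compact is a hypothetical table name; dfs.tmp is assumed to be writable.
CREATE TABLE dfs.tmp.`co_compact` AS
SELECT *
FROM TABLE(dfs.`/path/to/CO.dat`(type => 'text', fieldDelimiter => '|', extractHeader => true));
```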
Better Logging
SQL query text is now logged to the drillbit.log file.
Other Improvements
This version also features sorting that is compatible with schema changes, better Apache Hive support, and more efficient caching for Parquet file metadata.