Apache Drill 0.7.0 Release Notes

Apache Drill 0.7.0, the third beta release for Drill, is designed to help enthusiasts start working and experimenting with Drill. It also continues the Drill monthly release cycle as we drive towards general availability.

This release is available as binary and source tarballs that are compiled against Apache Hadoop. Drill has been tested against MapR, Cloudera, and Hortonworks Hadoop distributions. There are associated build profiles and JIRAs that can help you run Drill against your preferred distribution.

Apache Drill 0.7.0 Key Features

  • No more dependency on UDP multicast - Drill now works well in the following scenarios:

    • UDP multicast not enabled (as in EC2)

    • Cluster spans multiple subnets

    • Cluster has a multihomed configuration

  • New functions to natively work with nested data - KVGen and Flatten

  • Support for Hive 0.13 (Hive 0.12 is no longer supported)

  • Improved performance through partition pruning when querying Hive tables and the file system

  • Improved performance for HBase with LIKE operator pushdown

  • Improved memory management

  • Drill web UI monitoring and query profile improvements

  • Ability to parse files without explicit extensions using default storage format specification

  • Fixes for dealing with complex/nested data objects in Parquet/JSON

  • Fast schema return - Improved experience working with BI/query tools by returning metadata quickly

  • Several hang-related fixes

  • Parquet writer fixes for handling large datasets

  • Stability improvements in ODBC and JDBC drivers
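
The KVGen and Flatten functions listed above are easiest to picture on a small record. The following Python sketch mimics their general behavior on in-memory data: KVGen turns a map into a list of key/value pairs, and Flatten emits one row per element of a repeated field. It is an illustration only, not Drill code; the helper names `kvgen` and `flatten` and the sample record are assumptions made for this example.

```python
def kvgen(mapping):
    """Turn a map into a list of {"key": ..., "value": ...} pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows, column):
    """Emit one output row per element of the list stored in `column`.

    All other columns are copied unchanged into every output row.
    """
    out = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)          # shallow copy of the other columns
            new_row[column] = element    # replace the list with one element
            out.append(new_row)
    return out


# Hypothetical sample record: a map whose keys are not known in advance.
rows = [{"id": 1, "visits": {"home": 3, "search": 5}}]

# Step 1: convert the map into a repeated list of key/value pairs.
with_pairs = [{**row, "visits": kvgen(row["visits"])} for row in rows]

# Step 2: produce one row per key/value pair.
flat = flatten(with_pairs, "visits")

for row in flat:
    print(row)
```

Combining the two steps, as in this sketch, is a common way to query maps with dynamic keys: the map first becomes a list, and the list then becomes ordinary rows that can be filtered or aggregated.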

Apache Drill 0.7.0 Key Notes and Limitations

  • The current release supports in-memory and beyond-memory execution. However, you must disable memory-intensive hash aggregate and hash join operations to leverage this functionality.
  • While the Drill execution engine supports dynamic schema changes during the course of a query, some operators, such as Sort, have yet to implement support for this behavior. Other operations, such as streaming aggregate, may have partial support that leads to unexpected results.
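
As a rough illustration of the first note, the sketch below builds the session-level statements that would turn off the hash aggregate and hash join operators. The option names `planner.enable_hashagg` and `planner.enable_hashjoin` are Drill planner options, but whether they apply unchanged to 0.7.0 is an assumption here, so check the option list of your installation. The helper name is hypothetical, and the statements are only generated as strings to be submitted through whichever client you use.

```python
# Assumed option names; verify them against your Drill installation.
HASH_OPERATOR_OPTIONS = ["planner.enable_hashagg", "planner.enable_hashjoin"]


def disable_hash_operator_statements(options=HASH_OPERATOR_OPTIONS):
    """Return one ALTER SESSION statement per option, setting it to false."""
    return ["ALTER SESSION SET `{}` = false".format(name) for name in options]


statements = disable_hash_operator_statements()
for statement in statements:
    print(statement)
```

Because these are session-level settings, they would affect only the connection that issues them, which makes it easy to experiment without changing behavior for other users.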