Apache Drill 1.8.0 Release Notes

Release date: August 30, 2016

Today, we’re happy to announce the availability of Drill 1.8.0. You can download it here.

New Features

This release of Drill provides the following new features:

  • Metadata cache pruning
  • IF EXISTS parameter with the DROP TABLE and DROP VIEW commands
  • DESCRIBE SCHEMA command
  • Multi-byte delimiter support
  • New parameters for filter selectivity estimates

Configuration and Launch Script Changes

This release of Drill also includes the following changes to the configuration and launch scripts:

  • The $DRILL_HOME/conf/drill-env.sh file has been simplified. Default Drill settings have moved out of this file and now reside in $DRILL_HOME/bin/drill-config.sh. The drill-env.sh file now ships with descriptions of the options that you can set. You can override many settings by creating an entry in $DRILL_HOME/conf/drill-env.sh. (DRILL-4581)
  • Due to issues at high concurrency, the native Linux epoll transport is now disabled by default. (DRILL-4623)
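
As an illustration of overriding a default, the following is a minimal sketch of entries you might add to $DRILL_HOME/conf/drill-env.sh. The variable names DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY and the values shown are assumptions for this example; check the option descriptions that ship in your own drill-env.sh for the names and defaults your version supports.

```shell
# Sketch of override entries for $DRILL_HOME/conf/drill-env.sh.
# Variable names and sizes are assumptions; confirm them against the
# commented option descriptions in your copy of drill-env.sh.

# The ${VAR:-default} form keeps a value that is already set in the
# environment and falls back to the given default otherwise.
export DRILL_HEAP=${DRILL_HEAP:-"4G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
```

Writing the entries this way means a value exported in the calling shell takes precedence over the value in the file.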

Changes to the Drill launch scripts provide a new option that simplifies upgrades for Drill 1.8 and later. The new scripts support a “site directory” that holds site-specific files separately from the Drill product directory. The site directory is a simple extension of the config directory in previous Drill releases. You can add a “jars” subdirectory to the config directory for your custom jars instead of storing them in $DRILL_HOME. You can also add native libraries in a new “lib” subdirectory, and Drill loads them automatically.

Example:

   /my/conf/dir
   |- drill-override.conf
   |- drill-env.sh
   |- jars
      |- myudf.jar
   |- lib
      |- mysecurity.so

To use the site directory:

   drillbit.sh --site /my/conf/dir start

Note that --config still works as well.

Alternatively, you can set the DRILL_CONF_DIR environment variable to the site directory and start Drill without the option:

   export DRILL_CONF_DIR=/my/conf/dir
   drillbit.sh start

The site directory works with drillbit.sh and the various Drill client scripts.
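
A site directory with the layout shown in the example could be prepared along these lines. This is only a sketch: the SITE_DIR variable is a name chosen for this example, it defaults to a temporary location so the commands are safe to try, and the placeholder files stand in for the real configuration files you would normally copy from your existing config directory.

```shell
# Build a site directory skeleton matching the example layout.
# SITE_DIR is a hypothetical variable for this sketch; it defaults to a
# throwaway temp location. Set it to a real path such as /my/conf/dir.
SITE_DIR="${SITE_DIR:-$(mktemp -d)/conf}"

# Subdirectories for custom jars and native libraries.
mkdir -p "$SITE_DIR/jars" "$SITE_DIR/lib"

# Empty placeholders for the two configuration files. In practice you
# would copy your existing files from the old config directory instead.
touch "$SITE_DIR/drill-override.conf" "$SITE_DIR/drill-env.sh"

# Custom jars and native libraries would then be copied in, for example:
#   cp myudf.jar "$SITE_DIR/jars/"
#   cp mysecurity.so "$SITE_DIR/lib/"

echo "Site directory ready: $SITE_DIR"
echo "Start with: drillbit.sh --site $SITE_DIR start"
```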

To upgrade Drill using the new site directory, delete the old Drill product directory and expand the Drill archive to create a new one. Because an upgrade does not affect the site files, there is no need to back up and merge files each time you upgrade. You can also keep different site directories for different Drill sessions: one for development, another for test, and so on. To run multiple Drill clusters from a single Drill installation, create a site directory for each Drill cluster and then configure Drill in each one.
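
The idea of one site directory per environment can be sketched as follows. The SITES_ROOT variable and the environment names dev and test are hypothetical, and the script prints the launch commands rather than running them, since each site directory would still need its own drill-override.conf and drill-env.sh before Drill could start from it.

```shell
# One site directory per environment, all sharing a single Drill installation.
# SITES_ROOT and the environment names are hypothetical for this sketch.
SITES_ROOT="${SITES_ROOT:-$(mktemp -d)}"

for env in dev test; do
  # Each environment gets its own jars/ and lib/ subdirectories.
  mkdir -p "$SITES_ROOT/$env/jars" "$SITES_ROOT/$env/lib"
  # Print, rather than run, the launch command for this environment.
  echo "drillbit.sh --site $SITES_ROOT/$env start"
done
```

Keeping the directories under a common root makes it easy to see which environments exist and to back them up independently of the Drill product directory.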

The following sections list additional bug fixes and improvements:

Sub-task

  • [DRILL-4581] - Various problems in the Drill startup scripts
  • [DRILL-4728] - Add support for new metadata fetch APIs
  • [DRILL-4729] - Add support for prepared statement implementation on server side
  • [DRILL-4732] - Update JDBC driver to use the new prepared statement APIs on DrillClient

Bug

  • [DRILL-3726] - Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
  • [DRILL-4147] - Union All operator runs in a single fragment
  • [DRILL-4175] - IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently
  • [DRILL-4341] - Fails to parse string literals containing escaped quotes
  • [DRILL-4574] - Avro Plugin: Flatten does not work correctly on record items
  • [DRILL-4623] - Disable Epoll by Default
  • [DRILL-4658] - cannot specify tab as a fieldDelimiter in table function
  • [DRILL-4664] - ScanBatch.isNewSchema() returns wrong result for map datatype
  • [DRILL-4665] - Partition pruning not working for hive partitioned table with 'LIKE' and '=' filter
  • [DRILL-4704] - select statement behavior is inconsistent for decimal values in parquet
  • [DRILL-4707] - Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
  • [DRILL-4715] - Java compilation error for a query with large number of expressions
  • [DRILL-4743] - HashJoin's not fully parallelized in query plan
  • [DRILL-4746] - Verification Failures (Decimal values) in drill's regression tests
  • [DRILL-4759] - Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.
  • [DRILL-4767] - Parquet reader throw IllegalArgumentException for int32 type with GZIP compression
  • [DRILL-4768] - Drill may leak hive meta store connection if hive meta store client call hits error
  • [DRILL-4769] - Foreman spins when querying int32 data with snappy compression
  • [DRILL-4770] - ParquetRecordReader throws NPE querying a single int64 column file
  • [DRILL-4783] - Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
  • [DRILL-4785] - Limit 0 queries regressed in Drill 1.7.0
  • [DRILL-4794] - Regression: Wrong result for query with disjunctive partition filters
  • [DRILL-4801] - Setting extractHeader attribute for CSV format does not propagate to all drillbits
  • [DRILL-4816] - sqlline -f failed to read the query file
  • [DRILL-4825] - Wrong data with UNION ALL when querying different sub-directories under the same table
  • [DRILL-4833] - Union-All with a small cardinality input on one side does not get parallelized
  • [DRILL-4836] - ZK Issue during Drillbit startup, possibly due to race condition
  • [DRILL-4846] - Eliminate extra operations during metadata cache pruning
  • [DRILL-4852] - COUNT(*) query against a large JSON table slower by 2x
  • [DRILL-4854] - Incorrect logic in log directory checks in drill-config.sh
  • [DRILL-4857] - When no partition pruning occurs with metadata caching there's a performance regression

Improvement

  • [DRILL-2330] - Add support for nested aggregate expressions for window aggregates
  • [DRILL-3149] - TextReader should support multibyte line delimiters
  • [DRILL-3710] - Make the 20 in-list optimization configurable
  • [DRILL-4530] - Improve metadata cache performance for queries with single partition
  • [DRILL-4751] - Remove dumpcat script from Drill distribution
  • [DRILL-4766] - FragmentExecutor should use EventProcessor and avoid blocking rpc threads
  • [DRILL-4786] - Improve metadata cache performance for queries with multiple partitions
  • [DRILL-4822] - Extend distrib-env.sh search to consider site directory

New Feature

  • [DRILL-4514] - Add describe schema <schema_name> command
  • [DRILL-4673] - Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return
  • [DRILL-4819] - Update MapR version to 5.2.0

Task