Apache Drill 1.3.0 Release Notes

Release date: November 22, 2015

Today, we’re happy to announce the availability of Drill 1.3.0, providing bug fixes and enhancements.

Enhancements and Bug Fixes

Sub-task

  • [DRILL-1721] - Configure fmpp-maven-plugin for incremental build
  • [DRILL-3313] - Eliminate redundant #load methods and unit-test loading & exporting of vectors

Bug

  • [DRILL-1752] - Drill cluster returns error when querying Mongo shards on an unsharded collection
  • [DRILL-2161] - Flatten on a list within a list on a large data set results in an IOB Exception
  • [DRILL-2583] - Querying a non-existent table from hbase should throw a proper error message
  • [DRILL-2626] - org.apache.drill.common.StackTrace seems to have duplicate code; should we re-use Throwable's code?
  • [DRILL-2967] - Incompatible types error reported in a "not in" query with compatible data types
  • [DRILL-3336] - to_date(to_timestamp) with group-by in hbase/maprdb table fails with "java.lang.UnsupportedOperationException"
  • [DRILL-3428] - Errors during text file reading should provide the file name in the error message
  • [DRILL-3429] - DrillAvgVarianceConvertlet may produce wrong results while rewriting stddev, variance
  • [DRILL-3485] - Doc. site JDBC page(s) should at least point to JDBC Javadoc in source
  • [DRILL-3486] - Doc. site JDBC page(s) should link to JDBC driver Javadoc doc. once it's available
  • [DRILL-3505] - MongoDB _id is returned as null when t.*, t._id is used in the projection
  • [DRILL-3538] - We do not prune partitions when we count over partitioning key and filter over partitioning key
  • [DRILL-3578] - UnsupportedOperationException: Unable to get value vector class for minor type [FIXEDBINARY] and mode [OPTIONAL]
  • [DRILL-3634] - Hive Scan: Add fileCount (number of files scanned) or number of partitions scanned to the text plan
  • [DRILL-3770] - Query with window function having just ORDER BY clause runs out of memory on large datasets
  • [DRILL-3871] - Off by one error while reading binary fields with one terminal null in parquet
  • [DRILL-3921] - Hive LIMIT 1 queries take too long
  • [DRILL-3937] - We are not pruning when we have a metadata cache and auto partitioned data in some cases
  • [DRILL-3941] - Add timing instrumentation around Partition Pruning
  • [DRILL-3943] - CannotPlanException caused by ExpressionReductionRule
  • [DRILL-3947] - IndexOutOfBoundsException for pruning on date column (at large scale)
  • [DRILL-3952] - Improve Window Functions performance when not all batches are required to process the current batch
  • [DRILL-3956] - TEXT MySQL type unsupported
  • [DRILL-3975] - Partition Planning rule causes query failure due to IndexOutOfBoundsException on HDFS
  • [DRILL-3980] - Build failure in -Pmapr profile (due to DRILL-3749)
  • [DRILL-3992] - Unable to query Oracle DB using JDBC Storage Plug-In
  • [DRILL-3994] - Build Fails on Windows after DRILL-3742
  • [DRILL-4000] - In all non-root fragments, Drill recreates storage plugin instances for every minor fragment
  • [DRILL-4025] - Reduce getFileStatus() invocation for Parquet by 1
  • [DRILL-4028] - Merge Drill parquet modifications back into the mainline project
  • [DRILL-4040] - Build failure on master
  • [DRILL-4042] - Unable to run sqlline in embedded mode on Windows
  • [DRILL-4046] - Performance regression in some tpch queries with 1.3rc0 build
  • [DRILL-4056] - Avro deserialization corrupts data
  • [DRILL-4065] - ImpersonationUtil always creates new UserGroupInformation (thus new FileSystem objects), causing excessive number of threads
  • [DRILL-4070] - Files written with versions of Drill before v1.3 record metadata that is indistinguishable from bad metadata from other Parquet creators
  • [DRILL-4071] - Partition pruning fails when a Coalesce() function appears with partition filter
  • [DRILL-4080] - Doc file deleted from gh-pages appears when obsolete URL is used
  • [DRILL-4085] - Disable RPC Offload until concurrency bugs are tracked down

Improvement

  • [DRILL-1065] - Provide a reset command to reset an option to its default value
  • [DRILL-2726] - Display Drill version in sys.version
  • [DRILL-3242] - Enhance RPC layer to offload all request work onto a separate thread.
  • [DRILL-3340] - Add named metrics and named operators in OperatorProfile
  • [DRILL-3742] - Improve classpath scanning to reduce the time it takes
  • [DRILL-3793] - Rewrite MergeJoinBatch using record batch iterator.
  • [DRILL-3810] - Filesystem plugin's support for file format's schema
  • [DRILL-3911] - Upgrade Hadoop from 2.4.1 to latest stable
  • [DRILL-3912] - Common subexpression elimination in code generation
  • [DRILL-3914] - Support geospatial queries
  • [DRILL-4031] - JDBC Plugin Queries fail if columns return JDBC OTHER type
  • [DRILL-4103] - Add additional metadata to Parquet files generated by Drill

New Feature

  • [DRILL-951] - CSV header row should be parsed
  • [DRILL-3749] - Upgrade Hadoop dependency to latest version (2.7.1)
  • [DRILL-3802] - Throw unsupported error for ROLLUP/GROUPING
  • [DRILL-3963] - Read raw key value bytes from sequence files

Test