Apache Drill 1.10.0 Release Notes

Release date: March 15, 2017

Today, we’re happy to announce the availability of Drill 1.10.0. You can download it here.

New Features and Improvements

This release of Drill provides the following new features and improvements:

  • Support for the CREATE TEMPORARY TABLE AS (CTTAS) command.
  • A JDBC connection option that improves fault tolerance when connecting directly to a Drill node from a client.
  • The Web UI displays the Drill version and additional query profile statistics.
  • Drill implicitly interprets the INT96 timestamp data type in Parquet files.
  • Support for Kerberos authentication between the client and drillbit.
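
As a rough illustration of three of the features listed above, the sketch below assembles the strings a client might send. It is not taken from the Drill documentation. The `tries` connection parameter name, the host names, the table name, and the file path are assumptions made for illustration, so check them against the JDBC driver documentation for your version. The option name `store.parquet.reader.int96_as_timestamp` is the one mentioned in DRILL-5097 below.

```python
# Minimal sketch: build a direct-connection JDBC URL and example statements.
# Nothing here contacts a Drill cluster; it only formats strings.


def drill_direct_url(drillbits, tries=None):
    """Return a JDBC URL listing several drillbits as (host, port) pairs.

    `tries` is assumed to be the name of the fault-tolerance option
    (how many listed drillbits the client attempts before giving up).
    """
    if not drillbits:
        raise ValueError("at least one drillbit is required")
    nodes = ",".join(f"{host}:{port}" for host, port in drillbits)
    url = f"jdbc:drill:drillbit={nodes}"
    if tries is not None:
        url += f";tries={tries}"
    return url


def cttas_statement(table, query):
    """Return a CREATE TEMPORARY TABLE AS (CTTAS) statement."""
    return f"CREATE TEMPORARY TABLE {table} AS {query}"


# Session option that asks the Parquet reader to treat INT96 as a timestamp.
INT96_AS_TIMESTAMP = (
    "ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true"
)

if __name__ == "__main__":
    # Host names and the Parquet path are hypothetical.
    print(drill_direct_url([("node1", 31010), ("node2", 31010)], tries=5))
    print(cttas_statement("recent_events",
                          "SELECT * FROM dfs.`/data/events.parquet`"))
    print(INT96_AS_TIMESTAMP)
```

The URL and statements would then be passed to whichever JDBC or ODBC client you already use. A temporary table created this way is expected to exist only for the session that created it.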

The following sections list additional bug fixes and improvements:

Sub-task

  • [DRILL-4272] - When sort runs out of memory and query fails, resources are seemingly not freed
  • [DRILL-4301] - OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
  • [DRILL-4730] - Update JDBC DatabaseMetaData implementation to use new Metadata APIs
  • [DRILL-5008] - Refactor, document and simplify ExternalSortBatch
  • [DRILL-5011] - External Sort Batch memory use depends on record width
  • [DRILL-5014] - ExternalSortBatch cache size, spill count differs from config setting
  • [DRILL-5017] - Config param drill.exec.sort.external.batch.size is not used
  • [DRILL-5019] - ExternalSortBatch spills all batches to disk even if only one spills
  • [DRILL-5020] - ExternalSortBatch has inconsistent notions of the memory limit
  • [DRILL-5022] - ExternalSortBatch sets two different limits for "copier" memory
  • [DRILL-5023] - ExternalSortBatch does not spill fully, throws off spill calculations
  • [DRILL-5025] - ExternalSortBatch provides weak control over spill file size
  • [DRILL-5026] - ExternalSortBatch uses two memory allocators; one will do
  • [DRILL-5027] - ExternalSortBatch is inefficient: rewrites data unnecessarily
  • [DRILL-5055] - External Sort does not delete spill file if error occurs during close
  • [DRILL-5062] - External sort refers to the deprecated HDFS fs.default.name param
  • [DRILL-5066] - External sort attempts to retry sv2 memory alloc, even if can never succeed
  • [DRILL-5210] - External Sort BatchGroup leaks memory if an OOM occurs during read
  • [DRILL-5262] - NPE in managed external sort while spilling to disk
  • [DRILL-5264] - Managed External Sort fails with OOM
  • [DRILL-5267] - Managed external sort spills too often with Parquet data
  • [DRILL-5285] - Provide detailed, accurate estimate of size consumed by a record batch
  • [DRILL-5294] - Managed External Sort throws an OOM during the merge and spill phase

Bug

  • [DRILL-1808] - Large compilation unit test fails due to high memory allocation
  • [DRILL-2293] - CTAS does not clean up when it fails
  • [DRILL-3562] - Query fails when using flatten on JSON data where some documents have an empty array
  • [DRILL-4578] - "children" missing from results of full scan over JSON data
  • [DRILL-4764] - Parquet file with INT_16, etc. logical types not supported by simple SELECT
  • [DRILL-4812] - Wildcard queries fail on Windows
  • [DRILL-4850] - TPCDS Query 33 failed in the second and 3rd runs, but succeeded in the 1st run
  • [DRILL-4872] - NPE from CTAS partitioned by a projected casted null
  • [DRILL-4919] - Fix select count(1) / count(*) on csv with header
  • [DRILL-4938] - Report UserException when constant expression reduction fails
  • [DRILL-4963] - Issues when overloading Drill native functions with dynamic UDFs
  • [DRILL-4982] - Hive Queries degrade when queries switch between different formats
  • [DRILL-4994] - Prepared statement stopped working between 1.8.0 client and < 1.8.0 server
  • [DRILL-4995] - Allow lazy init when dynamic UDF support is disabled
  • [DRILL-4996] - Parquet Date auto-correction is not working in auto-partitioned parquet files generated by drill-1.6
  • [DRILL-5005] - Potential issues with external sort info in query profile
  • [DRILL-5015] - As per documentation, when issuing a list of drillbits in the connection string, we always attempt to connect only to the first one
  • [DRILL-5032] - Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
  • [DRILL-5034] - Select timestamp from hive generated parquet always return in UTC
  • [DRILL-5039] - NPE - CTAS PARTITION BY (<char-type-column>)
  • [DRILL-5040] - Interrupted CTAS should not succeed & should not create physical file on disk
  • [DRILL-5044] - Fix retry logic to handle VersionMismatchException by not deleting jars in remote UDFs area
  • [DRILL-5048] - Fix type mismatch error in case statement with null timestamp
  • [DRILL-5050] - C++ client library has symbol resolution issues when loaded by a process that already uses boost::asio
  • [DRILL-5051] - Fix incorrect result returned in nest query with offset specified
  • [DRILL-5070] - Code gen: create methods in fixed order to allow test verification
  • [DRILL-5081] - Excessive info level logging introduced in DRILL-4203
  • [DRILL-5086] - ClassCastException when filter pushdown is used with a bigint or float column and metadata caching.
  • [DRILL-5088] - Error when reading DBRef column
  • [DRILL-5091] - JDBC unit tests fail on Java 8
  • [DRILL-5094] - Assure Comparator to be transitive
  • [DRILL-5097] - Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works
  • [DRILL-5104] - Foreman sets external sort memory allocation even for a physical plan
  • [DRILL-5112] - Unit tests derived from PopUnitTestBase fail in IDE due to config errors
  • [DRILL-5113] - Upgrade Maven RAT plugin to avoid annoying XML errors
  • [DRILL-5117] - Compile error when querying a JSON file with 1000+ columns
  • [DRILL-5119] - Update MapR version to 5.2.0.40963-mapr
  • [DRILL-5121] - A memory leak is observed when exact case is not specified for a column in a filter condition
  • [DRILL-5127] - Revert the fix for DRILL-4831
  • [DRILL-5157] - Multiple Snappy versions on class path; causes unit test failures
  • [DRILL-5159] - ProjectMergeRule in Drill should operate on RelNodes with same convention trait.
  • [DRILL-5164] - Equi-join query results in CompileException when inputs have large number of columns
  • [DRILL-5167] - C++ connector does not set escape string for metadata search pattern
  • [DRILL-5190] - Display planning and queued time for a query in its profile page
  • [DRILL-5196] - Could not run a single MongoDB unit test case through command line or IDE
  • [DRILL-5207] - Improve Parquet scan pipelining
  • [DRILL-5208] - Finding path to java executable should be deterministic
  • [DRILL-5218] - Support Disabling Heartbeats in C++ Client
  • [DRILL-5224] - CTTAS: fix errors connected with system path delimiters (Windows)
  • [DRILL-5230] - Translation of millisecond duration into hours is incorrect
  • [DRILL-5238] - CTTAS: unable to resolve temporary table if workspace is indicated without schema
  • [DRILL-5242] - The UI breaks when trying to render profiles having unknown metrics
  • [DRILL-5243] - Fix TestContextFunctions.sessionIdUDFWithinSameSession unit test
  • [DRILL-5252] - A condition returns always true
  • [DRILL-5263] - Prevent left NLJoin with non scalar subqueries
  • [DRILL-5266] - Parquet Reader produces "low density" record batches - bits vs. bytes
  • [DRILL-5273] - CompliantTextReader exhausts 4 GB memory when reading 5000 small files
  • [DRILL-5274] - Exception thrown in Drillbit shutdown in UDF cleanup code
  • [DRILL-5275] - Sort spill serialization is slow due to repeated buffer allocations
  • [DRILL-5284] - Roll-up of final fixes for managed sort
  • [DRILL-5287] - Provide option to skip updates of ephemeral state changes in Zookeeper
  • [DRILL-5293] - Poor performance of Hash Table due to same hash value as distribution below
  • [DRILL-5304] - Queries fail intermittently when there is skew in data distribution
  • [DRILL-5313] - C++ client build failure on linux
  • [DRILL-5326] - Unit tests failures related to the SERVER_METADTA

Improvement

  • [DRILL-4217] - Query parquet file treat INT_16 & INT_8 as INT32
  • [DRILL-4280] - Kerberos Authentication
  • [DRILL-4373] - Drill and Hive have incompatible timestamp representations in parquet
  • [DRILL-4604] - Generate warning on Web UI if drillbits version mismatch is detected
  • [DRILL-4864] - Add ANSI format for date/time functions
  • [DRILL-4956] - Temporary tables support
  • [DRILL-4980] - Upgrading of the approach of parquet date correctness status detection
  • [DRILL-4987] - Use ImpersonationUtil in RemoteFunctionRegistry
  • [DRILL-5043] - Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
  • [DRILL-5052] - Option to debug generated Java code using an IDE
  • [DRILL-5056] - UserException does not write full message to log
  • [DRILL-5065] - Optimize count(*) queries on MapR-DB JSON Tables
  • [DRILL-5080] - Create a memory-managed version of the External Sort operator
  • [DRILL-5085] - Add / update description for dynamic UDFs directories in drill-env.sh and drill-module.conf
  • [DRILL-5098] - Improving fault tolerance for connection between client and foreman node.
  • [DRILL-5108] - Reduce output from Maven git-commit-id-plugin
  • [DRILL-5116] - Enable generated code debugging in each Drill operator
  • [DRILL-5123] - Write query profile after sending final response to client to improve latency
  • [DRILL-5126] - Provide simplified, unified "cluster fixture" for tests
  • [DRILL-5172] - Display elapsed time for queries in the UI
  • [DRILL-5195] - Publish Operator and MajorFragment Stats in Profile page
  • [DRILL-5215] - CTTAS: disallow temp tables in view expansion logic
  • [DRILL-5221] - cancel message is delayed until queryid or data is received
  • [DRILL-5254] - Enhance default reduction factors in optimizer
  • [DRILL-5255] - Unit tests fail due to CTTAS temporary name space checks
  • [DRILL-5257] - Provide option to save query profiles sync, async or not at all
  • [DRILL-5258] - Allow "extended" mock tables access from SQL queries
  • [DRILL-5259] - Allow listing a user-defined number of profiles
  • [DRILL-5260] - Refinements to new "Cluster Fixture" test framework
  • [DRILL-5290] - Provide an option to build operator table once for built-in static functions and reuse it across queries.
  • [DRILL-5301] - Add server metadata API

New Feature

  • [DRILL-4935] - Allow drillbits to advertise a configurable host address to Zookeeper
  • [DRILL-4979] - Make dataport configurable

Task