Apache Drill 1.11.0 Release Notes

Release date: July 31, 2017

Today, we’re happy to announce the availability of Drill 1.11.0. You can download it from the Apache Drill download page.

New Features and Improvements

This release of Drill provides the following new features and improvements:

  • Cryptography-related functions. (DRILL-5634)
  • Spill to disk for the hash aggregate operator. (DRILL-5457)
  • Format plugin support for PCAP files. (DRILL-5432)
  • Ability to change the HDFS block size for Parquet files. (DRILL-5379)
  • Ability to store query profiles in memory. (DRILL-5481)
  • Configurable CTAS directory and file permissions option. (DRILL-5391)
  • Support for network encryption. (DRILL-4335)
  • Relative paths stored in the metadata file. (DRILL-3867)
  • Support for ANSI_QUOTES. (DRILL-3510)

The following sections list additional bug fixes and improvements:

Sub-task

  • [DRILL-3250] - Drill fails to compare multi-byte characters from Hive table
  • [DRILL-4301] - OOM: Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
  • [DRILL-5008] - Refactor, document and simplify ExternalSortBatch
  • [DRILL-5011] - External Sort Batch memory use depends on record width
  • [DRILL-5014] - ExternalSortBatch cache size, spill count differs from config setting
  • [DRILL-5019] - ExternalSortBatch spills all batches to disk even if only one spills
  • [DRILL-5020] - ExternalSortBatch has inconsistent notions of the memory limit
  • [DRILL-5022] - ExternalSortBatch sets two different limits for "copier" memory
  • [DRILL-5023] - ExternalSortBatch does not spill fully, throws off spill calculations
  • [DRILL-5025] - ExternalSortBatch provides weak control over spill file size
  • [DRILL-5026] - ExternalSortBatch uses two memory allocators; one will do
  • [DRILL-5027] - ExternalSortBatch is inefficient: rewrites data unnecessarily
  • [DRILL-5055] - External Sort does not delete spill file if error occurs during close
  • [DRILL-5062] - External sort refers to the deprecated HDFS fs.default.name param
  • [DRILL-5066] - External sort attempts to retry sv2 memory alloc, even if it can never succeed
  • [DRILL-5210] - External Sort BatchGroup leaks memory if an OOM occurs during read
  • [DRILL-5285] - Provide detailed, accurate estimate of size consumed by a record batch
  • [DRILL-5312] - "Record batch sizer" does not include overhead for variable-sized vectors
  • [DRILL-5319] - Refactor FragmentContext and OptionManager for unit testing
  • [DRILL-5320] - Refactor OptionManager for unit testing
  • [DRILL-5321] - Refactor FragmentContext for unit testing
  • [DRILL-5322] - Provide an OperatorFixture for sub-operator unit testing setup
  • [DRILL-5323] - Provide test tools to create, populate and compare row sets
  • [DRILL-5324] - Provide simplified column reader/writer for use in tests
  • [DRILL-5331] - NPE in FunctionImplementationRegistry.findDrillFunction() if dynamic UDFs disabled
  • [DRILL-5342] - Refactor "managed" external sort for unit tests
  • [DRILL-5567] - Review changes for DRILL-5514

Bug

  • [DRILL-3867] - Store relative paths in metadata file
  • [DRILL-4039] - Query fails when non-ASCII characters are used in string literals
  • [DRILL-4347] - Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release
  • [DRILL-4511] - Refresh over an empty folder results in an error; a better error message is needed
  • [DRILL-4678] - Tune metadata by generating a dispatcher at runtime
  • [DRILL-4720] - MINDIR() and IMINDIR() functions return no results with metadata cache
  • [DRILL-4722] - Fix EqualityVisitor for interval day expressions with millis
  • [DRILL-4755] - StringIndexOutOfBoundsException seen with CONVERT_FROM function
  • [DRILL-4903] - Implicit columns are shown when JDBC plugin is enabled
  • [DRILL-4970] - Wrong results when casting double to bigint or int
  • [DRILL-4971] - Query encounters system error, when there aren't eval subexpressions of any function in boolean and/or expressions
  • [DRILL-5005] - Potential issues with external sort info in query profile
  • [DRILL-5083] - RecordIterator can sometimes restart a query on close
  • [DRILL-5130] - UNION ALL difference in results
  • [DRILL-5140] - Fix CompileException in run-time generated code when record batch has large number of fields.
  • [DRILL-5160] - Memory leak in Parquet async reader when Snappy fails
  • [DRILL-5164] - Equi-join query results in CompileException when inputs have large number of columns
  • [DRILL-5165] - Wrong results with LIMIT ALL and OFFSET clause in same query
  • [DRILL-5213] - Prepared statement for actual query is missing the query text
  • [DRILL-5226] - External Sort encountered an error while spilling to disk
  • [DRILL-5229] - Upgrade Kudu client to org.apache.kudu:kudu-client:1.2.0
  • [DRILL-5234] - External sort's spilling functionality does not work when the spilled columns contain a map type column
  • [DRILL-5284] - Roll-up of final fixes for managed sort
  • [DRILL-5297] - Print the plan text when plan pattern check fails in unit tests
  • [DRILL-5311] - C++ connector connect doesn't check handshake result for timeout
  • [DRILL-5316] - C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
  • [DRILL-5330] - NPE in FunctionImplementationRegistry.functionReplacement()
  • [DRILL-5344] - External sort priority queue copier fails with an empty batch
  • [DRILL-5349] - TestParquetWriter unit tests fail with synchronous parquet reader
  • [DRILL-5359] - ClassCastException when push down filter on the output of flatten into parquet scan
  • [DRILL-5368] - Memory leak in C++ server metadata handler
  • [DRILL-5369] - Missing initialization for ServerMetaContext
  • [DRILL-5373] - Drill JDBC error when connecting via SQuirreL: java.lang.NoClassDefFoundError: javax/validation/constraints/NotNull
  • [DRILL-5375] - Nested loop join: return correct result for left join
  • [DRILL-5378] - Put more information into SchemaChangeException when HashJoin hit SchemaChangeException
  • [DRILL-5385] - Vector serializer fails to read saved SV2
  • [DRILL-5387] - TestBitBitKerberos and TestUserBitKerberos cause sporadic unit test failures
  • [DRILL-5395] - Query on MapR-DB table fails with NPE due to an issue with assignment logic
  • [DRILL-5397] - Random error: Unable to get holder type for minor type [LATE] and mode [OPTIONAL]
  • [DRILL-5399] - Fix race condition in DrillComplexWriterFuncHolder
  • [DRILL-5409] - Update MapR version to 5.2.1-mapr
  • [DRILL-5413] - DrillConnectionImpl.isReadOnly() throws NullPointerException
  • [DRILL-5419] - Calculate return string length for literals & some string functions
  • [DRILL-5420] - All cores at 100% on all servers
  • [DRILL-5424] - Fix IOBE for reverse function
  • [DRILL-5428] - submit_plan fails after Drill 1.8 script revisions
  • [DRILL-5429] - Improve query performance for MapR DB JSON Tables
  • [DRILL-5450] - Fix initcap function to convert upper case characters correctly
  • [DRILL-5496] - Must restart drillbits whenever a secure Hive metastore is restarted
  • [DRILL-5498] - CSV text reader does not handle duplicate header names
  • [DRILL-5523] - Revert if condition in UnionAllRecordBatch changed in DRILL-5419
  • [DRILL-5533] - Fix flag assignment in FunctionInitializer.checkInit() method
  • [DRILL-5537] - Display column aliases for queries with sum() when RDBMS storage plugin is enabled
  • [DRILL-5538] - Create TopProject with validatedNodeType after PHYSICAL phase
  • [DRILL-5541] - C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV
  • [DRILL-5544] - Out of heap running CTAS against text delimited
  • [DRILL-5560] - Create configuration file for distribution specific configuration
  • [DRILL-5577] - Column aliases are ignored when Storage Plugin is enabled
  • [DRILL-5587] - Validate Parquet blockSize and pageSize configured with SYSTEM/SESSION option
  • [DRILL-5589] - JDBC client crashes after successful authentication if trace logging is enabled.
  • [DRILL-5590] - Drill returns IndexOutOfBoundsException when a (Text) file has more than 4096 rows
  • [DRILL-5599] - Notify StatusHandlerListener that batch sending has failed even if channel is still open
  • [DRILL-5616] - Hash Agg Spill: OOM while reading irregular varchar data
  • [DRILL-5659] - C++ Client (master) behavior is unstable, resulting in incorrect results or exceptions in API calls
  • [DRILL-5665] - planner.force_2phase.aggr Set to TRUE for HashAgg may cause wrong results for VARIANCE and STD_DEV
  • [DRILL-5668] - C++ connector crash when query error message is too long
  • [DRILL-5669] - Multiple TPCH queries failed due to OOM
  • [DRILL-5678] - Undefined behavior due to un-initialized values in ServerMetaContext

Improvement

  • [DRILL-2974] - Make OutOfMemoryException an unchecked exception and remove OutOfMemoryRuntimeException
  • [DRILL-3510] - Add ANSI_QUOTES option so that Drill's SQL Parser will recognize ANSI_SQL identifiers
  • [DRILL-5056] - UserException does not write full message to log
  • [DRILL-5080] - Create a memory-managed version of the External Sort operator
  • [DRILL-5163] - External sort on Mac creates a separate child process per spill via HDFS FS
  • [DRILL-5315] - Small Comment Typo in drillClient.hpp
  • [DRILL-5318] - Create a sub-operator test framework
  • [DRILL-5325] - Implement sub-operator unit tests for managed external sort
  • [DRILL-5351] - Excessive bounds checking in the Parquet reader
  • [DRILL-5352] - Extend test framework profile parser printer for multi-fragment queries
  • [DRILL-5355] - Misc. code cleanup
  • [DRILL-5356] - Refactor Parquet Record Reader
  • [DRILL-5379] - Set HDFS Block Size based on Parquet Block Size
  • [DRILL-5391] - CTAS: make folder and file permission configurable
  • [DRILL-5394] - Optimize query planning for MapR-DB tables by caching row counts
  • [DRILL-5415] - Improve Fixture Builder to configure client properties and keep collection type properties for server
  • [DRILL-5423] - Refactor ScanBatch to allow unit testing record readers
  • [DRILL-5457] - Support Spill to Disk for the Hash Aggregate Operator
  • [DRILL-5481] - Allow Drill to persist profiles in-memory only with a max capacity
  • [DRILL-5485] - Remove WebServer dependency on DrillClient
  • [DRILL-5504] - Vector validator to diagnose offset vector issues
  • [DRILL-5512] - Standardize error handling in ScanBatch
  • [DRILL-5514] - Enhance VectorContainer to merge two row sets
  • [DRILL-5516] - Limit memory usage for HBase reader
  • [DRILL-5517] - Provide size-aware set operations in value vectors
  • [DRILL-5518] - Roll-up of a number of test framework enhancements
  • [DRILL-5545] - Add findbugs to build

New Feature

  • [DRILL-291] - Add SASL support for Drill
  • [DRILL-4335] - Apache Drill should support network encryption - SASL encryption between Drill Client to Drillbit
  • [DRILL-5432] - Added pcap-format support
  • [DRILL-5634] - Add Crypto and Hash Functions