Apache Drill 1.14.0 Release Notes
Release date: August 05, 2018
Today, we’re happy to announce the availability of Drill 1.14.0. You can download it from the Apache Drill download page.
New Features and Improvements
This release of Drill provides the following new features and improvements:
- Ability to run Drill in a Docker container. (DRILL-6346)
- Ability to export and save your storage plugin configurations to a JSON file for reuse. (DRILL-4580)
- Ability to manage storage plugin configurations in the Drill configuration file, storage-plugins-override.conf. (DRILL-6494)
- Functions that return data type information, including sqlTypeOf() and modeOf(). (DRILL-6361)
- The Drill Kafka storage plugin supports filter pushdown for query conditions on certain Kafka metadata fields in messages, such as kafkaMsgOffset. (DRILL-5977)
- Spill to disk for the Hash Join operator. (DRILL-6027)
- The dfs storage plugin supports a Logfile plugin extension that enables Drill to read and query log files directly, in any format that can be described by a regular expression. (DRILL-6104)
- Phonetic and string distance functions. (DRILL-6519)
- The store.hive.conf.properties option enables you to specify Hive properties at the session level using the SET command. (DRILL-6575)
- Drill can directly manage CPU resources through cgroups, configured in the Drill start-up script, drill-env.sh; you no longer have to manually add the Drillbit PID to the cgroup.procs file each time a Drillbit restarts. (DRILL-143)
- Drill can query the metadata of files in various image formats with the image metadata format plugin. (DRILL-4364)
- Enhanced decimal data type support. (DRILL-6094)
- Option to push LIMIT(0) on top of SCAN. (DRILL-6574)
- Parquet filter pushdown improvements:
  - Drill can infer filter conditions for join queries and push the filter conditions down to the data source. (DRILL-6173)
  - Drill uses a native reader to read Hive tables stored in Parquet format when you enable the store.hive.optimize_scan_with_native_readers option. When the option is enabled, Drill reads the data faster and applies filter pushdown optimizations. (DRILL-6331)
- Early release of lateral join. (DRILL-5999)
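As a rough illustration of what the phonetic and string distance functions added in DRILL-6519 compute, the following Python sketch implements one classic algorithm of each kind: Levenshtein edit distance and American Soundex. The helper names `levenshtein` and `soundex` are hypothetical; this is not Drill's implementation, and Drill's SQL function names and exact semantics may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn `a` into `b` (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# American Soundex digit codes; vowels, Y, H, and W have no code.
_SOUNDEX_CODES = {
    ch: str(digit)
    for digit, group in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1)
    for ch in group
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    last_code = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != last_code:
            result += code
        last_code = code  # a vowel resets this, so a repeated code counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(levenshtein("kitten", "sitting"))  # 3
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Edit distance measures how different two spellings are, while a phonetic code such as Soundex groups words that sound alike ("Robert" and "Rupert" share a code) so they can be matched even when their spellings differ.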
Note: New MapR Drill ODBC and JDBC drivers are available for Drill 1.14. Earlier versions of the drivers do not work with Drill 1.14.
The following sections list all the fixes and improvements in this release:
Sub-task
- [DRILL-5030] - Drill SSL Docs have Bad Link to Oracle Website
- [DRILL-5847] - Flat Parquet Reader Performance Analysis
- [DRILL-5848] - Implement Parquet Columnar Processing & Use Bulk APIs for processing
- [DRILL-6281] - Refactor TimedRunnable
Bug
- [DRILL-3539] - CTAS over empty json file throws NPE
- [DRILL-3964] - CTAS fails with NPE when source JSON file is empty
- [DRILL-4020] - The not-equal operator returns incorrect results when used on the HBase row key
- [DRILL-4337] - Drill fails to read INT96 fields from hive generated parquet files
- [DRILL-4742] - Using convert_from timestamp_impala gives a random error
- [DRILL-4834] - decimal implementation is vulnerable to overflow errors, and extremely complex
- [DRILL-5188] - TPC-DS query16 fails - IllegalArgumentException: Target must be less than target count
- [DRILL-5201] - Query bug: null values in result of a conditioned query
- [DRILL-5281] - JdbcSchema throws exception when detecting nullable for columns
- [DRILL-5495] - convert_from function on top of int96 data results in ArrayIndexOutOfBoundsException
- [DRILL-5927] - Root allocator consistently Leaks a buffer in unit tests
- [DRILL-5990] - Apache Drill /status API returns OK ('Running') even with JRE while queries will not work - make status API reflect the fact that Drill is broken on JRE or stop Drill starting up with JRE
- [DRILL-6008] - Unable to shutdown Drillbit using short domain name
- [DRILL-6009] - No drillbits on index page
- [DRILL-6010] - Working drillbit showing as in QUIESCENT state
- [DRILL-6016] - Error reading INT96 created by Apache Spark
- [DRILL-6082] - RpcExceptionHandler log doesn't print "cause" for exception
- [DRILL-6103] - lsb_release: command not found
- [DRILL-6125] - PartitionSenderRootExec can leak memory because close method is not synchronized
- [DRILL-6132] - HashPartitionSender leaks memory
- [DRILL-6182] - Doc bug on parameter 'drill.exec.spill.fs'
- [DRILL-6199] - Filter push down doesn't work with more than one nested subqueries
- [DRILL-6202] - Deprecate usage of IndexOutOfBoundsException to re-alloc vectors
- [DRILL-6203] - Repeated Map Vector does not give correct payload bytecount.
- [DRILL-6212] - A simple join recurses too deeply during planning and eventually throws a stack overflow error.
- [DRILL-6224] - The metrics' page has gauges reset to near zero values and does not seem to update
- [DRILL-6241] - Saffron properties config has excessive permissions
- [DRILL-6242] - Output format for nested date, time, timestamp values in an object hierarchy
- [DRILL-6250] - Sqlline start command with password appears in the sqlline.log
- [DRILL-6252] - Foreman node is going down when the non foreman node is stopped
- [DRILL-6254] - IllegalArgumentException: the requested size must be non-negative
- [DRILL-6255] - Drillbit while sending control message to itself creates a connection instead of submitting locally
- [DRILL-6256] - Remove references to Java 7 from readme and other files
- [DRILL-6262] - IndexOutOfBoundsException in RecordBatchSize for empty variableWidthVector
- [DRILL-6274] - MergeJoin Memory Manager is still using Fragmentation Factor
- [DRILL-6275] - drillbit direct_current memory usage is not populated/updated
- [DRILL-6277] - Query fails with DATA_READ ERROR when correlated subquery has "always false" filter
- [DRILL-6278] - DRILL-5993 Made Debugging Generated Code Harder
- [DRILL-6282] - Update Drill's Metrics dependencies
- [DRILL-6283] - WebServer stores SPNEGO client principal without taking any conversion rule
- [DRILL-6286] - Regression: incorrect reference to shutdown in drillbit.log
- [DRILL-6287] - apache-release profile should be disabled by default
- [DRILL-6295] - PartitionerDecorator may close partitioners while CustomRunnable are active during query cancellation
- [DRILL-6298] - Add debug log for merge join batch sizing
- [DRILL-6299] - Parquet query returns unexpected results
- [DRILL-6302] - NPE in Drillbit.java in close method
- [DRILL-6307] - Handle empty batches in record batch sizer correctly
- [DRILL-6311] - No logging information in drillbit.log / drillbit.out
- [DRILL-6318] - Push down limit past flatten is incorrect
- [DRILL-6338] - License headers are not added to generated proto buf files with new license changes.
- [DRILL-6341] - Mongo Tests Fail on OSX 10.13.4
- [DRILL-6342] - Parquet filter pushdown doesn't work in case of filtering fields inside arrays of complex fields
- [DRILL-6343] - bit vector copyFromSafe is not doing realloc
- [DRILL-6351] - Drill fails with NullPointerException when starting in embedded mode
- [DRILL-6364] - WebUI does not cleanly handle shutdown and state toggling when Drillbits go on and offline
- [DRILL-6374] - Transitive Closure leads to TPCH Queries regressions and OOM when running concurrency tests
- [DRILL-6380] - MongoDB storage plugin tests can hang on Jenkins.
- [DRILL-6387] - TestTpchDistributedConcurrent tests are ignored, they should be enabled.
- [DRILL-6393] - Radians should take an argument (x)
- [DRILL-6401] - Precision for decimal data types may be lost for the case when cast with literal is used
- [DRILL-6402] - Repeated Value Vectors copyFrom methods are not updating the value count and writer index correctly for values vector
- [DRILL-6411] - Make batch memory sizing logs uniform across all operators
- [DRILL-6413] - Specific query returns an exception when filtering a boolean column with the "equals" operator
- [DRILL-6415] - Unit test TestGracefulShutdown.testRestApiShutdown times out
- [DRILL-6416] - Unit test TestTpchDistributedConcurrent.testConcurrentQueries fails with AssertionError
- [DRILL-6431] - Unnest operator requires table and a single column alias to be specified.
- [DRILL-6435] - MappingSet is stateful, so it can't be shared between threads
- [DRILL-6437] - Travis Fails Because Logs Are Flooded.
- [DRILL-6442] - Adjust Hbase disk cost & row count estimation when filter push down is applied
- [DRILL-6443] - Search feature for profiles is available only for running OR completed queries, but not both
- [DRILL-6447] - Unsupported Operation when reading parquet data
- [DRILL-6450] - Visualized plans for profiles querying JDBC sources is broken
- [DRILL-6455] - JDBC Scan Operator does not appear in profile
- [DRILL-6456] - Planner shouldn't create any exchanges on the right side of Lateral Join.
- [DRILL-6459] - Unable to view profile of a running query
- [DRILL-6463] - ProfileParser cannot parse costs when using MockScanBatch
- [DRILL-6467] - Percentage usage of memory is reported as zero by the WebUI
- [DRILL-6468] - OOMs trigger graceful shutdown when terminating Drill. This can cause a hang.
- [DRILL-6470] - http://repo.dremio.com/release/ cannot be opened
- [DRILL-6471] - Different result for CAST String and Decimal literals as Decimal
- [DRILL-6472] - Drill allows decimal zero precision in the CAST function for CTAS
- [DRILL-6474] - Queries with ORDER BY and OFFSET (w/o LIMIT) do not return any rows
- [DRILL-6475] - Unnest: Null fieldId Pointer
- [DRILL-6476] - Generate explain plan which shows relation between Lateral and the corresponding Unnest.
- [DRILL-6477] - Drillbit hangs/crashes with OOME Java Heap Space for a large query through WebUI
- [DRILL-6478] - enhance debug logs for batch sizing
- [DRILL-6485] - Typo in drill-env.sh
- [DRILL-6486] - BitVector split and transfer does not work correctly for non byte-multiple transfer lengths
- [DRILL-6487] - Negative row count when selecting from a json file with an OFFSET clause
- [DRILL-6488] - Drill native client - compile error due to usage of "template inline"
- [DRILL-6489] - Fix filter push down for Hbase & Mapr-DB binary tables when convert function is used in a view
- [DRILL-6491] - Prevent merge join for full outer join at planning stage
- [DRILL-6496] - VectorUtil.showVectorAccessibleContent does not log vector content
- [DRILL-6499] - No need to calculate stdRowWidth for every batch by default
- [DRILL-6512] - Remove unnecessary processing overhead from RecordBatchSizer
- [DRILL-6513] - Drill should only allow valid values when users set planner.memory.max_query_memory_per_node
- [DRILL-6523] - Fix NPE for describe of partial schema
- [DRILL-6529] - Project Batch Sizing causes two LargeFileCompilation tests to timeout
- [DRILL-6530] - JVM crash with a query involving multiple json files with one file having a schema change of one column from string to list
- [DRILL-6535] - ClassCastException in Lateral Unnest queries when dealing with schema changed json data
- [DRILL-6537] - Limit the batch size for buffering operators based on how much memory they get
- [DRILL-6539] - Record count not set for this vector container error
- [DRILL-6542] - IndexOutOfBoundsException for multilevel lateral queries with schema changed partitioned complex data
- [DRILL-6546] - Allow unnest function with nested columns and complex expressions
- [DRILL-6548] - IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
- [DRILL-6551] - Concat function results in SYSTEM ERROR: IllegalStateException: Tried to remove unmanaged buffer.
- [DRILL-6553] - Fix TopN for unnest operator
- [DRILL-6557] - Use size in bytes during Hive statistics calculation if present
- [DRILL-6568] - Jenkins Regression: TPCDS query 68 fails with IllegalStateException: Unexpected EMIT outcome received in buildSchema phase
- [DRILL-6570] - IndexOutOfBoundsException when using Flat Parquet Reader
- [DRILL-6576] - Unnest reports incoming record counts incorrectly
- [DRILL-6578] - Ensure the Flat Parquet Reader can handle query cancellation
- [DRILL-6583] - Add space between pagination links in Profiles (WebUI) list
- [DRILL-6588] - System table columns incorrectly marked as non-nullable
- [DRILL-6591] - When query fails on Web UI, result page does not show any error
- [DRILL-6592] - Unnest perf improvements - record batch sizer is called too frequently
- [DRILL-6594] - Data batches for Project operator are not being split properly and exceed the maximum specified
- [DRILL-6596] - Variable length vectors use unnecessary emptyByteArray to fill empties
- [DRILL-6603] - Filter pushdown for a null value eliminates all except one rowgroup
- [DRILL-6606] - Hash Join returns incorrect data types when joining subqueries with limit 0
- [DRILL-6612] - Query fails with AssertionError when joining persistent and temporary tables
- [DRILL-6614] - Allow usage of MapRDBFormatPlugin for HiveStoragePlugin
- [DRILL-6622] - UNION on tpcds sf100 tables hit SYSTEM ERROR: NullPointerException
- [DRILL-6624] - Fix loss of the table row type when the same schema name was specified as single path and as a complex path in the same query
- [DRILL-6626] - Hash Aggregate: Index out of bounds with small output batch size and spilling
- [DRILL-6627] - Adding REGEX_SUB_SCAN operator to protobuf file
- [DRILL-6632] - drill-jdbc-all jar size limit too small for release build
- [DRILL-6637] - Root pom: Release build needs to remove dep to tests in maven-javadoc-plugin
- [DRILL-6651] - Compilation error in IDE due to missing package name
New Feature
- [DRILL-143] - Support CGROUPs resource management
- [DRILL-4276] - Need a way to check on status of drillbits
- [DRILL-4364] - Image Metadata Format Plugin
- [DRILL-5261] - Expose REST endpoint in zookeeper
- [DRILL-6027] - Implement spill to disk for the Hash Join
- [DRILL-6375] - ANY_VALUE aggregate function
- [DRILL-6423] - Export query result as a CSV file
- [DRILL-6432] - Allow printing only the visualized query plan
- [DRILL-6454] - Native MapR DB plugin support for Hive MapR-DB json table
- [DRILL-6494] - Drill Plugins Handler
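DRILL-6027 lets the Hash Join operator spill to disk instead of holding its entire build side in memory. The general technique can be sketched in Python as a partitioned hash join: build rows are hash-partitioned, and when a memory budget is exceeded, the largest in-memory partition is written to a temporary file; probe rows that hash to a spilled partition are also written to disk, and those partitions are joined in a second pass. This is a generic textbook sketch, not Drill's implementation. The function name `spilling_hash_join`, its parameters, and the row-count memory budget are assumptions for illustration, and the second pass assumes each spilled partition fits in memory (real implementations may re-partition recursively).

```python
import pickle
import tempfile
from collections import defaultdict


def _append(f, row):
    """Append one pickled row to a spill file."""
    pickle.dump(row, f)


def _read_all(f):
    """Yield every row written to a spill file, then close the file."""
    f.seek(0)
    try:
        while True:
            yield pickle.load(f)
    except EOFError:
        f.close()


def _hash_table(rows, on):
    table = defaultdict(list)
    for row in rows:
        table[row[on]].append(row)
    return table


def spilling_hash_join(build, probe, on, num_partitions=4, memory_limit=3):
    """Inner equi-join of two iterables of dicts on column `on`.

    `memory_limit` is a hypothetical budget: the number of build rows
    allowed in memory before the largest partition is spilled to disk.
    """
    in_mem = defaultdict(list)  # partition -> build rows held in memory
    spilled = {}                # partition -> (build file, probe file)
    used = 0
    for row in build:
        p = hash(row[on]) % num_partitions
        if p in spilled:
            _append(spilled[p][0], row)  # partition already lives on disk
            continue
        in_mem[p].append(row)
        used += 1
        if used > memory_limit:
            # Over budget: spill the largest in-memory partition.
            victim = max(in_mem, key=lambda q: len(in_mem[q]))
            rows = in_mem.pop(victim)
            used -= len(rows)
            build_file = tempfile.TemporaryFile()
            for r in rows:
                _append(build_file, r)
            spilled[victim] = (build_file, tempfile.TemporaryFile())

    # First pass: probe the in-memory partitions; defer rows for spilled ones.
    tables = {p: _hash_table(rows, on) for p, rows in in_mem.items()}
    out = []
    for row in probe:
        p = hash(row[on]) % num_partitions
        if p in spilled:
            _append(spilled[p][1], row)
        elif p in tables:
            out.extend({**match, **row} for match in tables[p].get(row[on], []))

    # Second pass: reload each spilled partition and join it in memory.
    for build_file, probe_file in spilled.values():
        table = _hash_table(_read_all(build_file), on)
        for row in _read_all(probe_file):
            out.extend({**match, **row} for match in table.get(row[on], []))
    return out


build = [{"id": i, "b": i * 10} for i in range(10)]
probe = [{"id": i % 5, "p": i} for i in range(8)]
print(len(spilling_hash_join(build, probe, "id")))  # 8
```

Partitioning both inputs with the same hash function is what makes spilling safe: a probe row can only match build rows in the partition with the same number, so each spilled pair of files can be joined independently later.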
Improvement
- [DRILL-2746] - Filter is not pushed into subquery past UNION ALL
- [DRILL-3130] - Project can be pushed below union all / union to improve performance
- [DRILL-3855] - Enable FilterSetOpTransposeRule, DrillProjectSetOpTransposeRule
- [DRILL-4091] - Support more functions in GIS contrib module
- [DRILL-4525] - Query with BETWEEN clause on Date and Timestamp values fails with Validation Error
- [DRILL-4580] - Provide options to import and export storage plugin configurations
- [DRILL-4829] - Configure the address to bind to
- [DRILL-5305] - Query Profile must display Query ID
- [DRILL-5584] - When Compiling Apache Drill C++ Client, versioning information is not present in the binary
- [DRILL-5700] - nohup support for sqlline
- [DRILL-5797] - Use the new Parquet reader more often
- [DRILL-5846] - Improve Parquet Reader Performance for Flat Data types
- [DRILL-5924] - native-client: Support user-specified CXX_FLAGS
- [DRILL-5977] - Predicate pushdown support for kafkaMsgOffset
- [DRILL-6005] - Fix TestGracefulShutdown tests to skip check for loopback address usage in distributed mode
- [DRILL-6053] - Avoid excessive locking in LocalPersistentStore
- [DRILL-6094] - Decimal data type enhancements
- [DRILL-6104] - Generic Logfile Format Plugin
- [DRILL-6145] - Enable usage of Hive MapR-DB JSON handler
- [DRILL-6147] - Limit batch size for Flat Parquet Reader
- [DRILL-6161] - Allocate memory for outgoing vectors based on sizing calculations
- [DRILL-6162] - Enhance record batch sizer to retain nesting information for map columns.
- [DRILL-6173] - Support transitive closure during filter push down and partition pruning
- [DRILL-6230] - Extend row set readers to handle hyper vectors
- [DRILL-6231] - Fix memory allocation for repeated list vector
- [DRILL-6234] - Improve Documentation of VariableWidthVector Behavior
- [DRILL-6236] - batch sizing for hash join
- [DRILL-6239] - Add Build and License Badges to README.md
- [DRILL-6243] - Alert box to confirm shutdown of drillbit after clicking shutdown button
- [DRILL-6248] - Support pushdown into System Table
- [DRILL-6249] - Add Markdown Docs for Unit Testing and Link to it in README.md
- [DRILL-6259] - Support parquet filter push down for complex types
- [DRILL-6279] - Web UI should indicate when operators have spilled in-memory data to disk
- [DRILL-6284] - Add operator metrics for batch sizing for flatten
- [DRILL-6289] - Cluster view should show more relevant information
- [DRILL-6296] - Add operator metrics for batch sizing for merge join
- [DRILL-6303] - Provide a button to copy the Drillbit's JStack shown in /threads
- [DRILL-6310] - limit batch size for hash aggregate
- [DRILL-6320] - Don't Allow Javadoc comments for license headers
- [DRILL-6331] - Parquet filter pushdown does not support the native hive reader
- [DRILL-6333] - Generate and host Javadocs on Apache Drill website
- [DRILL-6334] - Code cleanup
- [DRILL-6335] - Refactor row set abstractions to prepare for unions
- [DRILL-6339] - New option to disable TopN (for testing Sort)
- [DRILL-6340] - Output Batch Control in Project using the RecordBatchSizer
- [DRILL-6345] - Add LOG10 function implementation
- [DRILL-6346] - Create an Official Drill Docker Container
- [DRILL-6347] - Inconsistent method name "field".
- [DRILL-6348] - Unordered Receiver does not report its memory usage
- [DRILL-6356] - batch sizing for union all
- [DRILL-6361] - Provide sqlTypeOf() and modeOf() functions
- [DRILL-6389] - Fix building javadocs
- [DRILL-6418] - Handle Schema change in Unnest And Lateral for unnest field / non-unnest field
- [DRILL-6424] - Updating FasterXML Jackson libraries
- [DRILL-6436] - Store context and name in AbstractStoragePlugin instead of replicating fields in each StoragePlugin
- [DRILL-6438] - Remove excess logging from tests
- [DRILL-6440] - Fix ignored unit tests in unnest
- [DRILL-6466] - Add HttpOnly flag for response cookie
- [DRILL-6479] - Support for EMIT outcome in Hash Aggregate
- [DRILL-6502] - Rename CorrelatePrel to LateralJoinPrel
- [DRILL-6503] - Performance improvements in lateral
- [DRILL-6505] - Drill Web UI query: support back button or add "edit query"
- [DRILL-6515] - Render a linkage between the Unnest operator and its source operator
- [DRILL-6516] - Support for EMIT outcome in streaming agg
- [DRILL-6519] - Add String Distance and Phonetic Functions
- [DRILL-6545] - Projection Push down into Lateral Join operator.
- [DRILL-6549] - batch sizing for nested loop join
- [DRILL-6554] - Minor code improvements in parquet statistics handling
- [DRILL-6560] - Allow options for controlling the batch size per operator
- [DRILL-6561] - Lateral excluding the columns from output container provided by projection push into rules
- [DRILL-6574] - Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
- [DRILL-6575] - Add store.hive.conf.properties option to allow setting Hive properties at session level
- [DRILL-6577] - Change Hash-Join default to not fall back (into pre-1.14 unlimited memory)
- [DRILL-6579] - Add sanity checks to Parquet Reader
- [DRILL-6581] - Improve C++ Client SSL Implementation
- [DRILL-6586] - Add SSL Hostname verification with zookeeper connection mode support
- [DRILL-6587] - Add support for custom SSL CTX Options
- [DRILL-6601] - LargeFileCompilation testProject times out
- [DRILL-6650] - Remove Stray Semicolon in Printing Results Listener
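DRILL-6173 lets Drill infer extra filters through transitive closure: if a query joins on `t1.a = t2.b` and filters on `t1.a > 10`, the planner can also push `t2.b > 10` down to the scan of `t2`. The Python sketch below shows the basic idea under simplifying assumptions: only inner-join equality conditions are considered, predicates are plain strings, and the function name `infer_filters` is hypothetical. It is not Drill's planner code.

```python
from collections import defaultdict


def infer_filters(equalities, filters):
    """Propagate single-column predicates across transitive equalities.

    equalities: iterable of (col_a, col_b) pairs, each meaning col_a = col_b
                (assumed to come from inner-join conditions)
    filters:    dict of column -> set of predicate strings, e.g. {"t1.a": {"> 10"}}
    Returns a dict of column -> all predicates known to hold for it.
    """
    parent = {}

    def find(col):
        # Union-find with path halving: returns the class representative.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for a, b in equalities:
        parent[find(a)] = find(b)  # merge the two equivalence classes

    # Pool the predicates of every column in the same class.
    pooled = defaultdict(set)
    for col, preds in filters.items():
        pooled[find(col)] |= set(preds)

    # Each column receives every predicate of its class.
    result = {}
    for col in list(parent):
        preds = pooled.get(find(col))
        if preds:
            result[col] = set(preds)
    return result


# t1.a = t2.b and t2.b = t3.c; only t1.a has a filter in the query.
print(infer_filters([("t1.a", "t2.b"), ("t2.b", "t3.c")], {"t1.a": {"> 10"}}))
# t1.a, t2.b, and t3.c each get {'> 10'}
```

Restricting inference to inner-join equalities matters: with an outer join, a filter on one side cannot always be copied to the other side without changing which rows are null-extended.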
Task
- [DRILL-5937] - Make prepare.statement.create_timeout_ms default to 30 seconds instead of 10 seconds
- [DRILL-6061] - Doc Request: Global Query List showing queries from all Drill foreman nodes
- [DRILL-6237] - Upgrade checkstyle version to 5.9 or above
- [DRILL-6270] - Add debug startup option flag for drill in embedded and server mode
- [DRILL-6271] - Update copyright range in NOTICE
- [DRILL-6272] - Remove binary jars files from source distribution
- [DRILL-6273] - Remove dependency licensed under Category X
- [DRILL-6288] - Upgrade org.javassist:javassist and org.reflections:reflections
- [DRILL-6290] - Refactor TestInfoSchemaFilterPushDown tests to use PlanTestBase utility methods
- [DRILL-6294] - Update Calcite version to 1.16.0
- [DRILL-6300] - Refresh protobuf C++ source files
- [DRILL-6301] - Parquet Performance Analysis
- [DRILL-6321] - Lateral Join: Planning changes - enable submitting physical plan
- [DRILL-6323] - Lateral Join - Initial implementation
- [DRILL-6328] - Consolidate developer docs in docs/ folder of drill repo.
- [DRILL-6353] - Upgrade Parquet MR dependencies
- [DRILL-6363] - Upgrade jmockit and mockito libs
- [DRILL-6419] - E2E Integration test for Lateral & Unnest
- [DRILL-6420] - Add Lateral and Unnest Keywords for highlighting on WebUI
- [DRILL-6421] - Refactor DecimalUtility and CoreDecimalUtility classes
- [DRILL-6426] - Refactor TestFunctionsWithTypeExpoQueries test to be independent of limit 0 optimization option
- [DRILL-6445] - Fix existing test cases in TestScripts.java and add new test case for DRILLBIT_CONTEXT variable
- [DRILL-6446] - Support for EMIT outcome in TopN
- [DRILL-6481] - Refactor ParquetXXXPredicate classes
- [DRILL-6498] - Support for EMIT outcome in ExternalSortBatch
- [DRILL-6500] - Update Apache Drill security documentation
- [DRILL-6526] - Refactor FileSystemConfig to disallow direct access from the code to its variables
- [DRILL-6531] - Errors in example for "Aggregate Function Interface"
- [DRILL-6534] - Upgrade ZooKeeper patch version to 3.4.12 and add Apache Curator to dependencyManagement
- [DRILL-6559] - Travis timing out