Apache Drill 1.2.0 Release Notes

Release date: October 16, 2015

Today we're happy to announce the availability of Drill 1.2.0, which provides more than 150 bug fixes and a number of new features. The Jira issues resolved in this release are listed under Enhancements and Bug Fixes.

Noteworthy New Features in Drill 1.2.0

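This release of Drill introduces a number of enhancements, including the following ones:

  • JDBC storage plugin for querying relational databases such as MySQL (DRILL-3180)
  • Parquet file metadata caching (DRILL-2743)
  • LEAD, LAG, NTILE, FIRST_VALUE and LAST_VALUE window functions (DRILL-3536)
  • Multiple PARTITION BY clauses for window functions in the same query (DRILL-3470)
  • HTTPS support for the Drill web interface (DRILL-3725)
  • Native Drill reads of Hive tables when a native reader for the underlying format exists (DRILL-3209)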

Enhancements and Bug Fixes

Sub-task

  • [DRILL-3364] - Prune scan range if the filter is on the leading field with byte comparable encoding
  • [DRILL-3553] - add support for LEAD and LAG window functions
  • [DRILL-3608] - add support for FIRST_VALUE and LAST_VALUE
  • [DRILL-3616] - Memory leak in a cleanup code after canceling queries with window functions spilling to disk
  • [DRILL-3619] - Add support for NTILE window function
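The window functions added in these sub-tasks are easier to reason about with concrete results in hand. The following Python sketch models what each one returns over a single partition that is already sorted by its ORDER BY key. It illustrates the SQL semantics only; it is not Drill's operator code, and every name in it is hypothetical. LEAD and LAG take no offset argument because DRILL-3596 restricts their arguments to (<expression>) or (<expression>, 1). FIRST_VALUE and LAST_VALUE assume the SQL-standard default frame when ORDER BY is present (from the start of the partition through the current row and its ORDER BY peers), which is why LAST_VALUE returns the last tied row rather than the last row of the partition. The sketch rejects a non-positive NTILE argument (compare DRILL-3643).

```python
from typing import Any, List, Optional, Sequence, Tuple

# One partition, already sorted by the ORDER BY key, as (order_key, value) pairs.
# None stands in for SQL NULL.
Partition = Sequence[Tuple[Any, Any]]


def lead(rows: Partition) -> List[Optional[Any]]:
    """LEAD(value): the value from the next row, NULL on the last row."""
    return [rows[i + 1][1] if i + 1 < len(rows) else None for i in range(len(rows))]


def lag(rows: Partition) -> List[Optional[Any]]:
    """LAG(value): the value from the previous row, NULL on the first row."""
    return [rows[i - 1][1] if i > 0 else None for i in range(len(rows))]


def first_value(rows: Partition) -> List[Any]:
    """FIRST_VALUE(value): the default frame always starts at the first row."""
    return [rows[0][1] for _ in rows]


def last_value(rows: Partition) -> List[Any]:
    """LAST_VALUE(value) with the default frame: the frame ends at the current
    row's last ORDER BY peer, so ties resolve to the last tied row."""
    result = []
    for i, (key, _) in enumerate(rows):
        end = i
        while end + 1 < len(rows) and rows[end + 1][0] == key:
            end += 1
        result.append(rows[end][1])
    return result


def ntile(rows: Sequence[Any], buckets: int) -> List[int]:
    """NTILE(buckets): numbers rows 1..buckets in order; when the rows do not
    divide evenly, the first (len(rows) % buckets) buckets get one extra row."""
    if buckets <= 0:
        raise ValueError("NTILE argument must be a positive integer")
    base, extra = divmod(len(rows), buckets)
    result: List[int] = []
    for bucket in range(1, buckets + 1):
        result.extend([bucket] * (base + (1 if bucket <= extra else 0)))
    return result


rows = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
print(lead(rows))         # ['b', 'c', 'd', None]
print(lag(rows))          # [None, 'a', 'b', 'c']
print(first_value(rows))  # ['a', 'a', 'a', 'a']
print(last_value(rows))   # ['a', 'c', 'c', 'd']
print(ntile(rows, 3))     # [1, 1, 2, 3]
```

The tie between the two rows keyed 2 is deliberate: it is the case where LAST_VALUE surprises people, and where several of the wrong-result reports for these functions are easiest to reproduce.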

Bug

  • [DRILL-343] - Document update required to describe sqlline usage for Windows
  • [DRILL-1395] - UNION ALL query fails "while setting up Foreman"
  • [DRILL-1457] - Limit operator optimization : push limit operator past exchange operator
  • [DRILL-1651] - Allow Filter to push past Project with ITEM operator
  • [DRILL-1773] - Issues when using JAVA code through Drill JDBC driver
  • [DRILL-1795] - TPCH SF1000 Queries fail with DecoderException: java.lang.NullPointerException
  • [DRILL-1816] - Scan Error with JSON on large no of records with Complex Types
  • [DRILL-1831] - LIKE operator not working with SQL [charlist] Wildcard
  • [DRILL-1929] - After canceling query in sqlline subsequent query in same session hangs
  • [DRILL-1938] - Fix error message when reserved words are used in query.
  • [DRILL-1976] - Possible Memory Leak in drill jdbc client when dealing with wide columns (5000 chars long)
  • [DRILL-2050] - Accountor closed with outstanding buffer
  • [DRILL-2053] - Column names are case sensitive if column is coming from WITH clause
  • [DRILL-2095] - Order by on a repeated index inside a sub query results in an NPE
  • [DRILL-2098] - correct startup on windows
  • [DRILL-2166] - left join with complex type throws ClassTransformationException
  • [DRILL-2190] - Failure to order by function if DISTINCT clause is present
  • [DRILL-2274] - Unable to allocate sv2 buffer after repeated attempts : JOIN, Order by used in query
  • [DRILL-2304] - Case sensitivity - system and session options are case sensitive
  • [DRILL-2312] - JDBC driver returning incorrect data after extended usage
  • [DRILL-2313] - Query fails when one of the operands is a DATE literal without an explicit cast
  • [DRILL-2318] - Query fails when an ORDER BY clause is used with WITH-CLAUSE
  • [DRILL-2361] - Column aliases cannot include dots
  • [DRILL-2398] - IS NOT DISTINCT FROM predicate returns incorrect result when used as a join filter
  • [DRILL-2418] - Memory leak during execution if comparison function is not found
  • [DRILL-2445] - JDBC : Connection.rollback method currently throws UnsupportedOperationException
  • [DRILL-2451] - JDBC : Connection.commit throws an UnsupportedOperationException
  • [DRILL-2459] - INFO._SCHEMA's CHARACTER_MAXIMUM_LENGTH is -1 for type CHAR
  • [DRILL-2482] - JDBC : calling getObject when the actual column type is 'NVARCHAR' results in NoClassDefFoundError
  • [DRILL-2519] - INFORMATION_SCHEMA.COLUMNS is missing <interval_qualifier> info
  • [DRILL-2522] - Implement INFORMATION_SCHEMA.* enough for relevant tools [bug]
  • [DRILL-2530] - getColumns() doesn't return right COLUMN_SIZE for INTERVAL types
  • [DRILL-2588] - Profile UI: "First Start" field contains incorrect data
  • [DRILL-2625] - org.apache.drill.common.StackTrace should follow standard stacktrace format
  • [DRILL-2643] - HashAggBatch/HashAggTemplate call incoming.cleanup() twice resulting in warnings
  • [DRILL-2644] - Data Types page should list and describe all data types
  • [DRILL-2649] - Math and Trig page seems to refer to types that are not Drill SQL types
  • [DRILL-2650] - Cancelled queries json profile shows query end time occurs before fragments end time
  • [DRILL-2688] - Use of ORDER BY on right side of Union All results in SqlValidatorException
  • [DRILL-2721] - Identify, fix _existing_ INFO._SCHEMA columns in conflict with SQL spec. by 1.0
  • [DRILL-2722] - Query profile data not being sent/received (and web UI not updated)
  • [DRILL-2724] - Implicit cast test fails in Union All query (reports type mismatch)
  • [DRILL-2727] - CTAS select * from CSV file results in Exception
  • [DRILL-2737] - Sqlline throws Runtime exception when JDBC ResultSet throws a SQLException
  • [DRILL-2745] - Query returns IOB Exception when JSON data with empty arrays is input to flatten function
  • [DRILL-2760] - Quoted strings from CSV file appear in query output in different forms
  • [DRILL-2777] - CTAS, order by and flatten of repeated list result in ExpandConversionRule error
  • [DRILL-2800] - Performance regression introduced with commit: a6df26a (Patch for DRILL-2512)
  • [DRILL-2802] - Projecting dir[n] by itself results in projecting of all columns
  • [DRILL-2815] - Some PathScanner logging, misc. cleanup.
  • [DRILL-2837] - Resolve what Statement.cancel() really should do
  • [DRILL-2843] - Reading empty CSV file fails with error (rather than yielding zero rows)
  • [DRILL-2851] - Memory LEAK - FLATTEN function fails when input array has 99,999 integer type elements
  • [DRILL-2852] - CASTing the column 'dir0' in view causes partition pruning to fail
  • [DRILL-2862] - Convert_to/Convert_From throw assertion when an incorrect encoding type is specified
  • [DRILL-2864] - Unable to cast string literal with the valid value in ISO 8601 format to interval
  • [DRILL-2867] - Session level parameter drill.exec.testing.controls appears to be set even though it was not
  • [DRILL-2890] - C++ Client: Update query submitter usage to clarify using authentication
  • [DRILL-2924] - IOBException when querying a table which has 1 file and a subfolder with 1 file
  • [DRILL-2935] - Casting varchar to varbinary fails
  • [DRILL-2937] - Result for integer values from json files contains "$numberLong" in front of value
  • [DRILL-2949] - TPC-DS queries 1 and 30 fail with CannotPlanException
  • [DRILL-3030] - Foreman hangs trying to cancel non-root fragments
  • [DRILL-3045] - Drill is not partition pruning due to internal off-heap memory limit for planning phase
  • [DRILL-3056] - Numeric literal in an IN list is cast to decimal even when decimal type is disabled
  • [DRILL-3076] - USING clause should not be supported in drill
  • [DRILL-3095] - Memory Leak : Failure while closing accountor.
  • [DRILL-3096] - "State change requested from ... --> ... for " blank after "for"
  • [DRILL-3122] - Changing a session option to default value results in status as changed
  • [DRILL-3133] - MergingRecordBatch can leak memory if query is canceled before batches in rawBatches were loaded
  • [DRILL-3141] - sqlline throws an exception when query is cancelled
  • [DRILL-3151] - ResultSetMetaData not as specified by JDBC (null/dummy value, not ""/etc.)
  • [DRILL-3153] - DatabaseMetaData.getIdentifierQuoteString() returns (standard) double-quote, not Drill's back quote
  • [DRILL-3156] - Calcite tracing is broken in Drill
  • [DRILL-3160] - Make JDBC Javadoc documentation available to users
  • [DRILL-3163] - Fix hang/ leak issue exposed by TestDrillbitResilience#foreman_runTryEnd
  • [DRILL-3189] - Disable ALLOW PARTIAL/DISALLOW PARTIAL in window function grammar
  • [DRILL-3243] - Need a better error message - Use of alias in window function definition
  • [DRILL-3257] - TPCDS query 74 results in a StackOverflowError on Scale Factor 1
  • [DRILL-3271] - Hive : Tpch 01.q fails with a verification issue for SF100 dataset
  • [DRILL-3284] - Document incompatibility between drill's to_date and hive's unix_timestamp
  • [DRILL-3287] - Changing session level parameter back to the default value does not change its status back to DEFAULT
  • [DRILL-3292] - SUM(constant) OVER(...) returns wrong results
  • [DRILL-3297] - Using rank, dense_rank, percent_rank, cume_dist, row_number window functions without OVER clause results in cryptic schema change error
  • [DRILL-3312] - PageReader.allocatePageData() calls BufferAllocator.buffer(int) but doesn't check if the result is null
  • [DRILL-3322] - Something broken in or around RPC timeout setup?
  • [DRILL-3347] - Resolve: ResultSet.getObject(...) for VARCHAR returns ...hadoop.io.Text, not String
  • [DRILL-3348] - NPE when two different window functions are used in projection list and order by clauses
  • [DRILL-3360] - Window function defined within another window function
  • [DRILL-3382] - CTAS with order by clause fails with IOOB exception
  • [DRILL-3393] - Quotes not being recognized in tab delimited (tsv) files
  • [DRILL-3412] - Projections are not getting pushed down below Window operator
  • [DRILL-3441] - CompliantTextRecordReader#isStarQuery loops indefinitely
  • [DRILL-3445] - BufferAllocator.buffer() implementations should throw an OutOfMemoryRuntimeException
  • [DRILL-3448] - typo in QueryManager.DrillbitStatusListener will cause the Foreman to hang if a Drillbit dies
  • [DRILL-3455] - If a drillbit, that contains fragments for the current query, dies the QueryManager will fail the query even if those fragments already finished successfully
  • [DRILL-3463] - Unit test of project pushdown in TestUnionAll should put more precisely plan attribute in plan verification.
  • [DRILL-3464] - Index out of bounds exception while performing concat()
  • [DRILL-3476] - Filter on nested element gives wrong results
  • [DRILL-3479] - Sqlline from drill v1.1.0 displays version as 1.0.0
  • [DRILL-3483] - Clarify CommonConstants' constants.
  • [DRILL-3484] - Error using functions with no parameters when `drill.exec.functions.cast_empty_string_to_null` is set to true
  • [DRILL-3497] - Throw UserException#validationError for errors when modifying options
  • [DRILL-3500] - Provide additional information while registering storage plugin optimizer rules
  • [DRILL-3502] - JDBC driver can cause conflicts
  • [DRILL-3503] - Make PruneScanRule have a pluggable partitioning mechanism
  • [DRILL-3509] - Empty JSON files trigger exception when used in a union
  • [DRILL-3537] - Empty Json file can potentially result into wrong results
  • [DRILL-3542] - Rebase Drill on Calcite 1.4.0 release
  • [DRILL-3550] - Incorrect results reading complex data with schema change
  • [DRILL-3551] - CTAS from complex Json source with schema change is not written (and hence not read back ) correctly
  • [DRILL-3554] - Union over TIME and TIMESTAMP values throws SchemaChangeException
  • [DRILL-3555] - Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail
  • [DRILL-3557] - Reading empty CSV file fails with SYSTEM ERROR
  • [DRILL-3566] - Calling Connection.prepareStatement throws a ClassCastException
  • [DRILL-3573] - Enable TPC-H query 17 in unit tests
  • [DRILL-3574] - Empty Over Clause should trigger Union-Exchange to be added below
  • [DRILL-3579] - Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
  • [DRILL-3580] - wrong plan for window function queries containing function(col1 + colb)
  • [DRILL-3583] - SUM on varchar column produces incorrect error
  • [DRILL-3595] - Wrong results returned by query that uses LEAD window function
  • [DRILL-3596] - Allow only (<expression>) or (<expression>, 1) for LEAD and LAG window functions as input parameters
  • [DRILL-3599] - Wrong results returned by LEAD(col-name, -1)
  • [DRILL-3601] - LEAD function used without OVER clause should not plan
  • [DRILL-3604] - LEAD(<varchar-column>) returns IOB Exception
  • [DRILL-3605] - Wrong results - Lead(char-column)
  • [DRILL-3606] - Wrong results - Lead(char-column) without PARTITION BY clause
  • [DRILL-3617] - Apply "shading" to JDBC-all Jar file to avoid version conflicts
  • [DRILL-3621] - Wrong results when Drill on Hbase query contains rowkey "or" or "IN"
  • [DRILL-3622] - With user authentication enabled, only admin users should be able to change system options
  • [DRILL-3635] - IllegalArgumentException - not a Parquet file (too small)
  • [DRILL-3638] - Incorrect results LEAD(<float-type-column>)
  • [DRILL-3642] - External Sort will leak memory if query is cancelled while it's spilling to disk
  • [DRILL-3643] - NTILE(0) returns RuntimeException
  • [DRILL-3645] - typo in drill documentation
  • [DRILL-3648] - NTILE function returns incorrect results
  • [DRILL-3649] - LEAD , LAG , NTILE , FIRST_VALUE , LAST_VALUE report RuntimeException for missing OVER clause
  • [DRILL-3653] - Assert in a query with both avg aggregate and avg window aggregate functions
  • [DRILL-3654] - FIRST_VALUE(<char-column>/<varchar-column>) returns IOB Exception
  • [DRILL-3657] - Wrong result with SUM(1) window function when multiple partitions are present
  • [DRILL-3658] - Missing org.apache.hadoop in the JDBC jar
  • [DRILL-3663] - Drill View aliases being lost via ‘order by’
  • [DRILL-3667] - Random Assertion Error while planning
  • [DRILL-3668] - Incorrect results FIRST_VALUE function
  • [DRILL-3673] - Memory leak in parquet writer on CTAS
  • [DRILL-3677] - Wrong result with LEAD window function when used in multiple windows in the same query
  • [DRILL-3679] - IOB Exception : when window functions used in outer and inner query
  • [DRILL-3680] - window function query returns Incorrect results
  • [DRILL-3684] - CTAS : Memory Leak when using CTAS with tpch sf100
  • [DRILL-3685] - Failure to execute query with NTILE and ROW_NUMBER window functions with different window definitions
  • [DRILL-3689] - incorrect results : aggregate AVG returns wrong results over results returned by LEAD function.
  • [DRILL-3690] - Partitioning pruning produces wrong results when there are nested expressions in the filter
  • [DRILL-3691] - CTAS Memory Leak : IllegalStateException
  • [DRILL-3700] - Exception in a query with multiple first_value window functions with different partitions
  • [DRILL-3702] - PartitionPruning hit ClassCastException in Interpreter when the pruning filter expression is of non-nullable type.
  • [DRILL-3707] - Fix for DRILL-3616 can cause a NullPointerException in ExternalSort cleanup
  • [DRILL-3711] - Windows unit test failure on 1.2 snapshot
  • [DRILL-3716] - Drill should push filter past aggregate in order to improve query performance.
  • [DRILL-3718] - quotes in .tsv trigger exception
  • [DRILL-3719] - Adding negative sign in front of EXTRACT triggers Assertion Error
  • [DRILL-3732] - Drill leaks memory if external sort hits out of disk space exception
  • [DRILL-3735] - Directory pruning is not happening when number of files is larger than 64k
  • [DRILL-3736] - Documentation for partition is misleading/wrong syntax
  • [DRILL-3737] - CTAS from empty text file fails with NPE
  • [DRILL-3746] - Hive query fails if the table contains external partitions
  • [DRILL-3757] - Link to IntelliJ IDEA settings jar on the contributors guidelines page is broken.
  • [DRILL-3758] - InvalidRecordException while selecting from a table with multiple parquet file
  • [DRILL-3767] - SchemaPath.getCompoundPath(String...strings) reverses its input array
  • [DRILL-3773] - Mongo RecordReader projection pushdown doesn't work past first level paths
  • [DRILL-3778] - Add rest of DRILL-3160 (making JDBC Javadoc available)
  • [DRILL-3779] - NPE during mergeAndSpill operation of external sort
  • [DRILL-3781] - Using CURRENT_DATE in a group by throws a column not found error for hive tables and csv files
  • [DRILL-3783] - Incorrect results : COUNT(<column-name>) over results returned by UNION ALL
  • [DRILL-3784] - simple Jdbc program fails with NoClassDefFoundError
  • [DRILL-3788] - Directory based partition pruning not taking effect with metadata caching
  • [DRILL-3809] - PrelFinalizable.SHUTTLE causes ArrayIndexOutOfBoundsException when multiple queries are run concurrently
  • [DRILL-3811] - AtomicRemainder incorrectly accounts for transferred allocations
  • [DRILL-3817] - Refresh metadata does not work when used with sub schema
  • [DRILL-3819] - Remove redundant filter for files that start with "."
  • [DRILL-3884] - Hive native scan has lower parallelization leading to performance degradation
  • [DRILL-3892] - Metadata cache not being leveraged when partition pruning is taking place
  • [DRILL-3901] - Performance regression with doing Explain of COUNT(*) over 100K files
  • [DRILL-3917] - IllegalArgumentException when running query after creating metadata cache
  • [DRILL-3918] - Avoid extra loading of the metadata cache file
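Several of the fixes above concern predicate semantics rather than crashes; DRILL-2398, for example, reports wrong results when IS NOT DISTINCT FROM is used as a join filter. That predicate differs from `=` only in how NULL is handled. The Python sketch below uses a naive nested-loop join in which None stands in for NULL; all names are hypothetical. It shows which row pairs each predicate should keep, describing the expected SQL result rather than how Drill evaluates the join.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL `a = b`: a NULL operand makes the result UNKNOWN (modelled as None)."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Any, b: Any) -> bool:
    """SQL `a IS NOT DISTINCT FROM b`: never UNKNOWN; two NULLs count as equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def nested_loop_join(
    left: Sequence[Any],
    right: Sequence[Any],
    condition: Callable[[Any, Any], Optional[bool]],
) -> List[Tuple[Any, Any]]:
    """Inner join that keeps a pair only when the condition is TRUE, not UNKNOWN."""
    return [(lk, rk) for lk in left for rk in right if condition(lk, rk) is True]


left_keys = [1, None, 2]
right_keys = [None, 2, 3]
print(nested_loop_join(left_keys, right_keys, sql_equals))            # [(2, 2)]
print(nested_loop_join(left_keys, right_keys, is_not_distinct_from))  # [(None, None), (2, 2)]
```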

Improvement

  • [DRILL-1666] - Provide Test cases for Mongo Storage plugin
  • [DRILL-2332] - Drill should be consistent with Implicit casting rules across data formats
  • [DRILL-2424] - Ignore hidden files in directory path
  • [DRILL-2699] - Collect all cleanup errors before reporting a failure to the client
  • [DRILL-2748] - Filter is not pushed down into subquery with the group by
  • [DRILL-2908] - Support reading the Parquet int 96 type
  • [DRILL-3121] - Hive partition pruning is not happening
  • [DRILL-3209] - [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
  • [DRILL-3295] - UNION (distinct type) is supported now
  • [DRILL-3341] - Move OperatorWrapper list and FragmentWrapper list creation to ProfileWrapper ctor
  • [DRILL-3354] - TestBuilder can check if the number of result batches equals some expected value
  • [DRILL-3450] - Rename NonRootStatusReporter to FragmentStatusReporter
  • [DRILL-3467] - Restrict 'show databases' based on underlying permissions.
  • [DRILL-3536] - Add support for LEAD, LAG, NTILE, FIRST_VALUE and LAST_VALUE window functions
  • [DRILL-3545] - Need documentation on BINARY_STRING and STRING_BINARY functions
  • [DRILL-3565] - Add support for Avro UNION type
  • [DRILL-3589] - JDBC driver maven artifact includes a lot of unnecessary dependencies
  • [DRILL-3652] - Need to document order of operations with window functions and flatten
  • [DRILL-3720] - Avro Record Reader should process Avro files by per block basis
  • [DRILL-3888] - Build test jars for all Drill Modules
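DRILL-3295 adds UNION (the distinct form) alongside UNION ALL. The two differ only in duplicate elimination, as this minimal Python sketch shows. Rows are modelled as tuples, the function names are hypothetical, and the output order is incidental because SQL guarantees no order without ORDER BY.

```python
from typing import Hashable, Iterable, List


def union_all(left: Iterable[Hashable], right: Iterable[Hashable]) -> List[Hashable]:
    """UNION ALL: every row from both inputs, duplicates kept."""
    return list(left) + list(right)


def union_distinct(left: Iterable[Hashable], right: Iterable[Hashable]) -> List[Hashable]:
    """UNION: rows from both inputs with duplicates removed, including duplicates
    within a single input. Two NULLs (None) count as duplicates of each other,
    as they do in SQL set operations."""
    seen = set()
    result = []
    for row in union_all(left, right):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


left = [(1,), (2,), (2,)]
right = [(2,), (3,), (None,)]
print(union_all(left, right))       # [(1,), (2,), (2,), (2,), (3,), (None,)]
print(union_distinct(left, right))  # [(1,), (2,), (3,), (None,)]
```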

New Feature

  • [DRILL-2743] - Parquet file metadata caching
  • [DRILL-3180] - Apache Drill JDBC storage plugin to query rdbms systems such as MySQL and Netezza from Apache Drill
  • [DRILL-3470] - Add support for multiple partition by clauses for window functions in the same query
  • [DRILL-3492] - Add support for encoding of Drill data types into byte ordered format
  • [DRILL-3725] - Add HTTPS support for Drill web interface
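DRILL-3492 (encoding Drill data types into a byte-ordered format) and DRILL-3364 (pruning the scan range when the filter is on a leading field with byte-comparable encoding) both rely on one property: comparing encoded keys byte by byte gives the same order as comparing the original values. The Python sketch below shows one common way to obtain that property for signed 64-bit integers (big-endian bytes with the sign bit flipped) and how a range filter then maps onto a contiguous key range. It is a generic illustration under those assumptions, not Drill's encoding code, and the function names are hypothetical.

```python
import struct

OFFSET = 1 << 63  # adding 2**63 flips the sign bit of a 64-bit two's-complement value


def encode_int64(value: int) -> bytes:
    """Encode a signed 64-bit integer so that unsigned byte-wise comparison of
    the encodings matches numeric comparison of the values."""
    # Shifting [-2**63, 2**63) onto [0, 2**64) preserves order, and big-endian
    # puts the most significant byte first. struct.error is raised if out of range.
    return struct.pack(">Q", value + OFFSET)


def decode_int64(data: bytes) -> int:
    """Inverse of encode_int64."""
    return struct.unpack(">Q", data)[0] - OFFSET


def in_key_range(key: bytes, low: int, high: int) -> bool:
    """Row-key test for the filter `low <= col <= high` when the key is exactly
    the encoded column: the filter becomes a single byte range to scan."""
    return encode_int64(low) <= key <= encode_int64(high)


values = [42, -1, 0, -(2 ** 63), 7, 2 ** 63 - 1, -5]
print(sorted(values, key=encode_int64) == sorted(values))  # True
# Plain two's complement misorders negatives: -1 encodes as eight 0xff bytes.
print(struct.pack(">q", -1) > struct.pack(">q", 0))        # True
```

Because the encoded order matches numeric order, a scan can start at the encoding of the lower bound and stop after the encoding of the upper bound instead of reading every row.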

Task

  • [DRILL-2693] - doc programmatically submit queries to Drill
  • [DRILL-3683] - [Unit Test] Add expected plan for tests in TestWindowFunctions suite
  • [DRILL-3799] - Create Simple HBase Tutorial

Important Unresolved Issues

  • The Drill error message about JSON syntax that appears when Drill cannot find the JDBC driver during configuration of the JDBC storage plugin is misleading. To configure the JDBC plugin, you must put the JDBC driver in the <drill_installation_directory>/jars/3rdparty directory. DRILL-3985
  • The MySQL TEXT type is not supported. The popular classicmodels database uses the TEXT type, and therefore cannot be used as is until this issue is resolved. DRILL-3956
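Because of the misleading error message in DRILL-3985, it can help to confirm that the driver jar is actually in place before entering the storage plugin configuration. The Python sketch below checks <drill_installation_directory>/jars/3rdparty for a matching jar and builds a JDBC storage plugin configuration. The configuration field names (type, enabled, driver, url, username, password) are those shown in Drill's RDBMS storage plugin documentation; verify them against the documentation for your Drill version. The helper names, the install path, the MySQL driver class, the URL and the credentials are placeholder assumptions.

```python
import json
from pathlib import Path
from typing import List


def find_driver_jars(drill_home: str, name_fragment: str) -> List[Path]:
    """Return the jars in <drill_home>/jars/3rdparty whose file names contain
    name_fragment (for example "mysql-connector"); empty if none are found."""
    third_party = Path(drill_home) / "jars" / "3rdparty"
    if not third_party.is_dir():
        return []
    return sorted(p for p in third_party.glob("*.jar") if name_fragment in p.name)


def jdbc_plugin_config(driver: str, url: str, username: str, password: str) -> str:
    """JSON for a JDBC storage plugin, using the field names from Drill's RDBMS
    storage plugin documentation (check them against your Drill version)."""
    return json.dumps(
        {
            "type": "jdbc",
            "enabled": True,
            "driver": driver,
            "url": url,
            "username": username,
            "password": password,
        },
        indent=2,
    )


if __name__ == "__main__":
    drill_home = "/opt/drill"  # hypothetical install location
    if not find_driver_jars(drill_home, "mysql-connector"):
        print("No MySQL driver jar found in jars/3rdparty; copy it there before configuring the plugin.")
    print(jdbc_plugin_config("com.mysql.jdbc.Driver", "jdbc:mysql://localhost:3306", "root", "mypassword"))
```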