Apache Drill 1.1.0 Release Notes

Release date: July 5, 2015

It has been about 6 weeks since the release of Drill 1.0.0. Today we’re happy to announce the availability of Drill 1.1.0, providing 119 additional enhancements and bug fixes.

Noteworthy New Features in Drill 1.1.0

Drill now supports window functions, automatic partitioning, and Hive impersonation.

Ranking Window Functions

  • ROW_NUMBER
  • RANK
  • DENSE_RANK
  • PERCENT_RANK
  • CUME_DIST
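
For readers new to these functions, the sketch below shows how the five ranking values relate to each other under standard SQL semantics. It is plain Python, not Drill code. The function name ranking_functions and the sample values are made up for illustration, and it assumes a single partition ordered ascending by one column.

```python
def ranking_functions(values):
    """Compute the five ranking values for one partition, ordered ascending.

    Illustrative only: follows standard SQL semantics, not Drill's implementation.
    """
    ordered = sorted(values)
    n = len(ordered)
    rows = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number   # RANK jumps to the current row number after ties
            dense_rank += 1     # DENSE_RANK increases by one per distinct value
            previous = value
        # CUME_DIST counts the current row, its peers, and everything before them.
        rows_up_to_peers = sum(1 for v in ordered if v <= value)
        rows.append({
            "value": value,
            "row_number": row_number,
            "rank": rank,
            "dense_rank": dense_rank,
            "percent_rank": 0.0 if n == 1 else (rank - 1) / (n - 1),
            "cume_dist": rows_up_to_peers / n,
        })
    return rows


result = ranking_functions([20, 10, 30, 20])
for row in result:
    print(row)
```

With the tied value 20 in the sample, RANK skips from 2 to 4 while DENSE_RANK continues from 2 to 3, which is the usual way to tell the two apart.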

Aggregate Window Functions

  • COUNT
  • SUM
  • MIN
  • MAX
  • AVG

Automatic Partitioning in CTAS (DRILL-3333)

When a table is created with a PARTITION BY clause, the Parquet writer creates separate files for the different partition values. The data is first sorted by the partition keys, and the Parquet writer starts a new file each time it encounters a new value for the partition columns.
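
The sort-then-split behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not Drill's writer: the function write_partitioned, the file names, and the sample rows are all hypothetical, and "files" are just in-memory lists rather than real Parquet output.

```python
from itertools import groupby
from operator import itemgetter


def write_partitioned(rows, partition_key):
    """Sort rows by the partition key, then start a new "file" on each new key value."""
    ordered = sorted(rows, key=itemgetter(partition_key))
    files = []
    grouped = groupby(ordered, key=itemgetter(partition_key))
    for index, (value, group) in enumerate(grouped):
        files.append({
            "name": f"part_{index}.parquet",  # hypothetical file name
            "partition_value": value,
            "rows": list(group),
        })
    return files


sample_rows = [
    {"year": 2015, "id": 1},
    {"year": 2014, "id": 2},
    {"year": 2015, "id": 3},
]
partition_files = write_partitioned(sample_rows, "year")
for f in partition_files:
    print(f["name"], f["partition_value"], len(f["rows"]))
```

Because the rows are sorted first, every file ends up holding exactly one value of the partition column.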

When queries are issued against data that was created this way, partition pruning will work if the filter contains a partition column. Unlike directory-based partitioning, no view is required, nor is it necessary to reference the dir* column names.
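
As a rough mental model of pruning, the Python sketch below skips files whose recorded value range for the filter column cannot match. The per-file "stats" layout, the file names, and the prune function are assumptions made for illustration; they do not describe Drill's actual planner or metadata format.

```python
def prune(files, column, value):
    """Return names of files that could contain rows where column == value."""
    kept = []
    for f in files:
        low, high = f["stats"][column]  # assumed (min, max) for this column
        if low <= value <= high:
            kept.append(f["name"])
    return kept


# Hypothetical files, each holding a single value of the partition column.
table_files = [
    {"name": "part_0.parquet", "stats": {"year": (2014, 2014)}},
    {"name": "part_1.parquet", "stats": {"year": (2015, 2015)}},
]
to_scan = prune(table_files, "year", 2015)
print(to_scan)
```

When each file holds one partition value, an equality filter on that column leaves only the matching files to scan.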

Hive Impersonation Support (DRILL-3203)

When impersonation is enabled, Drill now supports impersonating the user who issued the query when accessing Hive metadata/data (instead of accessing Hive as the user that started the drillbit).

Enhancements and Bug Fixes

Sub-task

  • [DRILL-3203] - Add support for impersonation in Hive storage plugin
  • [DRILL-3277] - SUM(CAST(columns[0] AS INT)) OVER(...) gives wrong results
  • [DRILL-3278] - SUM(CAST(columns[0] AS BIGINT)) OVER(...) gives wrong results
  • [DRILL-3281] - window functions that involve TIME columns generate wrong results

Bug

  • [DRILL-669] - Information Schema should be schema sensitive
  • [DRILL-1315] - Allow specifying Zookeeper root znode and cluster-id as JDBC parameters
  • [DRILL-1673] - Flatten function can not work well with nested arrays.
  • [DRILL-1820] - Fix broken SCM and project links caused by graduation of Drill to an Apache TLP
  • [DRILL-2023] - Hive function
  • [DRILL-2137] - ResultsSetMetaData.getColumnName() returns "none" (rather than right class name)
  • [DRILL-2310] - Drill fails to start in embedded mode on windows
  • [DRILL-2346] - Star is not expanded correctly in create view if view fields are specified
  • [DRILL-2403] - TimePrintMillis.toString() misses leading zeros in post-decimal-point part
  • [DRILL-2416] - Zookeeper in sqlline connection string does not override the entry from drill-override.conf
  • [DRILL-2447] - Calling getObject on a closed ResultSet object should throw a SQLException
  • [DRILL-2449] - JDBC : DatabaseMetaData.getProcedures should return an empty result set
  • [DRILL-2450] - JDBC : DatabaseMetaData.getColumns is missing the 'COLUMN_SIZE' in the result set
  • [DRILL-2462] - JDBC : ResultSetMetaData.isNullable returns true even when the column is a required one
  • [DRILL-2480] - [umbrella] Identify, fix INFORMATION_SCHEMA and JDBC metadata bugs
  • [DRILL-2494] - Binding parameters to a PreparedStatement throws a SQLException
  • [DRILL-2531] - getColumns() not right/implemented for INTERVAL types
  • [DRILL-2555] - JDBC driver throws RuntimeExceptions rather than SQLExceptions
  • [DRILL-2592] - Jdbc-all driver includes transitive dependencies
  • [DRILL-2595] - Sqlline Usage needs to be corrected
  • [DRILL-2622] - C++ Client valgrind errors in sync API
  • [DRILL-2628] - sqlline hangs and then asserts when trying to execute anything on a dead JDBC connection
  • [DRILL-2763] - [umbrella] Implement INFORMATION_SCHEMA.COLUMNS enough for relevant tools
  • [DRILL-2782] - Decide, implement behavior for transaction-related JDBC methods
  • [DRILL-2866] - Incorrect error message reporting schema change when streaming aggregation and hash join are disabled
  • [DRILL-2903] - Update TestDrillbitResilience tests so that they correctly manage canceled queries that get to complete too quickly.
  • [DRILL-2923] - Ensure all unit tests pass without assertions enabled
  • [DRILL-2967] - Incompatible types error reported in a "not in" query with compatible data types
  • [DRILL-2985] - REGRESSION : NPE seen for project distinct values from CSV
  • [DRILL-2988] - Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified
  • [DRILL-3004] - Failure in planning join when disabling hash join and exchanges
  • [DRILL-3019] - Extra column in Schema of Recordbatch from scanning Values
  • [DRILL-3028] - Exception in correlated subquery with exists when columns in subquery are not qualified
  • [DRILL-3032] - Join between complex (nested repeated lists) data results in "LATE type is not supported"
  • [DRILL-3034] - Apply UserException to port-binding error; handle UserException in embedded-Drill case
  • [DRILL-3035] - Create ControlsInjector interface to enforce implementing methods
  • [DRILL-3078] - Tracking bug for ODBC doc changes from Simba
  • [DRILL-3094] - TPCH query 15 returns non-deterministic result
  • [DRILL-3120] - Windows startup throws NPE
  • [DRILL-3125] - Drill UI Profile page fails to load for a query in some scenarios
  • [DRILL-3134] - Doc: "Supported ... Types" section doesn't include complex types
  • [DRILL-3143] - MaterializedField#clone should deep copy itself without disregarding its children
  • [DRILL-3144] - Doc.: JDBC Driver section is SQuirrel-specific, should be moved
  • [DRILL-3147] - tpcds-sf1-parquet query 73 causes memory leak
  • [DRILL-3155] - Composite vectors leak memory
  • [DRILL-3159] - Make JDBC throttling threshold configurable
  • [DRILL-3172] - Can not plan exception when over clause is empty
  • [DRILL-3173] - Invalid inputs are NOT handled properly by Window functions
  • [DRILL-3177] - Upgrade Mongo java driver to 3.0.1
  • [DRILL-3179] - Example output doesn't match example data (ticket_sales.json) in Complex JSON doc
  • [DRILL-3182] - Window function with DISTINCT qualifier returns seemingly incorrect result
  • [DRILL-3183] - Query that uses window functions returns NPE
  • [DRILL-3188] - Restrict the types of window frames that can be specified
  • [DRILL-3190] - Invalid FragmentState transition from CANCELLATION_REQUESTED in QueryManager
  • [DRILL-3195] - Throw unsupported exception for Window functions that are not currently supported
  • [DRILL-3196] - Disable multiple partition by clauses in the same sql query
  • [DRILL-3197] - Query that uses window functions fails
  • [DRILL-3198] - JDBC driver returns null from DatabaseMetaData.getTypeInfo(...)
  • [DRILL-3199] - GenericAccessor doesn't support isNull
  • [DRILL-3204] - Problem in name resolution with window functions
  • [DRILL-3206] - Memory leak in window functions
  • [DRILL-3208] - Hive : Tpch (SF 0.01) query 10 fails with a system error when the data is backed by hive tables
  • [DRILL-3210] - Star is not expanded correctly in projection list when used with window function
  • [DRILL-3214] - Config option to cast empty string to null does not cast empty string to null
  • [DRILL-3215] - Describe table from hive storage does not connect to "default" database
  • [DRILL-3216] - Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns
  • [DRILL-3218] - Window function usage throws CompileException
  • [DRILL-3220] - IOB Exception when using constants in window functions
  • [DRILL-3245] - Error message needs to be fixed.
  • [DRILL-3254] - Average over window functions returns wrong results
  • [DRILL-3255] - Queries must fail when invalid-positions are specified in order by clause of a window function
  • [DRILL-3260] - Conflicting servlet-api jar causing web UI to be slow
  • [DRILL-3262] - Rename missed .impl.DrillDatabaseMetaData to .impl.DrillDatabaseMetaDataImpl to
  • [DRILL-3263] - Read smallint and tinyint data from hive as integer until these types are well supported throughout Drill
  • [DRILL-3265] - Query with window function and sort below that spills to disk runs out of memory
  • [DRILL-3266] - Drill's hive storage plugin cannot find RegexSerDe
  • [DRILL-3268] - queries with empty OVER() clause return empty result set
  • [DRILL-3269] - Window function query takes too long to complete and return results
  • [DRILL-3273] - Hive function 'concat_ws' not working from drill
  • [DRILL-3275] - Difference in expected results - query that uses window functions
  • [DRILL-3285] - Split DrillCursor.next(), clean up DrillCursor for clarity
  • [DRILL-3293] - CTAS with window function fails with UnsupportedOperationException
  • [DRILL-3294] - False schema change exception in CTAS with AVG window function
  • [DRILL-3296] - Group By Union Distinct fails at planning
  • [DRILL-3298] - Wrong result with SUM window function and order by without partition by in the OVER clause
  • [DRILL-3305] - DrillOptiq should raise appropriate error message while dealing with unknown RexNode
  • [DRILL-3306] - Concurrently Running hive queries results in a deadlock situation
  • [DRILL-3307] - Query with window function runs out of memory
  • [DRILL-3311] - sqlline hangs when query is cancelled while results are returned from the server
  • [DRILL-3316] - Different SQLHandlers should go through the same planning logics for the same SELECT statement.
  • [DRILL-3318] - SUM(CAST(col as INT)) shows different results when used in window functions
  • [DRILL-3321] - ZK PStore configuration needed to prevent Drill Web UI problems
  • [DRILL-3324] - CTAS broken with the new auto partition feature ( Not in master yet)
  • [DRILL-3326] - Query with unsupported windows function containing "AS" blocks correct error message
  • [DRILL-3327] - row_number function returns incorrect result when only order by clause is specified
  • [DRILL-3328] - Cannot cast hive binary to varchar
  • [DRILL-3333] - Add support for auto-partitioning in parquet writer
  • [DRILL-3337] - Queries with Window Function DENSE_RANK fail with SchemaChangeException
  • [DRILL-3343] - Seemingly incorrect result with SUM window functions and float data type
  • [DRILL-3344] - When Group By clause is present, the argument in window function should not refer to any column outside Group By
  • [DRILL-3345] - TestWindowFrame fails to properly check cases involving multiple batches
  • [DRILL-3346] - Windowing query over View should display a better error message
  • [DRILL-3357] - Error when adding 2 columns together
  • [DRILL-3359] - Drill should throw an error when window function defined using WINDOW AS uses ROWS UNBOUNDED PRECEDING
  • [DRILL-3361] - CTAS Auto Partitioning: Fails when we use boolean as the partition type
  • [DRILL-3370] - FLATTEN error with a where clause
  • [DRILL-3373] - CTAS partition by with empty list of partitioning columns should be blocked in parser
  • [DRILL-3374] - CTAS with PARTITION BY, partition column name from view can not be resolved
  • [DRILL-3376] - Reading individual files created by CTAS with partition causes an exception
  • [DRILL-3377] - Can't partition by expression when columns are explicitly specified in the CTAS column list
  • [DRILL-3378] - Average over window on a view returns wrong results
  • [DRILL-3380] - CTAS Auto Partitioning : We are not pruning when we use functions in the select list
  • [DRILL-3398] - WebServer is leaking memory for queries submitted through REST API or WebUI
  • [DRILL-3400] - After shifting CTAS's data, query on CTAS table failed
  • [DRILL-3404] - Filter on window function does not appear in query plan
  • [DRILL-3408] - CTAS partition by columns[i] from csv fails
  • [DRILL-3410] - Partition Pruning : We are doing a prune when we shouldn't
  • [DRILL-3411] - CTAS Partition by column in deeper layer fails
  • [DRILL-3413] - Use DIGEST mechanism in creating Hive MetaStoreClient for proxy users when SASL authentication is enabled
  • [DRILL-3418] - Partition Pruning : We are over-pruning and this leads to wrong results
  • [DRILL-3422] - Multiple unit test failures on Windows with current master

Improvement

  • [DRILL-745] - Drill fails to read the schema of avro tables from hive when the schema is in a separate file
  • [DRILL-959] - drill fails to display binary in hive correctly
  • [DRILL-1862] - over clause with only order by clause throws an exception
  • [DRILL-2086] - mapr profile - use MapR 4.0.2
  • [DRILL-2272] - Tibco Spotfire Desktop configuration for Drill documentation
  • [DRILL-2405] - Generate test data for window function instead of downloading it from S3
  • [DRILL-2746] - Filter is not pushed into subquery past UNION ALL
  • [DRILL-2764] - REST API should return exception details on error
  • [DRILL-2839] - ODBC Driver Doc to point to latest available Driver, also provide compatibility matrix for Drill and ODBC version
  • [DRILL-2997] - Remove references to groupCount from SerializedField
  • [DRILL-3025] - Tibco Spotfire Server - JDBC - Configuration Document
  • [DRILL-3108] - Replace templated returns with covariant return overrides
  • [DRILL-3130] - Project can be pushed below union all / union to improve performance
  • [DRILL-3148] - JReport enablement document for Drill
  • [DRILL-3200] - Add Window functions: ROW_NUMBER, RANK, PERCENT_RANK, DENSE_RANK and CUME_DIST
  • [DRILL-3240] - Fetch hadoop maven profile specific Hive version in Hive storage plugin
  • [DRILL-3304] - improve org.apache.drill.exec.expr.TypeHelper error messages when UnsupportedOperationException is thrown
  • [DRILL-3319] - UserExceptions should be logged from the right class
  • [DRILL-3320] - Do away with "rebuffing" Drill jar
  • [DRILL-3421] - Add new outputformat=json

New Feature

  • [DRILL-1169] - Add support for UNION (distinct type)
  • [DRILL-3246] - Query planning support for partition by clause in Drill's CTAS statement

Task

  • [DRILL-2420] - [umbrella] Identify, fix DatabaseMetaData.getColumns() bugs
  • [DRILL-2952] - Hive 1.0 plugin for Drill
  • [DRILL-2983] - Bridget's User Auth Doc