Apache Drill 1.16.0 Release Notes

Release date: May 2, 2019

Today, we’re happy to announce the availability of Drill 1.16.0. You can download it here.

New Features and Improvements

This release of Drill provides the following new features and improvements:

The following sections provide a complete list of all the fixes and improvements in Drill 1.16.0:

Sub-task

  • [DRILL-6780] - Caching Dependencies
  • [DRILL-6846] - Store test results
  • [DRILL-6852] - Adapt current Parquet Metadata cache implementation to use Drill Metastore API
  • [DRILL-6964] - Implement CREATE / DROP TABLE SCHEMA commands
  • [DRILL-7058] - Refresh command to support subset of columns
  • [DRILL-7063] - Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)
  • [DRILL-7064] - Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
  • [DRILL-7065] - Ensure backward compatibility is maintained
  • [DRILL-7066] - Auto-refresh should pick up existing columns from metadata cache
  • [DRILL-7068] - Support memory adjustment framework for resource management with Queues
  • [DRILL-7073] - CREATE SCHEMA command / TupleSchema / ColumnMetadata improvements
  • [DRILL-7089] - Implement caching of BaseMetadata classes
  • [DRILL-7092] - Rename map to struct in schema definition
  • [DRILL-7095] - Expose Tuple Metadata to the physical operator
  • [DRILL-7116] - Adapt statistics to use Drill Metastore API
  • [DRILL-7117] - Support creation of histograms for numeric data types (except Decimal) and date/time/timestamp
  • [DRILL-7119] - Modify selectivity calculations to use histograms for supported data types
  • [DRILL-7138] - Implement command to describe schema for table
  • [DRILL-7157] - Wrap SchemaParsingException into UserException when creating schema
  • [DRILL-7159] - After renaming MAP to STRUCT typeString method still outputs MAP name

Bug

  • [DRILL-407] - JSON max number value returns INFINITY in drill
  • [DRILL-808] - Sqlline use schema does not change the displayed schema
  • [DRILL-1243] - Drill's error message not clear when it fails reading json files containing arrays of heterogeneous types
  • [DRILL-2326] - Scalar replacement fails in TestConvertFunctions
  • [DRILL-3090] - sqlline : save SQL to script file and replay from script, results in error
  • [DRILL-3846] - Metadata Caching : A count(*) query took more time with the cache in place
  • [DRILL-4211] - Column aliases not pushed down to JDBC stores in some cases when Drill expects aliased columns to be returned.
  • [DRILL-4312] - JDBC PlugIN - MySQL Causes errors in Drill INFORMATION_SCHEMA
  • [DRILL-4374] - Drill rewrites Postgres query with ambiguous column references
  • [DRILL-4395] - equi-inner join of two tables in Postgres returns null one of the projected columns
  • [DRILL-4400] - Cannot apply 'NOT' to arguments of type 'NOT<CHAR(2)>'.
  • [DRILL-4403] - AssertionError: Internal error: Conversion to relational algebra failed to preserve datatypes
  • [DRILL-4407] - Group by subquery causes Java NPE
  • [DRILL-4408] - re-written query projecting an aggregate on a boolean not supported by Postgres
  • [DRILL-4436] - Result data gets mixed up when various tables have a column "label"
  • [DRILL-4677] - Cast to TIMESTAMP within value constructor results in IllegalArgumentException
  • [DRILL-4814] - extractHeader attribute not working with the table function
  • [DRILL-4858] - REPEATED_COUNT on an array of maps and an array of arrays is not implemented
  • [DRILL-4902] - nested aggregate query does not complain about missing GROUP BY clause
  • [DRILL-4939] - to_date function returns incorrect result
  • [DRILL-4946] - org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode
  • [DRILL-5161] - Several TestConvertFunctions tests produce scalar replacement errors
  • [DRILL-5295] - Unable to query INFORMATION_SCHEMA.`TABLES` if MySql storage plugin enabled
  • [DRILL-5581] - Query with CASE statement returns wrong results
  • [DRILL-5683] - Incorrect query result when query uses NOT(IS NOT NULL) expression
  • [DRILL-5713] - Doing joins on tables that share column names in a JDBC store returns incorrect results
  • [DRILL-6013] - requesting INFORMATION_SCHEMA with a postgresql connector generates a null pointer exception
  • [DRILL-6066] - AssertionError: Internal error: Conversion to relational algebra failed to preserve datatypes: validated type
  • [DRILL-6260] - Query fails with "ERROR: Non-scalar sub-query used in an expression" when it contains a cast expression around a scalar sub-query
  • [DRILL-6369] - typeof() fails for constants
  • [DRILL-6377] - typeof() does not return DECIMAL scale, precision
  • [DRILL-6458] - NPE when error while applying rule ReduceExpressionsRule_Project
  • [DRILL-6524] - Two CASE statements in projection influence results of each other
  • [DRILL-6533] - (type|sqlType|drillType|mode)Of functions fail when used with constants
  • [DRILL-6707] - Query with 10-way merge join fails with IllegalArgumentException
  • [DRILL-6722] - Query from parquet with case-then and arithmetic operation returns a wrong result
  • [DRILL-6734] - Unable to find value vector of path `EXPR$0`, returning null instance.
  • [DRILL-6830] - Hook.REL_BUILDER_SIMPLIFY handler was not removed, causing performance degradation
  • [DRILL-6849] - Runtime filter queries with nested broadcast returns wrong results
  • [DRILL-6855] - Query from non-existent proxy user fails with "No default schema selected" when impersonation is enabled
  • [DRILL-6856] - Wrong result returned if the query filters a boolean column with both "is true" and "is null" conditions
  • [DRILL-6869] - Drill allows to create views outside workspace
  • [DRILL-6871] - Enabling runtime filter eliminates more incoming rows than it should.
  • [DRILL-6875] - Drill doesn't try to update connection for S3 after session expired
  • [DRILL-6893] - Invalid output for star and self-join queries for RDBMS Storage Plugin
  • [DRILL-6894] - CTAS and CTTAS are not working on S3 storage when cache is disabled
  • [DRILL-6906] - File permissions are not being honored
  • [DRILL-6911] - Documentation issue - Hadoop core-site.xml is not supported by Drill to read S3 credentials
  • [DRILL-6914] - Query with RuntimeFilter and SemiJoin fails with IllegalStateException: Memory was leaked by query
  • [DRILL-6918] - Querying empty topics fails with "NumberFormatException"
  • [DRILL-6927] - Query fails when hive table with timestamp data is queried with enabled int96_as_timestamp and optimize_scan_with_native_reader options
  • [DRILL-6928] - Update description for exec.query.return_result_set_for_ddl option to reflect it affects JDBC connections only
  • [DRILL-6929] - maprfs-XXX-mapr.jar is present in jars/3rdparty folder for default profile
  • [DRILL-6931] - Drill "SHOW FILES" command duplicates empty S3 folders as subfolders
  • [DRILL-6933] - Ctrl+Enter meta combo not working on Windows & mac OS
  • [DRILL-6934] - Update the option documentation for planner.enable_unnest_lateral
  • [DRILL-6936] - Graceful shutdown test fails if loopback address is set in hosts
  • [DRILL-6937] - sys.functions table needs a fix in the names column
  • [DRILL-6941] - Incorrect EARLY_LIMIT0_OPT_KEY description
  • [DRILL-6944] - UnsupportedOperationException thrown for view over MapR-DB binary table
  • [DRILL-6947] - RuntimeFilter memory leak due to BF ByteBuf ownership transferring
  • [DRILL-6954] - Move commons libs used in UDFs module to the dependency management
  • [DRILL-6959] - Query with filter with cast to timestamp literal does not return any results
  • [DRILL-6960] - Auto Limit Wrapping should not apply to non-select query
  • [DRILL-6967] - TIMESTAMPDIFF returns incorrect value for SQL_TSI_QUARTER
  • [DRILL-6969] - Inconsistent results when reading MaprDB JSON tables using hive plugin when native reader is enabled
  • [DRILL-6970] - Issue with LogRegex format plugin where drillbuf was overflowing
  • [DRILL-6972] - Error: Could not find or load main class sqlline.SqlLine
  • [DRILL-6976] - SchemaChangeException happens when using split function in subquery if it returns empty result.
  • [DRILL-6978] - typeOf drillTypeOf sqlTypeOf not work with generated tables
  • [DRILL-6991] - Kerberos ticket is being dumped in the log if log level is "debug" for stdout
  • [DRILL-6997] - Semijoin is changing the join ordering for some tpcds queries.
  • [DRILL-6998] - Queries failing with "Failed to aggregate or route the RFW" due to "java.lang.ArrayIndexOutOfBoundsException" do not complete
  • [DRILL-6999] - Queries failed with "Failed to aggregate or route the RFW" due to "java.lang.ArrayIndexOutOfBoundsException"
  • [DRILL-7000] - Queries failing with "Failed to aggregate or route the RFW" do not complete
  • [DRILL-7002] - RuntimeFilter produce wrong results while setting exec.hashjoin.num_partitions=1
  • [DRILL-7008] - Drillbits: clear stale shutdown hook on close
  • [DRILL-7016] - Wrong query result with RuntimeFilter enabled when order of join and filter condition is swapped
  • [DRILL-7021] - HTTPD Throws NPE and Doesn't Recognize Timeformat
  • [DRILL-7022] - Partition pruning is not happening the first time after the metadata auto refresh
  • [DRILL-7034] - Window function over a malformed CSV file crashes the JVM
  • [DRILL-7035] - Drill C++ Client crashes on multiple SaslAuthenticatorImpl Destruction due to communication error
  • [DRILL-7041] - CompileException happens if a nested coalesce function returns null
  • [DRILL-7045] - UDF string_binary java.lang.IndexOutOfBoundsException:
  • [DRILL-7047] - Drill C++ Client crash due to Dangling stack ptr to sasl_callback_t
  • [DRILL-7049] - REST API returns the toString of byte arrays (VARBINARY types)
  • [DRILL-7054] - PCAP timestamp in milliseconds
  • [DRILL-7056] - Drill fails with NPE when starting in distributed mode and 31010 port is used
  • [DRILL-7061] - Selecting option to limit results to 1000 on web UI causes parse error
  • [DRILL-7072] - Query with semi join fails for JDBC storage plugin
  • [DRILL-7076] - NPE is logged when querying postgres tables
  • [DRILL-7079] - Drill can't query views from the S3 storage when plain authentication is enabled
  • [DRILL-7085] - Drill Statistics: analyze table cmd fails
  • [DRILL-7100] - parquet RecordBatchSizerManager : IllegalArgumentException: the requested size must be non-negative
  • [DRILL-7101] - IllegalArgumentException when reading parquet data
  • [DRILL-7103] - BSON Record reader: wrong timestamp values
  • [DRILL-7107] - Unable to connect to Drill 1.15 through ZK
  • [DRILL-7108] - With statistics enabled TPCH 16 has two additional exchange operators
  • [DRILL-7109] - Statistics adds external sort, which spills to disk
  • [DRILL-7111] - Table function fails to query directories
  • [DRILL-7113] - Issue with filtering null values from MapRDB-JSON
  • [DRILL-7114] - ANALYZE command generates warnings for stats file and materialization
  • [DRILL-7118] - Filter not getting pushed down on MapR-DB tables.
  • [DRILL-7120] - Query fails with ChannelClosedException when Statistics is disabled
  • [DRILL-7121] - TPCH 4 takes longer when Statistics is disabled.
  • [DRILL-7122] - TPCDS queries 29 25 17 are slower when Statistics is disabled.
  • [DRILL-7123] - TPCDS query 83 runs slower when Statistics is disabled
  • [DRILL-7124] - Fix logger for ShowFilesHandler
  • [DRILL-7125] - REFRESH TABLE METADATA fails after upgrade from Drill 1.13.0 to Drill 1.15.0
  • [DRILL-7126] - Contrib format-ltsv is not being included in distribution
  • [DRILL-7130] - IllegalStateException: Read batch count [0] should be greater than zero
  • [DRILL-7140] - RM: Drillbits fail with "No enum constant QueueSelectionPolicy.SelectionPolicy.bestfit"
  • [DRILL-7142] - Add space after > in SqlLine prompt
  • [DRILL-7144] - sqlline option : !set useLineContinuation false, fails with ParseException
  • [DRILL-7145] - Exceptions happened during retrieving values from ValueVector are not being displayed at the Drill Web UI
  • [DRILL-7146] - Query failing with NPE when ZK queue is enabled
  • [DRILL-7147] - Source order of "drill-env.sh" and "distrib-env.sh" should be swapped
  • [DRILL-7150] - Fix timezone conversion for timestamp from maprdb after the transition from PDT to PST
  • [DRILL-7152] - Histogram creation throws exception for all nulls column
  • [DRILL-7153] - Drill Fails to Build using JDK 1.8.0_65
  • [DRILL-7154] - TPCH query 4, 17 and 18 take longer with sf 1000 when Statistics are disabled
  • [DRILL-7160] - exec.query.max_rows QUERY-level options are shown on Profiles tab
  • [DRILL-7166] - Count(*) queries with wildcards in table name are reading metadata cache and returning wrong results
  • [DRILL-7182] - TPCDS query 64 degrades due to Statistics even when disabled
  • [DRILL-7183] - TPCDS query 10, 35, 69 take longer with sf 1000 when Statistics are disabled
  • [DRILL-7185] - Drill Fails to Read Large Packets
  • [DRILL-7186] - Missing storage.json REST endpoint
  • [DRILL-7190] - Missing backward compatibility for REST API with DRILL-6562
  • [DRILL-7201] - Strange symbols in error window (Windows)
  • [DRILL-7202] - Failed query shows warning that fragments has made no progress
  • [DRILL-7213] - drill-format-mapr.jar contains stale git.properties file

New Feature

  • [DRILL-540] - Allow querying hive views in Drill
  • [DRILL-6992] - Support column histogram statistics
  • [DRILL-7014] - Format plugin for LTSV files
  • [DRILL-7048] - Implement JDBC Statement.setMaxRows() with System Option
  • [DRILL-7077] - Add Function to Facilitate Time Series Analysis

Improvement

  • [DRILL-416] - Make Drill work with SELECT without FROM
  • [DRILL-912] - Project push down tests rely on JSON pushdown but JSON reader no longer supports pushdown.
  • [DRILL-1328] - Support table statistics
  • [DRILL-1506] - Current Schema Not Shown In the sqlline Prompt
  • [DRILL-1669] - Custom sqlline Prompt
  • [DRILL-5509] - Upgrade Drill protobuf support from 2.5.0 to latest 3.3
  • [DRILL-5603] - Replace String file paths to Hadoop Path
  • [DRILL-5696] - Change default compiler strategy
  • [DRILL-5773] - Project pushdown into a subquery with select *
  • [DRILL-6050] - Provide a limit to number of rows fetched for a query in UI
  • [DRILL-6430] - Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or Locally
  • [DRILL-6562] - Plugin Management improvements
  • [DRILL-6582] - SYSLOG (RFC-5424) Format Plugin
  • [DRILL-6879] - Indicate a warning in the WebUI when a query makes little to no progress for a while
  • [DRILL-6880] - Hash-Join: Many null keys on the build side form a long linked chain in the Hash Table
  • [DRILL-6901] - Move SchemaBuilder from test to main for use outside tests
  • [DRILL-6903] - SchemaBuilder improvements
  • [DRILL-6910] - A filtering column remains in scan when filter pruning happens
  • [DRILL-6921] - Add button to reset Options filter
  • [DRILL-6939] - Indicate when a query is submitted and is in progress
  • [DRILL-6942] - Provide ability to sort list of profiles on Drill web UI
  • [DRILL-6950] - Row set-based scan framework
  • [DRILL-6952] - Merge row set based "compliant" text reader
  • [DRILL-6962] - Function coalesce returns an Error when none of the columns in coalesce exist in a parquet file
  • [DRILL-6971] - Display query state in query result page
  • [DRILL-6977] - Improve Hive tests configuration
  • [DRILL-6985] - Fix sqlline.bat issues on Windows and add drill-embedded.bat
  • [DRILL-7006] - Support type conversion shims in RowSetWriter
  • [DRILL-7007] - Revise row-set based tests to use simplified verify method
  • [DRILL-7011] - Allow hybrid model in the Row set-based scan framework
  • [DRILL-7018] - Parquet File fails with IndexOutOfBoundsException
  • [DRILL-7024] - Refactor ColumnWriter to simplify type-conversion shim
  • [DRILL-7028] - Reduce the planning time of queries on large Parquet tables with large metadata cache files
  • [DRILL-7031] - Add Travis job that runs protobuf generation command and checks if all protobufs are up-to-date
  • [DRILL-7032] - Ignore corrupt rows in a PCAP file
  • [DRILL-7036] - Improve UI for alert and error messages
  • [DRILL-7038] - Queries on partitioned columns scan the entire datasets
  • [DRILL-7042] - Apache Drill v1.15.0 failed to generate deb/rpm package
  • [DRILL-7051] - Upgrade to Jetty 9.3
  • [DRILL-7052] - Relative path for URL redirection
  • [DRILL-7060] - Support JsonParser Feature 'ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER' in JsonReader
  • [DRILL-7069] - Poor performance of transformBinaryInMetadataCache
  • [DRILL-7070] - Fix deb and rpm issues on apache drill master branch
  • [DRILL-7074] - Fixes and improvements to the scan framework for CSV
  • [DRILL-7075] - Fix debian package issue with control files
  • [DRILL-7081] - Upgrade GlassFish Jersey and Javax Servlet dependencies
  • [DRILL-7086] - Enhance row-set scan framework to use external schema
  • [DRILL-7110] - Skip writing profile when an ALTER SESSION is executed
  • [DRILL-7115] - Improve Hive schema show tables performance
  • [DRILL-7143] - Enforce column-level constraints when using a schema
  • [DRILL-7165] - Redundant Checksum calculating for ASC files

Task

  • [DRILL-5658] - Documentation for Drill Crypto Functions
  • [DRILL-6862] - Update Calcite to 1.18.0
  • [DRILL-6907] - Fix hive-exec-shaded classes recognition in IntelliJ IDEA
  • [DRILL-6946] - Implement java.sql.Connection setSchema and getSchema methods in DrillConnectionImpl
  • [DRILL-6955] - storage-jdbc unit tests improvements
  • [DRILL-6989] - Upgrade to SqlLine 1.7.0
  • [DRILL-7019] - Add check for redundant imports
  • [DRILL-7106] - Fix Intellij warning for FieldSchemaNegotiator
  • [DRILL-7155] - Create a standard logging message for batch sizes generated by individual operators
  • [DRILL-7188] - Revert DRILL-6642: Update protocol-buffers version
  • [DRILL-7189] - Revert DRILL-7105 Error while building the Drill native client
  • [DRILL-7207] - Update the copyright year in NOTICE.txt file
  • [DRILL-7212] - Add gpg key with apache.org email for sorabh