Apache Drill 1.17.0 Release Notes

Release date: December 26, 2019

Today, we're happy to announce the availability of Drill 1.17.0. You can download it from the Apache Drill download page.

New Features and Improvements

This release of Drill provides the following new features and improvements:

  • DRILL-6540 - Upgrade to HADOOP-3.0 libraries. The hadoop-winutils version that worked for previous releases does not work with Drill 1.17 and later. Use the hadoop-winutils version provided with Drill 1.17 or use custom hadoop-winutils built for Hadoop 3.2.0.
  • DRILL-6739 - Update Kafka libs to 2.0.0+ version
  • DRILL-7401 - Upgrade to Sqlline 1.9
  • DRILL-7200 - Update Calcite to 1.19.0 / 1.20.0
  • DRILL-5674 - Support for .zip compression
  • DRILL-6835 - Schema provision using File / Table Function
  • DRILL-7337 - Support for vararg UDFs
  • DRILL-7096 - Develop vector for canonical Map
  • DRILL-7343 - User-Agent UDFs added

Hive complex types support:

New format plugins support:

  • DRILL-4303 - ESRI Shapefile (shp) format plugin
  • DRILL-7177 - Format Plugin for Excel Files
  • DRILL-6096 - Provide mechanisms to specify field delimiters and quoted text for TextRecordWriter
  • Parquet format improvements, including runtime row group pruning (DRILL-7062), support for creating (DRILL-7156) and reading (DRILL-4517) empty Parquet files, and more.

Metastore support:

  • DRILL-7272 - Implement Drill Iceberg Metastore plugin
  • DRILL-7273 - Create operator for handling metadata
  • DRILL-7357 - Expose Drill Metastore data through INFORMATION_SCHEMA

The following sections list additional fixes and improvements in Drill 1.17.0:

Sub-task

  • [DRILL-5491] - NPE when reading a CSV file, with headers, but blank header line
  • [DRILL-6965] - Adjust table function usage for all storage plugins and implement schema parameter
  • [DRILL-7168] - Implement ALTER SCHEMA ADD / REMOVE COLUMN / PROPERTY commands
  • [DRILL-7221] - Exclude debug files generated by maven debug option from jar
  • [DRILL-7240] - Temp fix: Run-time rowgroup pruning match() fails on casting a Long to an Integer
  • [DRILL-7244] - Run-time rowgroup pruning match() fails on casting a Long to an Integer
  • [DRILL-7271] - Refactor Metadata interfaces and classes to contain all needed information for the File based Metastore
  • [DRILL-7310] - Move schema-related classes from exec module to be able to use them in metastore module
  • [DRILL-7329] - Implement metadata usage for Parquet format plugin
  • [DRILL-7331] - Support Iceberg metadata expiration
  • [DRILL-7356] - Introduce session options for the Drill Metastore
  • [DRILL-7387] - Failed to get value by int key from map nested into struct

Bug

  • [DRILL-1834] - Misleading error message when querying an empty Parquet file
  • [DRILL-3587] - Select hive's struct data gives IndexOutOfBoundsException instead of unsupported error
  • [DRILL-3664] - CAST integer zero, one to boolean false, true
  • [DRILL-3676] - Group by ordinal number of an output column results in parse error
  • [DRILL-3850] - Execute multiple commands from sqlline -q
  • [DRILL-3995] - Scalar replacement bug with Common Subexpression Elimination
  • [DRILL-4547] - Javadoc fails with Java8
  • [DRILL-4788] - Exporting from Parquet to CSV - commas in strings are not escaped
  • [DRILL-4949] - Need better handling of empty parquet files
  • [DRILL-5183] - Drill doesn't seem to handle array values correctly in Parquet files
  • [DRILL-5436] - Need a way to input password which contains space when calling sqlline
  • [DRILL-5451] - Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
  • [DRILL-5487] - Vector corruption in CSV with headers and truncated last row
  • [DRILL-5554] - Wrong error type for "SELECT a" from a CSV file without headers
  • [DRILL-5844] - Incorrect values of TABLE_TYPE returned from method DatabaseMetaData.getTables of JDBC API
  • [DRILL-5929] - Misleading error for text file with blank line delimiter
  • [DRILL-5983] - Unsupported nullable converted type INT_8 for primitive type INT32 error
  • [DRILL-6181] - CTAS should support writing nested structures (nested lists) to parquet.
  • [DRILL-6528] - Planner setting the wrong number of records to read (Parquet Reader)
  • [DRILL-6723] - Kafka reader fails on malformed JSON
  • [DRILL-6885] - CTAS for empty output doesn't create parquet file or folder
  • [DRILL-6904] - Update maven-javadoc-plugin to 3.0.1 version
  • [DRILL-6958] - CTAS csv with option
  • [DRILL-6984] - from table escape parameter not deleted when defined with value other than '"'
  • [DRILL-6990] - IllegalStateException: The current reader doesn't support getting next information
  • [DRILL-7050] - RexNode convert exception in subquery
  • [DRILL-7082] - Inconsistent results with implicit partition columns, multi scans
  • [DRILL-7083] - Wrong data type for explicit partition column beyond file depth
  • [DRILL-7084] - ResultSet getObject method throws not implemented exception if the column type is NULL
  • [DRILL-7105] - Error while building the Drill native client
  • [DRILL-7132] - Metadata cache does not have correct min/max values for varchar and interval data types
  • [DRILL-7139] - Date_add() can produce incorrect results when adding to a timestamp
  • [DRILL-7148] - TPCH query 17 increases execution time with Statistics enabled because join order is changed
  • [DRILL-7158] - null values for varchar, interval, boolean are displayed as empty string in SqlLine
  • [DRILL-7164] - KafkaFilterPushdownTest is sometimes failing to pattern match correctly.
  • [DRILL-7167] - DESCRIBE TABLE statement is not implemented
  • [DRILL-7170] - IllegalStateException: Record count not set for this vector container
  • [DRILL-7171] - Count(*) query on leaf level directory is not reading summary cache file.
  • [DRILL-7181] - [Text V3 Reader] Exception with inadequate message is thrown if select columns as array with extractHeader set to true
  • [DRILL-7187] - Improve selectivity estimates for range predicates when using histogram
  • [DRILL-7195] - Query returns incorrect result or does not fail when cast with is null is used in filter condition
  • [DRILL-7196] - Queries are still runnable on disabled plugins
  • [DRILL-7198] - Issuing a control-C in Sqlline exits the session (it does cancel the query)
  • [DRILL-7199] - Optimize the time taken to populate column statistics for non-interesting columns
  • [DRILL-7204] - Add proper validation when creating plugin
  • [DRILL-7205] - Drill fails to start when authentication is disabled
  • [DRILL-7208] - Drill commit is not shown if build Drill from the 1.16.0-rc1 release sources
  • [DRILL-7225] - Merging of columnTypeInfo for file with different schema throws NullPointerException during refresh metadata
  • [DRILL-7227] - TPCDS queries 47, 57, 59 fail to run with Statistics enabled at sf100
  • [DRILL-7228] - Histogram end points show high deviation for a sample data set
  • [DRILL-7237] - IllegalStateException in aggregation function 'single_value' when there is a varchar datatype in the subquery results
  • [DRILL-7238] - Drill does not use DirectScan for non-existent columns
  • [DRILL-7242] - Query with range predicate hits IOBE when accessing histogram buckets
  • [DRILL-7245] - TPCDS queries 1, 45, 65, 97 are 3x slower when Statistics is enabled at sf 100
  • [DRILL-7250] - Query with CTE fails when its name matches to the table name without access
  • [DRILL-7257] - [Text V3 Reader] dir0 is empty if a column filter returns all lines.
  • [DRILL-7258] - [Text V3 Reader] Unsupported operation error is thrown when select a column with a long string
  • [DRILL-7262] - Parse Error appears on attempting to run several SQL queries at the same time in SQLLine
  • [DRILL-7276] - XSS bug in Apache Drill Web UI latest version 1.16.0 when authenticated
  • [DRILL-7290] - “Failed to construct kafka consumer” using Apache Drill
  • [DRILL-7294] - Prevent generating java beans using protostuff to avoid overriding classes with the same simple name declared as nested in the proto files
  • [DRILL-7297] - Query hangs in planning stage when Error is thrown
  • [DRILL-7306] - Disable "fast schema" batch for new scan framework
  • [DRILL-7307] - casthigh for decimal type can lead to the issues with VarDecimalHolder
  • [DRILL-7321] - split function doesn't work without from
  • [DRILL-7324] - Many vector-validity errors from unit tests
  • [DRILL-7327] - Log Regex Plugin Won't Recognize Schema
  • [DRILL-7332] - Drill requires parentheses in the empty file for 'LOAD' argument in the 'CREATE SCHEMA' command
  • [DRILL-7333] - Batch of container count fixes
  • [DRILL-7335] - Error when reading csv file with headers only
  • [DRILL-7338] - REST API calls to Drill fail due to insufficient heap memory
  • [DRILL-7341] - Vector reAlloc may fail after exchange
  • [DRILL-7345] - Strange Behavior for UDFs with ComplexWriter Output
  • [DRILL-7351] - WebUI is Vulnerable to CSRF
  • [DRILL-7353] - Wrong driver class is written to the java.sql.Driver
  • [DRILL-7358] - Text reader returns nothing for count queries over empty files
  • [DRILL-7362] - COUNT(*) on JSON with outer list results in JsonParse error
  • [DRILL-7367] - Remove Server details from response headers
  • [DRILL-7368] - Query from Iceberg Metastore fails if filter column contains null
  • [DRILL-7369] - Schema for MaprDB tables is not used for the case when several fields are queried
  • [DRILL-7372] - MethodAnalyzer consumes too much memory
  • [DRILL-7373] - Fix problems involving reading from DICT type
  • [DRILL-7376] - Drill ignores Hive schema for MaprDB tables when group scan has star column
  • [DRILL-7377] - Can't Create Nested List using EVF with Late Schema
  • [DRILL-7388] - Apache Drill Kafka Storage module fails to return results for partitions containing single offset record
  • [DRILL-7391] - Wrong result when doing left outer join on CSV table
  • [DRILL-7394] - JSON in Documentation Contains Typo
  • [DRILL-7407] - drill hive struct not support
  • [DRILL-7413] - Scan operator does not set the container record count
  • [DRILL-7414] - EVF incorrectly sets buffer writer index after rollover
  • [DRILL-7424] - Project operator fails to set the container row count
  • [DRILL-7436] - Fix record count, vector structure issues in several operators
  • [DRILL-7439] - Batch count fixes for six additional operators
  • [DRILL-7440] - Failure during loading of RepeatedCount functions
  • [DRILL-7441] - Fix issues with fillEmpties, offset vectors
  • [DRILL-7446] - Eclipse compilation issue in AbstractParquetGroupScan
  • [DRILL-7448] - Fix warnings when running Drill memory tests
  • [DRILL-7450] - Improve performance for ANALYZE command
  • [DRILL-7456] - Batch count fixes for 12 additional operators
  • [DRILL-7462] - Fix Links and Typos in Documentation
  • [DRILL-7463] - Apache license is not added to the generated classes
  • [DRILL-7468] - Metastore unit tests may fail when used sources from the release archive
  • [DRILL-7469] - Disable doclint for maven-javadoc-plugin
  • [DRILL-7470] - drill-yarn unit tests print stack traces with NoSuchMethodError
  • [DRILL-7471] - describe table command fails with ClassCastException when metastore is enabled
  • [DRILL-7472] - Fix ser / de for sys and information_schema tables
  • [DRILL-7473] - Parquet reader failed to get field of repeated map
  • [DRILL-7476] - Info in some sys schema tables are missing if queried with limit clause
  • [DRILL-7482] - Fix missing artifact and overlapping classes warnings in Drill build
  • [DRILL-7484] - Malware found with some antiviruses in the Drill test resources folder
  • [DRILL-7485] - NPE on PCAP Batch Reader
  • [DRILL-7490] - limit is not pushed to JDBC storage plugin
  • [DRILL-7494] - Unable to connect to Drill using JDBC driver when using custom authenticator

New Feature

  • [DRILL-7326] - Support repeated lists for CTAS parquet format
  • [DRILL-7374] - Support for IPV6 address

Improvement

  • [DRILL-1709] - desc => describe command
  • [DRILL-1999] - Drill should expose the Parquet logical schema rather than the physical schema
  • [DRILL-4782] - TO_TIME function cannot separate time from date time string
  • [DRILL-6050] - Provide a limit to number of rows fetched for a query in UI
  • [DRILL-6332] - DrillbitStartupException: Failure while initializing values in Drillbit
  • [DRILL-6842] - Export to CSV using CREATE TABLE AS (CTAS) wrong parsed
  • [DRILL-6951] - Merge row set based mock data source
  • [DRILL-6961] - Error Occurred: Cannot connect to the db. query INFORMATION_SCHEMA.VIEWS : Maybe you have incorrect connection params or db unavailable now (timeout)
  • [DRILL-6974] - SET option command
  • [DRILL-6988] - Utility of the too long error message when syntax error
  • [DRILL-7028] - Reduce the planning time of queries on large Parquet tables with large metadata cache files
  • [DRILL-7030] - Make format plugins fully pluggable
  • [DRILL-7174] - Expose complex to Json control in the Drill C++ Client
  • [DRILL-7206] - Tuning hash join code using primitive int
  • [DRILL-7222] - Visualize estimated and actual row counts for a query
  • [DRILL-7261] - Simplify Easy format config for new scan framework
  • [DRILL-7278] - Refactor result set loader projection mechanism
  • [DRILL-7279] - Support provided schema for CSV without headers
  • [DRILL-7292] - Remove V1, V2 text readers
  • [DRILL-7293] - Convert the regex ("log") plugin to use EVF
  • [DRILL-7302] - Bump Apache Avro from 1.8.2 to 1.9.0
  • [DRILL-7313] - Use Hive schema for MaprDB native reader when field was empty and support all text mode
  • [DRILL-7334] - Update Iceberg Metastore Parquet write mode
  • [DRILL-7385] - Convert PCAP Format Plugin to EVF
  • [DRILL-7402] - Suppress batch dumps for expected failures in tests
  • [DRILL-7403] - Validate batch checks, vector integrity in unit tests
  • [DRILL-7412] - Minor unit test improvements
  • [DRILL-7417] - Add user logged in/out event in info level logs
  • [DRILL-7418] - MetadataDirectGroupScan improvements
  • [DRILL-7443] - Enable PCAP Plugin to Reassemble TCP Streams
  • [DRILL-7445] - Create batch copier based on result set framework
  • [DRILL-7486] - Restructure row set reader builder

Task

  • [DRILL-5506] - Apache Drill Querying data from compressed .zip file
  • [DRILL-6642] - Update protocol-buffers version
  • [DRILL-6711] - Use jitpack repository for Drill Calcite project artifacts instead of repository.mapr.com
  • [DRILL-7236] - SqlLine 1.8 upgrade
  • [DRILL-7267] - Add Slack Link in Documentation
  • [DRILL-7314] - Use TupleMetadata instead of concrete implementation
  • [DRILL-7315] - Revise precision and scale order in the method arguments
  • [DRILL-7316] - Move classes from org.apache.drill.metastore into org.apache.drill.exec.metastore package in java-exec module
  • [DRILL-7317] - Close ClassLoaders used for udf jars uploading when closing FunctionImplementationRegistry
  • [DRILL-7339] - Upgrade to Iceberg latest commits to fix issue with orphan files after delete in transaction
  • [DRILL-7347] - Upgrade Apache Iceberg to released version
  • [DRILL-7350] - Move RowSet related classes from test folder
  • [DRILL-7360] - Refactor WatchService code in Drillbit class
  • [DRILL-7393] - Revisit Drill tests to ensure that patching is executed before any test run
  • [DRILL-7397] - Fix logback errors when building the project
  • [DRILL-7405] - Build fails due to inaccessible apache-drill on S3 storage
  • [DRILL-7409] - Remove bigIntDictionary.parquet from project sources
  • [DRILL-7453] - Update joda-time to 2.10.5 to have correct time zone info
  • [DRILL-7464] - Apache Drill 1.17 Release Activities
  • [DRILL-7474] - Reduce size of Drill's tar.gz file
  • [DRILL-7479] - Short-term fixes for metadata API parameterized type issues
  • [DRILL-7481] - Fix raw type warnings in Iceberg Metastore and related classes
  • [DRILL-7483] - Add support for 12 and 13 java versions