Apache Drill 1.0.0 Release Notes

Release date: May 19, 2015

Today we’re happy to announce the availability of Drill 1.0.0. This release includes the following new features, enhancements, unresolved issues, and bug fixes:

Sub-task

  • [DRILL-2150] - Create an abstraction for repeated value vectors.
  • [DRILL-2358] - Ensure DrillScanRel differentiates skip-all, scan-all & scan-some in a backward compatible fashion
  • [DRILL-2893] - ScanBatch throws a NullPointerException instead of returning OUT_OF_MEMORY
  • [DRILL-2895] - AbstractRecordBatch.buildSchema() should properly handle OUT_OF_MEMORY outcome
  • [DRILL-2902] - Add support for context UDFs: user (and its synonyms session_user, system_user) and current_schema
  • [DRILL-2905] - RootExec implementations should properly handle IterOutcome.OUT_OF_MEMORY
  • [DRILL-2920] - properly handle OutOfMemoryException
  • [DRILL-2947] - AllocationHelper.allocateNew() doesn't have a consistent behavior when it can't allocate

Unresolved Issues

  • [DRILL-1868] - Aliases are not allowed in WHERE, HAVING and GROUP BY clauses. Drill should return an error when it encounters such aliases, but instead it returns an incorrect result.
  • [DRILL-2015] - Casting a numeric value that does not fit in the target data type causes an overflow and returns an incorrect result.
  • [DRILL-2355] - In some cases, the output of the TRUNC function includes an extra .0. Drill binds TRUNC functions that have two input parameters to the function holder. The output type of TRUNC is FLOAT8 when the input is FLOAT8, which results in the extra .0.
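
The overflow and extra .0 behaviors described above can be illustrated outside of Drill. The following is a minimal Java sketch, not Drill's implementation: the class and method names (CastOverflowSketch, uncheckedCast, checkedCast, truncKeepingType) are hypothetical and exist only for this illustration. It shows how a narrowing cast can silently wrap instead of raising an error, and how truncating a double while keeping the double type leaves a trailing .0 when the value is printed.

```java
public class CastOverflowSketch {

    // Narrowing cast: Java keeps only the low 32 bits, so an out-of-range
    // value wraps silently instead of producing an error.
    public static int uncheckedCast(long value) {
        return (int) value;
    }

    // Checked alternative: Math.toIntExact throws ArithmeticException
    // when the value does not fit in an int.
    public static int checkedCast(long value) {
        return Math.toIntExact(value);
    }

    // Truncate toward zero but keep the floating-point type, analogous to
    // a truncation whose output type matches a FLOAT8 input.
    public static double truncKeepingType(double value) {
        return value < 0 ? Math.ceil(value) : Math.floor(value);
    }

    public static void main(String[] args) {
        long tooBig = Integer.MAX_VALUE + 1L; // 2147483648

        // The wrapped value is a wrong result rather than an error.
        System.out.println(uncheckedCast(tooBig)); // -2147483648

        try {
            checkedCast(tooBig);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }

        // The double result prints with a trailing .0 ...
        System.out.println(truncKeepingType(1234.56)); // 1234.0
        // ... while converting to an integer type drops it.
        System.out.println((long) truncKeepingType(1234.56)); // 1234
    }
}
```

In this sketch, the checked conversion is the behavior a user would typically expect from an overflowing cast, and the explicit conversion to an integer type is one way to avoid the trailing .0.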

Bug Fixes

  • [DRILL-148] - Remove sandbox directory from source control, it is no longer utilized
  • [DRILL-625] - Server does not release resources even after client connection is closed
  • [DRILL-708] - TRUNC(n1) function returns a decimal instead of int
  • [DRILL-994] - Reduce hbase timeout when it is not reachable
  • [DRILL-1245] - Drill should pinpoint to the "Problem Record" when it fails to parse a json file
  • [DRILL-1440] - Allow delimited files to have customizable quote characters
  • [DRILL-1460] - JsonReader fails reading files with decimal numbers and integers in the same field
  • [DRILL-1502] - Can't connect to mongo when requiring auth
  • [DRILL-1503] - CTAS does not work against mongo plugin
  • [DRILL-1542] - Early fragment termination causes non running intermediate fragments to error
  • [DRILL-1545] - Json files can only be read when they have a .json extension
  • [DRILL-1727] - REPEATED_CONTAINS sometimes doesn't work
  • [DRILL-1827] - Unit test framework reports expected and actual values backwards in unordered comparison
  • [DRILL-1832] - Select * from json file failed with java.lang.IllegalArgumentException: null
  • [DRILL-1866] - Tests that include limit sporadically fail when run as part of entire test suite on Linux
  • [DRILL-1891] - Error message does not get propagated correctly when reading from JSON file
  • [DRILL-1973] - Tableau query causes parsing error
  • [DRILL-1980] - Create table with a Cast to interval day results in a file which cannot be read
  • [DRILL-2005] - Create table fails to write out a parquet file created from hive; read works fine
  • [DRILL-2006] - Implement text reader with advanced capabilities
  • [DRILL-2036] - select * query returns wrong result when column name in json file changes case
  • [DRILL-2073] - Filter on a field in a nested repeated type throws an exception
  • [DRILL-2074] - Queries fail with OutOfMemory Exception when Hash Join & Agg are turned off
  • [DRILL-2085] - Failed to propagate error
  • [DRILL-2091] - NPE in AbstractSqlAccessor
  • [DRILL-2093] - Columns of time and timestamp data type are not stored correctly in json format on CTAS
  • [DRILL-2140] - RPC Error querying JSON with empty nested maps
  • [DRILL-2141] - Data type error in group by and order by for JSON
  • [DRILL-2158] - Failure while attempting to start Drillbit in embedded mode.
  • [DRILL-2179] - better handle column called 'Timestamp'
  • [DRILL-2181] - Throw proper error message when flatten is used within an 'order by' or 'group by'
  • [DRILL-2201] - clear error message on join on complex type
  • [DRILL-2208] - Error message must be updated when query contains operations on a flattened column
  • [DRILL-2219] - Concurrent modification exception in TopLevelAllocator if a child allocator is added during loop in close()
  • [DRILL-2221] - CTAS (JSON) creates unreadable files when writing empty arrays
  • [DRILL-2228] - Projecting '*' returns all nulls when we have flatten in a filter and order by
  • [DRILL-2229] - SQL syntax errors should use SQLSyntaxErrorException
  • [DRILL-2232] - Flatten functionality not well defined when we use flatten in an order by without projecting it
  • [DRILL-2264] - Incorrect data when we use aggregate functions with flatten
  • [DRILL-2277] - COUNT(*) should return 0 instead of an empty result set when there are no records
  • [DRILL-2281] - Drill never returns when we use aggregate functions after a join with an order by
  • [DRILL-2292] - CTAS broken when we have repeated maps
  • [DRILL-2301] - Query fails when multiple table aliases are provided for CTEs
  • [DRILL-2340] - count(*) fails with subquery not containing limit
  • [DRILL-2350] - Star query failed with exception on JSON data
  • [DRILL-2376] - UNION ALL on Aggregates with GROUP BY returns incomplete results
  • [DRILL-2404] - After we cancel a query, DRILL sometimes hangs for the next query
  • [DRILL-2408] - CTAS should not create empty folders when underlying query returns no results
  • [DRILL-2411] - Scalar SUM/AVG over empty result set returns no rows instead of NULL
  • [DRILL-2423] - DROP VIEW against non-existent views fails with ZK error
  • [DRILL-2425] - Wrong results when identifiers change case within the same data file
  • [DRILL-2437] - enhance exception injection to support session level injections
  • [DRILL-2452] - ResultSet.getDouble should not throw an exception when the underlying type is a FLOAT
  • [DRILL-2476] - Handle IterOutcome.STOP in buildSchema()
  • [DRILL-2506] - IOOB with order by and limit
  • [DRILL-2511] - Assert with full outer join when one of the join predicates is of a required type (nullable parquet)
  • [DRILL-2528] - Drill-JDBC-All Jar uses outdated classes
  • [DRILL-2532] - Glob not always fired for DFS storage engine
  • [DRILL-2533] - Metrics displayed in the profile UI should be rounded off instead of being truncated
  • [DRILL-2535] - Column labels on drill profile page are incorrect
  • [DRILL-2536] - Peak Mem column in the profile page displays 0 when value is less than 1MB
  • [DRILL-2545] - Killing a JDBC client program does not kill the query on drillbits
  • [DRILL-2548] - JDBC driver prints misleading SQL exception on getting record batches with no data
  • [DRILL-2552] - ZK disconnect to foreman node results in hung query on client
  • [DRILL-2554] - Incorrect results for repeated values when using jdbc
  • [DRILL-2569] - Minor fragmentId in Profile UI gets truncated to the last 2 digits
  • [DRILL-2570] - Broken JDBC-All Jar packaging can cause missing XML classes
  • [DRILL-2589] - Creating a view with duplicate column names should fail or give a warning to the user
  • [DRILL-2598] - Order by with limit on complex type throws IllegalStateException
  • [DRILL-2617] - Errors in the execution stack will cause DeferredException to throw an IllegalStateException
  • [DRILL-2624] - org.apache.drill.common.StackTrace prints garbage for line numbers
  • [DRILL-2662] - Exception type not being included when propagating exception message
  • [DRILL-2723] - Inaccurate row count estimate for text files results in BroadcastExchange
  • [DRILL-2750] - Running 1 or more queries against Drillbits having insufficient DirectMem renders the Drillbits in an unusable state
  • [DRILL-2753] - Implicit cast fails when comparing a double column and a varchar literal
  • [DRILL-2754] - Allocation bug in splitAndTransfer method causing some flatten queries to fail
  • [DRILL-2755] - Use and handle InterruptedException during query processing
  • [DRILL-2757] - Verify operators correctly handle low memory conditions and cancellations
  • [DRILL-2776] - querying a .json file that contains a repeated type returns the wrong results
  • [DRILL-2778] - Killing the drillbit which is the foreman results in hung sqlline
  • [DRILL-2780] - java.lang.IllegalStateException files open exceptions in drillbit.out
  • [DRILL-2793] - Killing a non foreman node results in direct memory being held
  • [DRILL-2801] - ORDER BY produces extra records
  • [DRILL-2806] - Querying data from compressed csv file returns nulls and unreadable data
  • [DRILL-2809] - Increase the default value of partitioner_sender_threads_factor
  • [DRILL-2811] - Need option to specify Drillbit in the connection URI to connect to that specific Drillbit
  • [DRILL-2816] - system error does not display the original Exception message
  • [DRILL-2823] - Merge join should use implicit cast
  • [DRILL-2824] - Function resolution should be deterministic
  • [DRILL-2826] - Improve resilience to memory leaks and unclosed allocators
  • [DRILL-2841] - Web UI very slow in a multi node machine
  • [DRILL-2847] - DrillBufs from the RPC layer are being leaked
  • [DRILL-2848] - Disable decimal data type by default
  • [DRILL-2849] - Difference in query results over CSV file created by CTAS, compared to results over original CSV file
  • [DRILL-2865] - Drillbit runs out of memory on multiple consecutive CTAS
  • [DRILL-2870] - Fix return type of aggregate functions to be nullable
  • [DRILL-2871] - Plan for TPC-H 20 changed with DRILL-1384 (or DRILL-2761) causing performance degradation
  • [DRILL-2872] - Result from json file returns data from map type fields as "null"
  • [DRILL-2875] - IllegalStateException when querying the public yelp json dataset
  • [DRILL-2878] - FragmentExecutor.closeOutResources() is not called if an exception happens in the Foreman before the fragment executor starts running
  • [DRILL-2884] - Have cancel() cause "query canceled" rather than just "ResultSet closed"
  • [DRILL-2886] - JDBC driver doesn't detect lost connection
  • [DRILL-2887] - Fix bad applications of JdbcTest.connect()
  • [DRILL-2889] - Rename JdbcTest to JdbcTestBase.
  • [DRILL-2894] - FixedValueVectors shouldn't set its data buffer to null when it fails to allocate it
  • [DRILL-2897] - Update Limit 0 to avoid parallelization
  • [DRILL-2904] - Fix wrong "before rows" message to "after rows" message
  • [DRILL-2907] - Drill performance degrades significantly over time - resource leak
  • [DRILL-2914] - regression: Mondrian query534.q, Drill gives wrong result
  • [DRILL-2921] - Query with a mix of distinct and not distinct scalar aggregates runs out of memory
  • [DRILL-2927] - Pending query in resource queue starts after timeout
  • [DRILL-2928] - C++ Client - io_service needs to be reset if it runs out of work
  • [DRILL-2932] - Error text reported via System.out.println rather than thrown SQLException's message
  • [DRILL-2934] - Exception when distinct aggregate is compared to numeric literal with decimal point
  • [DRILL-2936] - TPCH 4 and 18 SF100 hangs when hash agg is turned off
  • [DRILL-2940] - Large allocations are not released until GC kicks in
  • [DRILL-2942] - Allow Use of epoll RPC layer on Linux
  • [DRILL-2943] - Drill parsing error during deserialization for an Order-By
  • [DRILL-2944] - Switch to G1GC to reduce GC cpu overhead.
  • [DRILL-2951] - Tables are not visible when Drillbit is specified in the connection URL
  • [DRILL-2953] - Group By + Order By query results are not ordered.
  • [DRILL-2957] - Netty Memory Manager doesn't move empty chunks between lists
  • [DRILL-2959] - Compression codecs are leaking or slow to recapture memory
  • [DRILL-2960] - Default hive storage plugin missing from fresh drill install
  • [DRILL-2961] - Statement.setQueryTimeout() should throw a SQLException
  • [DRILL-2962] - Correlated subquery with scalar aggregate or scalar aggregate with expression throws an error
  • [DRILL-2963] - Exists with empty left batch causes IllegalStateException
  • [DRILL-2966] - HAVING clause with CASE statement with IN predicate causes assertion
  • [DRILL-2968] - crash on parquet file
  • [DRILL-2971] - If Bit<>Bit connection is unexpectedly closed and we were already blocked on writing to socket, we'll stay forever in ResettableBarrier.await()
  • [DRILL-2973] - Error messages not showing up in sqlline
  • [DRILL-2976] - Set default of extended JSON support for output to false until issues are resolved
  • [DRILL-2977] - In WorkManager, startFragmentPendingRemote() and addFragmentRunner() need to be permuted
  • [DRILL-2978] - FragmentManager is not removed from the WorkEventBus if its FragmentExecutor is cancelled before it starts running
  • [DRILL-2979] - Storage HBase doesn't support customized hbase property zookeeper.znode.parent
  • [DRILL-2989] - TPCDS Query corrupts Drillbits and causes subsequent unrelated queries to hang (and timeout)
  • [DRILL-2993] - SQLLine hangs when we cancel a query in the middle of displaying results
  • [DRILL-2994] - Incorrect error message when disconnecting from server (using direct connection to drillbit)
  • [DRILL-2998] - Update C++ client to send/receive heartbeat message
  • [DRILL-3000] - I got JIRA report #3000. Now ... to use it for good or evil?
  • [DRILL-3001] - Some functional tests fail when new text reader is disabled
  • [DRILL-3005] - Spurious Error messages when using PrintingResultsListener
  • [DRILL-3006] - CTAS with interval data type creates invalid parquet file
  • [DRILL-3007] - Update Drill configuration settings to avoid mmap threshold increases on Linux
  • [DRILL-3009] - Reduce the IN list threshold to take advantage of Values operator
  • [DRILL-3010] - Convert bad command error messages into UserExceptions in SqlHandlers
  • [DRILL-3012] - Values Operator doesn't propagate operator id
  • [DRILL-3017] - NPE when cleaning up some RecordReader implementations
  • [DRILL-3018] - Queries with scalar aggregate and non equality (non correlated) fail to plan
  • [DRILL-3020] - Some exception message text not displayed in SQLLine, etc.; copy to thrown SQLException's message
  • [DRILL-3022] - Ensure sequential shutdown of Drillbits
  • [DRILL-3033] - Add memory leak fixes found so far in DRILL-1942 to 1.0
  • [DRILL-3037] - Unable to query on hdfs after moving to 0.9.0 version
  • [DRILL-3046] - Memory Leak after cancelling a query
  • [DRILL-3047] - Command failed while establishing connection
  • [DRILL-3048] - Disable assertions by default
  • [DRILL-3049] - Increase sort spooling threshold
  • [DRILL-3050] - Increase query context max memory
  • [DRILL-3051] - Integer overflow in TimedRunnable
  • [DRILL-3052] - canceling a fragment executor before it starts running will cause the Foreman to wait indefinitely for a terminal message from that fragment
  • [DRILL-3057] - A query that used to work before now fails in the optimizer
  • [DRILL-3058] - RemoteConnection of RPC double closes the connection
  • [DRILL-3061] - Fix memory leaks in TestDrillbitResilience
  • [DRILL-3062] - regression: Mondrian query447.q - lots of rows missing in result set
  • [DRILL-3063] - TestQueriesOnLargeFile leaks memory with 16M limit
  • [DRILL-3065] - Memory Leak at ExternalSortBatch
  • [DRILL-3066] - AtomicRemainder - Tried to close remainder, but it has already been closed.
  • [DRILL-3069] - Wrong result for aggregate query with filter on SF100
  • [DRILL-3070] - Memory Leak when we run out of memory
  • [DRILL-3071] - RecordBatchLoader#load leaks memory if an exception is thrown while loading the batch.
  • [DRILL-3072] - Profile UI fails to load when there is an empty json profile
  • [DRILL-3074] - ReconnectingClient.waitAndRun can get stuck in an infinite loop if it fails to establish the connection
  • [DRILL-3077] - sqlline's return code is 0 even when it force exits due to failed sql command
  • [DRILL-3079] - Move JSON Execution Plan parsing to FragmentExecutor
  • [DRILL-3080] - Error message is invalid if workload queue times out
  • [DRILL-3081] - Fix situation where Drill reports null <--> null in connection error
  • [DRILL-3085] - In ExternalSortBatch, Memory Leak in Runtime Generation Code
  • [DRILL-3087] - Union All returns incorrect results.
  • [DRILL-3088] - IllegalStateException: Cleanup before finished. 0 out of 1 strams have finished
  • [DRILL-3089] - Revert to 4 forked test and allow override from command line
  • [DRILL-3092] - Memory leak when an allocation fails near the creation of a RecordBatchData object
  • [DRILL-3093] - Leaking RawBatchBuffer
  • [DRILL-3098] - Set Unix style "line.separator" for tests
  • [DRILL-3099] - FileSelection's selectionRoot does not include the scheme and authority
  • [DRILL-3100] - TestImpersonationDisabledWithMiniDFS fails on Windows
  • [DRILL-3101] - Setting "slice_target" to 1 changes the order of the columns in a "select *" query with order by
  • [DRILL-3103] - EncoderException: RpcEncoder must produce at least one message.
  • [DRILL-3105] - OutOfMemoryError: GC overhead limit exceeded
  • [DRILL-3107] - Dynamic partition pruning fails on Windows (TestDirectoryExplorerUDFs)
  • [DRILL-3109] - Cancellation from sqlline is broken with the updated version
  • [DRILL-3110] - OutOfMemoryError causes memory accounting leak
  • [DRILL-3112] - Drill UI profile page shows exceptions where a long running query is submitted via the UI
  • [DRILL-3114] - Sqlline throws exception at launch
  • [DRILL-3115] - SQLLine colors do not work well with CYGWIN

Improvement

  • [DRILL-1662] - drillbit.sh stop should timeout
  • [DRILL-2433] - Implicit cast between date and timestamp is missing in joins
  • [DRILL-2508] - Add new column to sys.options table that exposes whether or not the current system value is the default
  • [DRILL-2602] - Throw an error on schema change during streaming aggregation
  • [DRILL-2697] - Pause injections should pause indefinitely until signalled
  • [DRILL-2725] - Faster work assignment logic
  • [DRILL-2772] - Display status of query when viewing the query's profile page
  • [DRILL-2946] - Tableau 9.0 Desktop Enablement Document
  • [DRILL-2955] - Enable color in sqlline for exceptions
  • [DRILL-2969] - Readers don't report number of records read in profile
  • [DRILL-2981] - Add simplified activity log
  • [DRILL-2982] - Tableau 9.0 Server Enablement Documentation
  • [DRILL-2984] - UserException is logging through its parent class logger
  • [DRILL-3027] - Add convenience methods to test builder for creating nested baseline values
  • [DRILL-3053] - add unchecked exception injection site in ChildAllocator
  • [DRILL-3084] - Add drill-* convenience methods for common cli startup commands

New Feature

  • [DRILL-1573] - Add configuration to skip header row in text files
  • [DRILL-2382] - enhance exception injection to support node-specific injections
  • [DRILL-2383] - add exception and pause injections for testing drillbit stability
  • [DRILL-2658] - Add ilike and regex substring functions
  • [DRILL-2958] - Move Drill to alternative cost-based planner for Join planning

Task

  • [DRILL-2316] - Docs Enhancement: Data Sources and File Formats, Basics Tutorial
  • [DRILL-2336] - configuration storage plugin docs update
  • [DRILL-2364] - JSON Data Model Reference 2nd draft
  • [DRILL-2381] - write lexical structures section, JSON/Parquet reference fixes, updates to data types
  • [DRILL-2397] - Enhance SQL Ref Data Types docs
  • [DRILL-2736] - review feedback on multitenancy and user auth

You can now download Drill 1.0.0.