Drill Introduction

Drill is an Apache open-source SQL query engine for Big Data exploration. Drill is designed from the ground up to support high-performance analysis on the semi-structured and rapidly evolving data coming from modern Big Data applications, while still providing the familiarity and ecosystem of ANSI SQL, the industry-standard query language. Drill provides plug-and-play integration with existing Apache Hive and Apache HBase deployments.

Apache Drill Key Features

Key features of Apache Drill are:

Low-latency SQL queries
Dynamic queries on self-describing data in files (such as JSON, Parquet, text) and HBase tables, without requiring metadata definitions in the Hive metastore.
ANSI SQL
Nested data support
Integration with Apache Hive (queries on Hive tables and views, support for all Hive file formats and Hive UDFs)
BI/SQL tool integration using standard JDBC/ODBC drivers
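
As an illustration of the dynamic-query feature listed above, the following Python sketch prepares a request for Drill's REST API that queries a JSON file in place, with no table definition. It assumes a Drillbit whose REST interface listens on localhost:8047 and an enabled dfs storage plugin; the file path /tmp/sample.json and the helper name build_drill_request are hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: a local Drillbit exposing the REST API on the default web port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request that submits a SQL query to Drill."""
    payload = {"queryType": "SQL", "query": sql}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file path; Drill reads the schema from the JSON file itself.
sql = "SELECT * FROM dfs.`/tmp/sample.json` LIMIT 5"
request = build_drill_request(sql)

# To run the query against a live Drillbit:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["rows"]
```

Building the request separately from sending it keeps the sketch usable without a running cluster.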

What’s New in Apache Drill 1.19

DRILL-92 - Cassandra Storage Plugin
DRILL-3637 - Elasticsearch Storage Plugin
DRILL-7823 - XML Storage Plugin
DRILL-7751 - Splunk Storage Plugin
DRILL-5940 - Avro with schema registry support for Kafka
DRILL-7855 - Secure mechanism for specifying storage plugin credentials
DRILL-7921 - Linux ARM64 based system support
DRILL-6953 - Rowset based JSON reader
DRILL-7733 - Use streaming for REST JSON queries
Several plugins have been converted to the Enhanced Vector Framework (EVF)
- DRILL-7525 - Convert SequenceFiles to EVF
- DRILL-7532 - Convert SysLog to EVF
- DRILL-7533 - Convert Pcapng to EVF
- DRILL-7534 - Convert HTTPD format plugin to EVF
- DRILL-7533 - Convert Image Format to EVF

What’s New in Apache Drill 1.18

DRILL-6552 - Drill Metadata management “Drill Metastore”
DRILL-7233 - Format Plugin for HDF5
DRILL-7359 - Add support for DICT type in RowSet Framework
DRILL-7437 - Storage Plugin for Generic HTTP REST API
DRILL-7607 - Dynamic credit based flow control
DRILL-7656 - Support injecting BufferManager into UDF
DRILL-7706 - Drill RDBMS Metastore

What’s New in Apache Drill 1.17

DRILL-6540 - Upgrade to HADOOP-3.0 libraries. The hadoop-winutils version that worked for previous releases does not work with Drill 1.17 and later. Use the hadoop-winutils version provided with Drill 1.17 or use custom hadoop-winutils built for Hadoop 3.2.0.
DRILL-6739 - Update Kafka libs to 2.0.0+ version
DRILL-7401 - Upgrade to Sqlline 1.9
DRILL-7200 - Update Calcite to 1.19.0 / 1.20.0
DRILL-5674 - Support for .zip compression
DRILL-6835 - Schema provision using File / Table Function
DRILL-7337 - Support for vararg UDFs
DRILL-7096 - Develop vector for canonical Map<K,V>
DRILL-7343 - User-Agent UDFs added

Hive complex types support:

DRILL-7251 - Read Hive array without nulls
DRILL-7252 - Read Hive map using Dict<K,V> vector
DRILL-7253 - Read Hive struct without nulls
DRILL-7254 - Read Hive union without nulls
DRILL-7268 - Read Hive array with parquet native reader

New format plugins support:

DRILL-4303 - ESRI Shapefile (shp) format plugin
DRILL-7177 - Format Plugin for Excel Files
DRILL-6096 - Provide mechanisms to specify field delimiters and quoted text for TextRecordWriter
Parquet format improvements, including runtime row group pruning (DRILL-7062), support for creating (DRILL-7156) and reading (DRILL-4517) empty Parquet files, and more.

Metastore support:

DRILL-7272 - Implement Drill Iceberg Metastore plugin
DRILL-7273 - Create operator for handling metadata
DRILL-7357 - Expose Drill Metastore data through INFORMATION_SCHEMA

What’s New in Apache Drill 1.16

ANALYZE TABLE statement to compute statistics on Parquet data (DRILL-1328)
CREATE OR REPLACE SCHEMA command to define a schema for text files (DRILL-6964)
REFRESH TABLE METADATA command can generate metadata cache files for specific columns (DRILL-7058)
SYSLOG (RFC-5424) Format Plugin (DRILL-6582)
NEAREST DATE function to facilitate time series analysis (DRILL-7077)
Format plugin for LTSV files (DRILL-7014)
Ability to query Hive views in the same way as Hive tables in a hive schema, for example, SELECT * FROM hive.`hive_view`; (DRILL-540)
The upgrade to SQLLine 1.7 changes the default prompt to apache drill (schema_name)>. You can define a custom prompt using the command !set prompt <new-prompt>. (DRILL-6989)
Calcite updated to version 1.18.0 (DRILL-6862)
Several Drill Web UI improvements

What’s New in Apache Drill 1.15

Drill can leverage indexes to create index-based query plans. (DRILL-6381)
Support for aliases in the GROUP BY clause. (DRILL-1248)
CROSS JOIN support. (DRILL-786)
The INFORMATION_SCHEMA contains a FILES table that you can query for information about directories and files. (DRILL-6680)
System functions table that exposes the available SQL functions in Drill and also detects UDFs that have been dynamically loaded into Drill. (DRILL-3988)
New system options table. (DRILL-6684)
Support for TIMESTAMPADD and TIMESTAMPDIFF datetime functions. (DRILL-3610)
Ability to secure znodes with custom ACLs (Access Control Lists). (DRILL-5671)
All cast and data type conversion functions return null for an empty string (‘’) when the drill.exec.functions.cast_empty_string_to_null option is enabled. (DRILL-6817)
Storage plugin names are case-insensitive. (DRILL-6492)
Ability to access your AWS access key ID and secret access key using the Credential Provider API for the S3 storage plugin. (DRILL-6662)
Upgrade to SQLLine 1.6 includes the ability to add custom configuration. (DRILL-3853)
New SQLLine connection parameters (DRILL-3933)
New option, exec.query.return_result_set_for_ddl, prevents Drill from returning a result set for DDL statements when set to “false.” Useful for client tools that connect to Drill (via JDBC) if they do not expect a result set. (DRILL-6834)
Parquet filter pushdown for VARCHAR and DECIMAL data types (DRILL-6744)
Improved query performance with the semi-join functionality inside the Hash-Join operator. (DRILL-6735)
Lateral join functionality is enabled by default. (DRILL-6729)
Support for JPPD (Join Predicate Push Down). (DRILL-6385)
Multiple Web UI improvements to simplify the use of options and submit queries, including:
- Search field
- Quick Filters (DRILL-5735)
- Default button (DRILL-6668)
- Web display options (DRILL-6544)
- Meta+Enter key combination to submit queries (DRILL-6611)
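
The effect of the drill.exec.functions.cast_empty_string_to_null option mentioned above can be modeled in a few lines of Python. This is a behavioral sketch, not Drill code: the function name cast_varchar_to_int is hypothetical, and treating the cast as an error when the option is disabled is an assumption made for illustration rather than something stated in these notes.

```python
def cast_varchar_to_int(value, cast_empty_string_to_null=False):
    """Model of CAST(value AS INT) on a VARCHAR input (illustrative only)."""
    if value is None:
        return None  # SQL NULL stays NULL
    if value == "":
        if cast_empty_string_to_null:
            return None  # option enabled: empty string becomes NULL
        # Assumption: with the option disabled, the cast is an error.
        raise ValueError("cannot cast empty string to INT")
    return int(value)


column = ["10", "", "42"]
converted = [cast_varchar_to_int(v, cast_empty_string_to_null=True) for v in column]
print(converted)  # [10, None, 42]
```

With the option enabled, sparse text columns can be cast without filtering out empty fields first.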

What’s New in Apache Drill 1.14

Ability to run Drill in a Docker container. (DRILL-6346)
Ability to export and save your storage plugin configurations to a JSON file for reuse. (DRILL-4580)
Ability to manage storage plugin configurations in the Drill configuration file, storage-plugins-override.conf. (DRILL-6494)
Functions that return data type information. (DRILL-6361)
The Drill kafka storage plugin supports filter pushdown for query conditions on certain Kafka metadata fields in messages. (DRILL-5977)
Spill to disk for the Hash Join operator. (DRILL-6027)
The dfs storage plugin supports a Logfile plugin extension that enables Drill to directly read and query log files of any format. (DRILL-6104)
Phonetic and string distance functions. (DRILL-6519)
The store.hive.conf.properties option enables you to specify Hive properties at the session level using the SET command. (DRILL-6575)
Drill can directly manage the CPU resources through the Drill start-up script, drill-env.sh; you no longer have to manually add the PID to the cgroup.procs file each time a Drillbit restarts. (DRILL-143)
Drill can query the metadata in various image formats with the image metadata format plugin. (DRILL-4364)
Enhanced decimal data type support. (DRILL-6094)
Option to push LIMIT(0) on top of SCAN. (DRILL-6574)
Parquet filter pushdown improvements:
- Drill can infer filter conditions for join queries and push the filter conditions down to the data source. (DRILL-6173)
- Drill uses a native reader to read Hive tables when you enable the store.hive.optimize_scan_with_native_readers option. When enabled, Drill reads data faster and applies filter pushdown optimizations. (DRILL-6331)
Early release of lateral join. (DRILL-5999)

What’s New in Apache Drill 1.13

JDK 8 support. (DRILL-1491)
Upgrade to Calcite version 1.15. (DRILL-3993)
JDBC Statement.setQueryTimeout(int) support to cancel queries if they do not complete within the specified time. (DRILL-3640)
Batch processing improvements that enable you to limit the amount of memory that the Flatten, Merge Join, and External Sort operators allocate to outgoing batches. (DRILL-6123)
Enhanced DESCRIBE command. (DRILL-4559)
Support for SPNEGO to extend Kerberos to Web applications through HTTP. (DRILL-5425)
Ability to run Drill under YARN. (DRILL-1170)
Parquet filter pushdown support for IS [NOT] NULL, TRUE, and FALSE operators and implicit and explicit casts for timestamp, date, and time data types. (DRILL-6174)
Performance improvements with support for project push down, filter push down, and partition pruning on dynamically expanded columns when represented as a star in the ITEM operator. (DRILL-6118)
Updated Hive libraries and the Drill Hive client updated to 2.3.2 with support for querying Hive transactional ORC bucketed tables. (DRILL-5978)
Ability to automatically manage memory allocations during Drill startup. (DRILL-5741)
Ability to query an empty directory and use it for queries with any JOIN and UNION (UNION ALL) operators. (DRILL-4185)
Non-numeric support for JSON processing. (DRILL-5919)
New options that enable you to configure the number of Jetty acceptors and selectors. (DRILL-5994)
Support SQL syntax highlighting of queries, auto-complete support in SQL editors, and snippets. (DRILL-5868)
Improved performance of the Single Merge Exchange operator. (DRILL-6115)
LIKE operator optimization. (DRILL-5879)
User/Distribution-specific configuration checks during startup. (DRILL-5741)

What’s New in Apache Drill 1.12

Drill 1.12 provides the following new features and improvements:

Kafka and OpenTSDB storage plugins (DRILL-4779, DRILL-5337)
SSL/TLS support (DRILL-5431)
Network encryption support (DRILL-5682)
Queue-based memory assignment for buffering operators (DRILL-5716)
A collection of networking functions that facilitate network analysis using Drill (DRILL-5834)
Support for the libpam4j PAM authenticator (DRILL-5820)
Filter pushdown for Parquet can handle files with multiple rowgroups (DRILL-5795)
UTF-8 is enabled in the query string by default (DRILL-5772)
IF NOT EXISTS support for CREATE TABLE and CREATE VIEW (DRILL-5952)
Geometry functions, ST_AsGeoJSON and ST_AsJSON, that return GeoJSON and JSON representations (DRILL-5962, DRILL-5960)
JMX metrics for failed and canceled queries (DRILL-5909)
Syntax highlighting and error checking for storage plugin configurations (DRILL-5981)
System options improvements, including a new internal system options table (DRILL-5723)
Ability to prevent users from accessing a path outside the current workspace (DRILL-5964)
Ability to put the server in quiescent mode for a graceful shutdown (DRILL-4286)
The Drill Web UI lists the status of successfully completed queries as “successful” (DRILL-5923)

What’s New in Apache Drill 1.11

Drill 1.11 provides the following new features and improvements:

Cryptography-related functions. (DRILL-5634)
Spill to disk for the hash aggregate operator. (DRILL-5457)
Format plugin support for PCAP files. (DRILL-5432)
Ability to change the HDFS block size for Parquet files. (DRILL-5379)
Ability to store query profiles in memory. (DRILL-5481)
Configurable CTAS directory and file permissions option. (DRILL-5391)
Support for network encryption. (DRILL-4335)
Relative paths stored in the metadata file. (DRILL-3867)
Support for ANSI_QUOTES. (DRILL-3510)

What’s New in Apache Drill 1.10

Drill 1.10 provides the following new features and improvements:

Support for the CREATE TEMPORARY TABLE AS (CTTAS) command.
A JDBC connection option that improves fault tolerance when connecting directly to a Drill node from a client.
The Web UI displays the Drill version and additional query profile statistics.
Drill implicitly interprets the INT96 timestamp data type in Parquet files.
Support for Kerberos authentication between the client and drillbit.

What’s New in Apache Drill 1.9

Drill 1.9 provides the following new features:

Asynchronous Parquet reader
Parquet filter pushdown
Dynamic UDF support
HTTPD format plugin

What’s New in Apache Drill 1.8

Drill 1.8 provides the following new features and changes:

Metadata cache pruning
IF EXISTS parameter with the DROP TABLE and DROP VIEW commands
DESCRIBE SCHEMA command
Multi-byte delimiter support
New parameters for filter selectivity estimates
Changes to the configuration and launch scripts - See Configuration and Launch Script Changes

What’s New in Apache Drill 1.7

Drill 1.7 provides the following new features:

Monitoring via JMX
Hive CHAR data type support
HBase 1.x support

What’s New in Apache Drill 1.6

Drill 1.6 provides the following new features:

Inbound impersonation
Additional custom window frames

What’s New in Apache Drill 1.5

Drill 1.5 provides the following new features:

Authentication and security for the Web interface and REST API
Experimental query support for Apache Kudu (incubating)
An improved memory allocator
Configurable caching for Hive metadata

What’s New in Apache Drill 1.4

Drill 1.4 introduces the following improvements:

SELECT with options that you use in queries to change storage plugin settings
Improved behavior when parsing CSV file header names
A variable to set non-pretty (for example, compact) printing of JSON
Better drillbit.log files that include query text

Drill 1.4 fixes an error that occurred when you query a Hive table using the HBaseStorageHandler (DRILL-3739). To successfully query a Hive table using the HBaseStorageHandler, you need to configure the Hive storage plugin as described in the Hive storage plugin documentation.

What’s New in Apache Drill 1.3

This release fixes issues and adds a number of enhancements, including the following ones:

Enhanced Amazon S3 support
Heterogeneous types: support for columns that evolve from one data type to another over time
Text file headers
Sequence files support
Enhancements related to querying Hive tables, MongoDB collections, and Avro files

What’s New in Apache Drill 1.2

This release of Drill fixes many issues and introduces a number of enhancements, including the following ones:

Support for JDBC data sources, such as MySQL, through a new JDBC Storage plugin
Improvements in the Drill JDBC driver including inclusion of Javadocs and better application dependency compatibility
Enhancements to Avro file formats
- Support for complex data types, such as UNION and MAP
- Optimized Avro file processing (block-wise)
Partition pruning improvements
A number of new SQL window functions
- NTILE
- LAG and LEAD
- FIRST_VALUE and LAST_VALUE
HTTPS support for Web UI operations
Performance improvements for querying HBase, which includes leveraging ordered byte encoding
Optimized reads of Parquet-backed, Hive tables
Read support for the Parquet INT96 type, and a new TIMESTAMP_IMPALA type that, when used with the CONVERT_FROM function, decodes a timestamp from Hive or Impala
Parquet metadata caching to improve query performance on a large number of files
DROP TABLE command
Improved correlated subqueries
Union Distinct
Improved LIMIT processing
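
For readers unfamiliar with the window functions named above, this Python sketch models what LAG, LEAD, and NTILE compute over a single, already ordered partition, following standard SQL semantics. The helper names and sample values are illustrative only; this is not how Drill implements them.

```python
def lag(values, offset=1, default=None):
    """Value from `offset` rows earlier in the partition, or `default`."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value from `offset` rows later in the partition, or `default`."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


def ntile(n_rows, buckets):
    """Bucket number (1-based) for each row; earlier buckets take the extra rows."""
    base, extra = divmod(n_rows, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        result.extend([bucket] * size)
    return result


values = [10, 20, 30, 40]
print(lag(values))    # [None, 10, 20, 30]
print(lead(values))   # [20, 30, 40, None]
print(ntile(7, 3))    # [1, 1, 1, 2, 2, 3, 3]
```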

What’s New in Apache Drill 1.1

The many enhancements in Apache Drill 1.1 include the following key features:

SQL window functions
Partitioning data using the new PARTITION BY clause in the CTAS command
Delegated Hive impersonation
Support for UNION and UNION ALL and better optimized plans that include UNION.

What’s New in Apache Drill 1.0

Apache Drill 1.0 offers the following new features:

Many performance planning and execution improvements.
Updated Drill shell now formats query results.
Query audit logging for getting the query history on a Drillbit.
Improved connection handling.
New Errors tab in the Query Profiles UI that facilitates troubleshooting and distributed storing of profiles.
Support for a new storage plugin input format: Avro

In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. You can enable the DECIMAL type, but this is not recommended.
