RDBMS Storage Plugin

Apache Drill supports querying a number of RDBMS instances. This allows you to connect your traditional databases to your Drill cluster so you can have a single view of both your relational and NoSQL datasources in a single system.

As with any source, Drill supports joins within and between all systems. Drill additionally has powerful pushdown capabilities with RDBMS sources. This includes support to push down join, where, group by, intersect and other SQL operations into a particular RDBMS source (as appropriate).

Using the RDBMS Storage Plugin

Drill is designed to work with any relational datastore that provides a JDBC driver. Drill is actively tested with PostgreSQL, MySQL, Oracle, MSSQL, Apache Derby and H2. For each system, you will follow three basic steps for setup:

  1. Install Drill, if you do not already have it installed.
  2. Copy your database’s JDBC driver into the jars/3rdparty directory. (You’ll need to do this on every node.)
  3. Restart Drill. See Starting Drill in Distributed Mode.
  4. Add a new storage configuration to Drill through the Web UI. Example configurations for Oracle, SQL Server, MySQL and PostgreSQL are provided below.

Setting data source parameters in the storage plugin configuration

Introduced in release: 1.18

A JDBC storage plugin configuration property sourceParameters was introduced to allow setting data source parameters supported by HikariCP. Parameters names with incorrect naming and parameter values which are of incorrect data type or illegal will cause the storage plugin to fail to start. See the Example of PostgreSQL Configuration with sourceParameters configuration property section for the example of usage.

From release 1.20 onwards, the default sourceParameters set on the JDBC storage plugin are set to make the initiation of outbound connections lazy rather than eager, where eager means at the time of storage config enablement. Additionally, the JDBC connection pool is now allowed by default to shrink back down to empty if all of its connections are idle. These new settings are visible in the rdbms storage config template and may be overridden to reinstate Drill’s earlier behaviour.

It is also possible to set JDBC driver connection properties (which are distinct from HikariCP’s own properties) using sourceParameters by prepending the property name with dataSource.. For example, to set the connection property S3OutputLocation, which is supported by the AWS Athena JDBC driver, add a property named dataSource.S3OutputLocation to sourceParameters. Refer to the HikariCP docs for more detail.

Example: Working with MySQL

Drill communicates with MySQL through the JDBC driver using the configuration that you specify in the Web UI or through the REST API.

Note

Verify that MySQL is running and the MySQL driver is in place before you configure the JDBC storage plugin.

To configure the JDBC storage plugin:

  1. Start the Drill shell.
  2. Start the Web UI.
  3. On the Storage tab, enter a name in New Storage Plugin. For example, enter myplugin. Each configuration registered with Drill must have a distinct name. Names are case-sensitive.
  4. Click Create.
  5. In Configuration, set the required properties using JSON formatting as shown in the following example. Change the properties to match your environment.
    {
      "type": "jdbc",
      "driver": "com.mysql.jdbc.Driver",
      "url": "jdbc:mysql://localhost:3306",
      "username": "root",
      "password": "mypassword",
      "enabled": true
    }
    

Note

The JDBC URL may differ depending on your installation and configuration. See the example configurations below for examples.

You can use the performance_schema database, which is installed with MySQL to query your MySQL performance_schema database. Include the names of the storage plugin configuration, the database, and table in dot notation the FROM clause as follows:

  0: jdbc:drill:zk=local> select * from myplugin.performance_schema.accounts;
  |--------|------------|----------------------|--------------------|
  |  USER  |    HOST    | CURRENT_CONNECTIONS  | TOTAL_CONNECTIONS  |
  |--------|------------|----------------------|--------------------|
  | null   | null       | 18                   | 20                 |
  | jdoe   | localhost  | 0                    | 813                |
  | root   | localhost  | 3                    | 5                  |
  |--------|------------|----------------------|--------------------|
  3 rows selected (0.171 seconds)

Example Configurations

ClickHouse

Download and install the official ClickHouse JDBC driver on all of the nodes in your cluster.

{
	"type": "jdbc",
	"enabled": true,
	"driver": "ru.yandex.clickhouse.ClickHouseDriver",
	"url": "jdbc:clickhouse://1.2.3.4:8123.default",
	"username": "user",
	"password": "password"
}

MySQL

For MySQL, Drill has been tested with MySQL’s mysql-connector-java-5.1.37-bin.jar driver. Copy this to all nodes.

{
  "type": "jdbc",
  "enabled": true,
  "driver": "com.mysql.jdbc.Driver",
  "url": "jdbc:mysql://1.2.3.4",
  "username": "user",
  "password": "password"
}

Oracle Database

Download and install Oracle’s Thin ojdbc7.12.1.0.2.jar driver and copy it to all nodes in your cluster.

{
  "type": "jdbc",
  "enabled": true,
  "driver:" "oracle.jdbc.OracleDriver",
  "url:" "jdbc:oracle:thin:user/password@1.2.3.4:1521/ORCL"
}

PostgreSQL

Drill is tested with the PostgreSQL driver version 42.2.11 (any recent driver should work). Download and copy this driver jar to the jars/3rdparty folder on all nodes.

Note

You'll need to provide a database name as part of your JDBC connection string for Drill to correctly expose PostgreSQL tables.

{
  "type": "jdbc",
  "enabled": true,
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://1.2.3.4/mydatabase",
  "username": "user",
  "password": "password"
}

You may need to qualify a table name with a schema name for Drill to return data. For example, when querying a table named ips, you must issue the query against public.ips, as shown in the following example:

   0: jdbc:drill:zk=local> use pgdb;
   |------|----------------------------------|
   | ok   | summary                          |
   |------|----------------------------------|
   | true | Default schema changed to [pgdb] |
   |------|----------------------------------|

   0: jdbc:drill:zk=local> show tables;
   |--------------|--------------|
   | TABLE_SCHEMA | TABLE_NAME   |
   |--------------|--------------|
   | pgdb.test    | ips          |
   | pgdb.test    | pg_aggregate |
   | pgdb.test    | pg_am        |
   |--------------|--------------|

   0: jdbc:drill:zk=local> select * from public.ips;
   |------|---------|
   | ipid | ipv4dot |
   |------|---------|
   | 1    | 1.2.3.4 |
   | 2    | 1.2.3.5 |
   |------|---------|

Example of PostgreSQL configuration with sourceParameters configuration property

{
  "type": "jdbc",
  "enabled": true,
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://1.2.3.4/mydatabase?defaultRowFetchSize=2",
  "username": "user",
  "password": "password",
  "sourceParameters": {
    "minimumIdle": 0,
    "autoCommit": true,
    "connectionTestQuery": "select version() as postgresql_version",
    "dataSource.cachePrepStmts": true,
    "dataSource.prepStmtCacheSize": 250
  }
}

MS SQL Server

For SQL Server, Drill has been tested with Microsoft’s sqljdbc41.4.2.6420.100.jar driver. Copy this jar file to all Drillbits.

Note

You'll need to provide a database name as part of your JDBC connection string for Drill to correctly expose MSSQL schemas.

{
  "type": "jdbc",
  "enabled": true,
  "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "url": "jdbc:sqlserver://1.2.3.4:1433;databaseName=mydatabase",
  "username": "user",
  "password": "password"
}