RDBMS Metastore

The RDBMS Metastore implementation allows you store Drill Metastore metadata in a configured RDBMS.

Configuration

Currently, the RDBMS Metastore is not the default implementation. To enable the RDBMS Metastore create the drill-metastore-override.conf file in your config directory and specify the RDBMS Metastore class:

drill.metastore: {
  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore"
}

Connection properties

Use the connection properties to specify how Drill should connect to your Metastore database.

drill.metastore.rdbms.data_source.driver - driver class name. Required. Note: the driver class must be included into the Drill classpath. The easiest way to do that is to put the driver jar file into the $DRILL_HOME/jars/3rdparty folder. Or, to make upgrades easier, in your $DRILL_SITE/jars folder. Drill includes the driver for SQLite.

drill.metastore.rdbms.data_source.url - connection url. Required.

drill.metastore.rdbms.data_source.username - database user on whose behalf the connection is being made. Optional, if database does not require user to connect.

drill.metastore.rdbms.data_source.password - database user’s password. Optional, if database does not require user’s password to connect.

drill.metastore.rdbms.data_source.properties - specifies properties which will be used during data source creation. See list of available Hikari properties for more details.

Default configuration

Out of the box, the Drill RDBMS Metastore is configured to use the embedded file system based SQLite database. It will be created locally in user’s home directory under ${drill.exec.zk.root}"/metastore location.

Default setup can be used only in Drill embedded mode. SQLite is an embedded database; is not distributed. SQLite is good for trying out the feature, for testing, for a running Drill in embedded mode, and perhaps for a single-node Drill “cluster”. If should not be used in a multi-node cluster. Each Drillbit will have its own version of the truth and behavior will be undefined and incorrect.

Custom configuration

drill-metastore-override.conf is used to customize connection details to the Drill Metastore database. See drill-metastore-override-example.conf for more details.

Example of PostgreSQL configuration

drill.metastore: {
  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore",
  rdbms: {
    data_source: {
      driver: "org.postgresql.Driver",
      url: "jdbc:postgresql://localhost:1234/mydb?currentSchema=drill_metastore",
      username: "user",
      password: "password"
    }
  }
}

Note: as mentioned above, the PostgreSQL JDBC driver must be present in the Drill classpath.

Example of MySQL configuration

drill.metastore: {
  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore",
  rdbms: {
    data_source: {
      driver: "com.mysql.cj.jdbc.Driver",
      url: "jdbc:mysql://localhost:1234/drill_metastore",
      username: "user",
      password: "password"
    }
  }
}

Note: as mentioned above, the MySQL JDBC driver must be present in the Drill classpath.

Note

For MySQL connector version 6+, use the com.mysql.cj.jdbc.Driver driver class, for older versions use the com.mysql.jdbc.Driver.

Tables structure

The Drill Metastore stores several types of metadata, called components. Currently, only the tables component is implemented. The tables component provides metadata about Drill tables, including their segments, files, row groups and partitions. In Drill tables component unit is represented by TableMetadataUnit class which is applicable to any metadata type. The TableMetadataUnit class holds fields for all five metadata types within the tables component. Any fields not applicable to a particular metadata type are simply ignored and remain unset.

In the RDBMS implementation of the Drill Metastore, the tables component includes five tables, one for each metadata type. The five tables are: TABLES, SEGMENTS, FILES, ROW_GROUPS, and PARTITIONS. See src/main/resources/db/changelog/changes/initial_ddls.yaml for the schema and indexes of each table.

The Drill Metastore API has the following semantics:

  • most of the time all data about component is accessed;
  • data is filtered by non-complex fields, like storage plugin, workspace, table name, etc;
  • data is overwritten fully, there is no update by certain fields.

Taking into account the Drill Metastore API semantics, the RDBMS Drill Metastore schema is slightly denormalized. Having normalized structure would lead to unnecessary joins during select, index re-indexing during update.

Table creation

The RDBMS Metastore uses Liquibase to create the needed tables during the RDBMS Metastore initialization. Users should not create any tables manually.

Database schema

Liquibase uses a yaml configuration file to apply changes to the database schema: src/main/resources/db/changelog/changelog.yaml. Liquibase converts the yaml specification into the DDL / DML commands suitable required for the configured database. See list of supported databases: https://www.liquibase.org/databases.html.

The Drill Metastore tables are created in the database schema indicated in the connection URL. This will be the default schema unless you specify a different schema. Drill will not create the schema, however. Best practice is to create a schema within your database for the Drill metastore before initializing the Metastore.

Example:

PostgreSQL: jdbc:postgresql://localhost:1234/mydb?currentSchema=drill_metastore

MySQL: jdbc:mysql://localhost:1234/drill_metastore

Since Drill will create the required tables, ensure that the database user has the following permissions in the metastore schema:

  • read and write tables;
  • create and modify database objects (tables, indexes, views, etc.).

Liquibase tables

During Drill RDBMS Metastore initialization, Liquibase will create two internal tracking tables: DATABASECHANGELOG and DATABASECHANGELOGLOCK. They are needed to track schema changes and concurrent updates. See https://www.liquibase.org/get_started/how-lb-works.html for more details.

Query execution

SQL queries issued to RDBMS Metastore tables are generated using JOOQ. Drill uses the open-source version of JOOQ to generate the queries sent to the configured Metastore database.

JOOQ generates SQL statements based on SQL dialect determined by database connection details. List of supported dialects: https://www.jooq.org/javadoc/3.13.x/org.jooq/org/jooq/SQLDialect.html. Note: dialects annotated with @Pro are not supported, since open-source version of JOOQ is used.

Supported databases

The RDBMS Metastore was tested with SQLite, PostreSQL and MySQL. Other databases should also work if there is Liquibase and JOOQ support for them.