Plugin Configuration Basics

Aug 8, 2017

When you add or update a storage plugin configuration on one Drill node in a cluster with multiple Drill installations, Drill broadcasts the change to the other Drill nodes to keep the storage plugin configurations synchronized. You do not need to restart any Drillbits when you add or update a storage plugin configuration.

Using the Drill Web Console

You can use the Drill Web Console to update or add a new storage plugin configuration. The Drill shell needs to be running to start the Web Console.

To create a name and new configuration:

  1. Start the Drill shell.
  2. Start the Web Console.
  3. On the Storage tab, enter a name in New Storage Plugin. Each configuration registered with Drill must have a distinct name. Names are case-sensitive.
    [Figure: sandbox plugin]

    Note

    The URL differs depending on your installation and configuration.

  4. Click Create.

  5. In Configuration, use JSON formatting to modify a copy of an existing configuration if possible.
    Using a copy of an existing configuration reduces the risk of JSON coding errors. Use the Storage Plugin Attributes table in the next section as a guide for making typical modifications.

  6. Click Create.

Storage Plugin Attributes

The following graphic shows key attributes of a typical dfs-based storage plugin configuration:
[Figure: dfs plugin]

List of Attributes and Definitions

The following table describes the attributes you configure for storage plugins installed with Drill.

| Attribute | Example Values | Required | Description |
|---|---|---|---|
| "type" | "file", "hbase", "hive", "mongo" | yes | A valid storage plugin type name. |
| "enabled" | true, false | yes | State of the storage plugin. |
| "connection" | "classpath:///", "file:///", "mongodb://localhost:27017/", "hdfs://" | implementation-dependent | The type of distributed file system, such as HDFS, Amazon S3, or files in your file system, and an address/path name. |
| "workspaces" | null, "logs" | no | One or more unique workspace names. If a workspace name is used more than once, only the last definition is effective. |
| "workspaces" . . . "location" | "location": "/Users/johndoe/mydata", "location": "/tmp" | no | Full path to a directory on the file system. |
| "workspaces" . . . "writable" | true, false | no | Whether Drill can write to the workspace. Set to true to allow operations that write data, such as creating tables or views in the workspace. |
| "workspaces" . . . "defaultInputFormat" | null, "parquet", "csv", "json" | no | Format for reading data, regardless of extension. Default = "parquet" |
| "formats" | "pcap", "psv", "csv", "tsv", "parquet", "json", "avro", "maprdb", "sequencefile" | yes | One or more valid file formats for reading. Drill detects formats of some files; others require configuration. The maprdb format is in installations of the mapr-drill package. |
| "formats" . . . "type" | "pcap", "text", "parquet", "json", "maprdb", "avro", "sequencefile" | yes | Format type. You can define two formats, csv and psv, as type "text", but having different delimiters. |
| "formats" . . . "extensions" | ["csv"] | format-dependent | File name extensions that Drill can read. |
| "formats" . . . "delimiter" | "\n", "\r", "\t", "\r\n", "," | format-dependent | Sequence of one or more characters that signifies the end of a line of text and the start of a new line in a delimited text file, such as CSV. Drill treats \n as the standard line delimiter. As of Drill 1.8, Drill supports multi-byte delimiters, such as \r\n. Use a 4-digit hex code syntax \uXXXX for a non-printable delimiter. |
| "formats" . . . "quote" | """ | no | A single character that starts/ends a value in a delimited text file. |
| "formats" . . . "escape" | "`" | no | A single character that escapes a quotation mark inside a value. |
| "formats" . . . "comment" | "#" | no | The line decoration that starts a comment line in the delimited text file. |
| "formats" . . . "skipFirstLine" | true | no | To include or omit the header when reading a delimited text file. Set to true to avoid reading headers as data. |
| "formats" . . . "extractHeader" | true | no | Set to true to extract and use headers as column names when reading a delimited text file, false otherwise. Ensure skipFirstLine is not true when extractHeader=false. |

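To show how these attributes fit together, here is a minimal sketch of a dfs-style configuration. It reuses example values from the table; the workspace name logs and the path /tmp are placeholders rather than values Drill requires, and the formats listed are only a small subset chosen for illustration.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "logs": {
      "location": "/tmp",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json",
      "extensions": ["json"]
    }
  }
}
```
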
Using the Formats Attributes

You set the formats attributes, such as skipFirstLine, in the formats area of the storage plugin configuration. When setting attributes for text files, such as CSV, you also need to set the sys.options property exec.storage.enable_new_text_reader to true (the default). For more information and examples of using formats for text files, see "Text Files: CSV, TSV, PSV".
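
As a rough illustration, the formats area of a configuration might define two text formats that differ in delimiter and file extension. The attribute names come from the table above; the specific values, such as the pipe delimiter and the extension lists, are assumptions chosen for the example. Only the formats portion is shown, wrapped in braces so that the fragment is valid JSON.

```json
{
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ",",
      "extractHeader": true
    },
    "psv": {
      "type": "text",
      "extensions": ["psv"],
      "delimiter": "|",
      "comment": "#",
      "skipFirstLine": true
    }
  }
}
```

In this sketch, extractHeader is set for csv so that the header row supplies the column names, while skipFirstLine is set for psv so that the first line is not read as data.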

Using the Formats Attributes as Table Function Parameters

In Drill version 1.4 and later, you can also set the formats attributes defined above on a per-query basis. To pass parameters to the format plugin, use the table function syntax:

select a, b from table({table function name}(parameters))

The table function name is the table name, the type parameter is the format name, and the other parameters are the fields that the format plugin configuration accepts, as defined in the table above (except for extensions, which does not apply in this context).

For example, to read a CSV file and parse the header:
select a, b from table(dfs.`path/to/data.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true))

For more information about format plugin configuration see "Text Files: CSV, TSV, PSV".

Using Other Attributes

The configuration of other attributes, such as size.calculator.enabled in the hbase plugin and configProps in the hive plugin, is implementation-dependent and beyond the scope of this document.

Case-sensitive Names

As previously mentioned, workspace and storage plugin names are case-sensitive. For example, the following query uses a storage plugin name dfs and a workspace name clicks. When you refer to dfs.clicks in an SQL statement, use the defined case:

0: jdbc:drill:> USE dfs.clicks;

Using uppercase letters in the statement after defining the storage plugin and workspace names in lowercase letters does not work.

Storage Plugin REST API

If you need to add a storage plugin configuration to Drill and do not want to use a web browser, you can use the Drill REST API to create a storage plugin configuration. Use a POST request and pass two properties:

  • name
    The storage plugin configuration name.

  • config
    The attribute settings as entered in the Web Console.

For example, this command creates a storage plugin named myplugin for reading files of an unknown type located on the root of the file system:

curl -X POST -H "Content-Type: application/json" -d '{"name":"myplugin", "config": {"type": "file", "enabled": false, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultInputFormat": null}}, "formats": null}}' http://localhost:8047/storage/myplugin.json

This example assumes HTTPS has not been enabled.

Bootstrapping a Storage Plugin

The REST API is the recommended way to add a storage plugin configuration to Drill programmatically. An alternative, for use in a distributed environment only, is bootstrapping. You can create a bootstrap-storage-plugins.json file and include it on the classpath when starting Drill. The storage plugin configuration loads when Drill starts up.

Currently, bootstrapping a storage plugin configuration works only when the first Drillbit in the cluster first starts up. The configuration is stored in ZooKeeper, preventing Drill from picking up the bootstrap-storage-plugins.json again.
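
A bootstrap-storage-plugins.json file might look like the following sketch. It assumes the layout of the bootstrap file bundled with Drill, in which a top-level storage object maps each plugin name to its configuration. The plugin name myplugin and its settings are illustrative, borrowed from the REST API example above, with enabled set to true.

```json
{
  "storage": {
    "myplugin": {
      "type": "file",
      "enabled": true,
      "connection": "file:///",
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": null
        }
      },
      "formats": null
    }
  }
}
```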

After cluster startup, you have to use the REST API or Drill Web Console to add a storage plugin configuration. Alternatively, you can modify the entry in ZooKeeper by uploading the JSON file for that plugin to the /drill directory of the ZooKeeper installation, or by just deleting the /drill directory if you do not have configuration properties to preserve.

If you load an HBase storage plugin configuration using the bootstrap-storage-plugins.json file and HBase is not installed, you might experience a delay when executing queries. Configure the HBase client timeout and retry settings in the config block of the HBase plugin configuration.
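
For illustration only, such settings could be placed in the config block as sketched here. The property names are standard HBase client properties. The values, including the quorum host and port, are assumptions to adjust for your environment and are not recommended defaults.

```json
{
  "type": "hbase",
  "enabled": true,
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181",
    "hbase.client.retries.number": "3",
    "hbase.rpc.timeout": "5000",
    "zookeeper.recovery.retry": "1"
  },
  "size.calculator.enabled": false
}
```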