Plugin Configuration Basics
You can configure storage plugins in several ways: in the Drill Web UI, through the REST API, or in configuration files. See Configuring Storage Plugins for more information.
When you configure storage plugins, you use a set of storage plugin attributes, such as the storage plugin type, formats that the plugin type supports, and connection parameters.
The following sections describe the attributes that you can use in your storage plugin configurations and provide information related to the use of attributes.
Storage Plugin Attributes
The following graphic shows key attributes of a typical dfs-based storage plugin configuration:
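Expressed as JSON, a minimal configuration along those lines might look like the following sketch. The workspace name, paths, and format entry are illustrative, not defaults; the attributes themselves are defined in the table below.

```json
{
  "type": "file",
  "enabled": true,
  "connection": "file:///",
  "workspaces": {
    "logs": {
      "location": "/tmp/logs",
      "writable": false,
      "defaultInputFormat": "csv",
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"]
    }
  }
}
```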
List of Attributes and Definitions
The following table describes the attributes you configure for storage plugins installed with Drill.
Attribute | Example Values | Required | Description |
---|---|---|---|
"type" | "file", "hbase", "hive", "mongo" | yes | A valid storage plugin type name. |
"enabled" | true, false | yes | State of the storage plugin. |
"connection" | "classpath:///", "file:///", "mongodb://localhost:27017/", "hdfs://" | implementation-dependent | The type of distributed file system, such as HDFS, Amazon S3, or files in your file system, and an address/path name. |
"workspaces" | null, "logs" | no | One or more unique workspace names. If a workspace name is used more than once, only the last definition is effective. |
"workspaces" . . . "location" | "location": "/Users/johndoe/mydata", "location": "/tmp" | no | Full path to a directory on the file system. |
"workspaces" . . . "writable" | true, false | no | Allows or disallows writing in the workspace. |
"workspaces" . . . "defaultInputFormat" | null, "parquet", "csv", "json" | no | Format for reading data, regardless of extension. Default = "parquet". |
"workspaces" . . . "allowAccessOutsideWorkspace" | false, true | yes | Introduced in Drill 1.12. Prevents users from accessing paths outside the root of a workspace. Set to false by default to disallow access outside the root of a workspace. To allow access to paths outside the root of a workspace, change the value to true. A dfs storage plugin configured prior to Drill 1.12 (without this parameter) cannot access paths outside of the workspace unless the parameter is added to the workspace configuration and set to true. |
"formats" | "pcap", "pcapng", "psv", "csv", "tsv", "parquet", "json", "avro", "maprdb", "image", "sequencefile", "httpd" | yes | One or more valid file formats for reading. Drill detects formats of some files; others require configuration. The maprdb format is included in installations of the mapr-drill package. |
"formats" . . . "type" | "pcap", "pcapng", "text", "parquet", "json", "maprdb", "avro", "image", "sequencefile", "[ltsv](https://drill.apache.org/docs/ltsv-format-plugin/)", "httpd", "[syslog](/docs/syslog-format-plugin/)" | yes | Format type. You can define two formats, csv and psv, both as type "text" but with different delimiters. |
"formats" . . . "extensions" | ["csv"], ["log"] | format-dependent | File name extensions that Drill can read. |
"formats" . . . "delimiter" | "\n", "\r", "\t", "\r\n", "," | format-dependent | Sequence of one or more characters that signifies the end of a line of text and the start of a new line in a delimited text file, such as CSV. Drill treats \n as the standard line delimiter. As of Drill 1.8, Drill supports multi-byte delimiters, such as \r\n. Use the 4-digit hex code syntax \uXXXX for a non-printable delimiter. |
"formats" . . . "quote" | """ | no | A single character that starts/ends a value in a delimited text file. |
"formats" . . . "escape" | "`" | no | A single character that escapes a quotation mark inside a value. |
"formats" . . . "comment" | "#" | no | The line decoration that starts a comment line in the delimited text file. |
"formats" . . . "skipFirstLine" | true | no | Whether to skip the first line when reading a delimited text file. Set to true to avoid reading headers as data. |
"formats" . . . "extractHeader" | true | no | Set to true to extract and use headers as column names when reading a delimited text file, false otherwise. Ensure skipFirstLine is not true when extractHeader=false. |
Using the Formats Attributes
You set the formats attributes, such as skipFirstLine, in the formats area of the storage plugin configuration. When setting attributes for text files, such as CSV, you also need to set the sys.options property exec.storage.enable_new_text_reader to true (the default). For more information and examples of using formats for text files, see “Text Files: CSV, TSV, PSV”.
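For example, you can check and, if necessary, set that option with standard Drill SQL:

```sql
-- Check the current value of the option
SELECT * FROM sys.options
WHERE name = 'exec.storage.enable_new_text_reader';

-- Enable it system-wide if it was turned off
ALTER SYSTEM SET `exec.storage.enable_new_text_reader` = true;
```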
Table Function Parameters
Using the Formats Attributes as Table Function Parameters
In Drill version 1.4 and later, you can also set the formats attributes defined above on a per-query basis. To pass parameters to the format plugin, use the table function syntax:
select a, b from table({table function name}(parameters))
The table function name is the table name, the type parameter is the format name, and the other parameters are the fields that the format plugin configuration accepts, as defined in the table above (except for extensions, which do not apply in this context).
For example, to read a CSV file and parse the header:
select a, b from table(dfs.`path/to/data.csv`(type => 'text',
fieldDelimiter => ',', extractHeader => true))
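The same pattern accepts the other delimited-text attributes from the table above. For example, here is a sketch that skips the header line instead of extracting it (the path is illustrative); note that without extractHeader, Drill exposes the fields of a text file through the columns array:

```sql
select columns[0], columns[1]
from table(dfs.`path/to/data.csv`(
  type => 'text', fieldDelimiter => ',', skipFirstLine => true))
```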
For more information about format plugin configuration see “Text Files: CSV, TSV, PSV”.
Specifying the Schema as Table Function Parameter
Table schemas normally reside in the root folder of each table. You can also specify a schema for an individual query using a table function and specifying the SCHEMA property. You can combine the schema with format plugin properties.
The syntax is similar to that of CREATE OR REPLACE SCHEMA:
SELECT a, b FROM TABLE (table_name(
SCHEMA => 'inline=(column_name data_type [nullability] [format] [default] [properties {prop='val', ...}])'))
You can specify the schema inline within the query. For example:
select * from table(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))
Alternatively, you can also specify the path to a schema file. For example:
select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
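If you need to produce such a schema file, the CREATE OR REPLACE SCHEMA command noted above can write one. A sketch using its PATH clause, assuming the /tmp/my_schema path from the example:

```sql
-- Sketch: persist a one-column schema to a file that queries
-- can later reference with the path= form of the table function
CREATE OR REPLACE SCHEMA (col1 DATE) PATH '/tmp/my_schema';
```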
The following example demonstrates applying a provided schema together with format plugin table function parameters. Suppose that you have a CSV file with headers and a custom extension, csvh-test. You can combine the schema with format plugin properties:
select * from table(dfs.tmp.`cars.csvh-test`(type => 'text',
fieldDelimiter => ',', extractHeader => true,
schema => 'inline=(col1 date)'))
Using Other Attributes
The configuration of other attributes, such as size.calculator.enabled in the hbase plugin and configProps in the hive plugin, is implementation-dependent and beyond the scope of this document.
Case-Sensitivity
Starting in Drill 1.15, storage plugin names and workspaces (schemas) are case-insensitive. For example, the following queries use a storage plugin named dfs and a workspace named clicks. You can reference dfs.clicks in an SQL statement in uppercase or lowercase, as shown:
USE dfs.clicks;
USE DFS.CLICKs;
USE dfs.CLICKS;
Refer to Case-Sensitivity for more information about case-sensitivity in Drill.