Plugin Configuration Basics

You can configure storage plugins in several ways: in the Drill Web UI, through the REST API, or through configuration files. See Configuring Storage Plugins for more information.

When you configure storage plugins, you use a set of storage plugin attributes, such as the storage plugin type, formats that the plugin type supports, and connection parameters.

The following sections describe the attributes that you can use in your storage plugin configurations and provide information related to the use of attributes.

Storage Plugin Attributes

The following graphic shows key attributes of a typical dfs-based storage plugin configuration:

(Image: dfs storage plugin configuration)

List of Attributes and Definitions

The following table describes the attributes you configure for storage plugins installed with Drill.

| Attribute | Example Values | Required | Description |
|-----------|----------------|----------|-------------|
| "type" | "file", "hbase", "hive", "mongo" | yes | A valid storage plugin type name. |
| "enabled" | true, false | yes | State of the storage plugin. |
| "connection" | "classpath:///", "file:///", "mongodb://localhost:27017/", "hdfs://" | implementation-dependent | The type of distributed file system, such as HDFS, Amazon S3, or files in your file system, and an address/path name. |
| "workspaces" | null, "logs" | no | One or more unique workspace names. If a workspace name is used more than once, only the last definition takes effect. |
| "workspaces" . . . "location" | "location": "/Users/johndoe/mydata", "location": "/tmp" | no | Full path to a directory on the file system. |
| "workspaces" . . . "writable" | true, false | no | Allows or disallows writing in the workspace. |
| "workspaces" . . . "defaultInputFormat" | null, "parquet", "csv", "json" | no | Format for reading data, regardless of extension. Default = "parquet". |
| "workspaces" . . . "allowAccessOutsideWorkspace" | false, true | yes | Introduced in Drill 1.12. Prevents users from accessing paths outside the root of a workspace. Set to false by default to disallow access outside the root of a workspace. To allow access to paths outside the root of a workspace, change the value to true. Dfs storage plugins configured prior to Drill 1.12 (that do not have the parameter specified) cannot access paths outside of the workspace unless this parameter is included in the workspace configuration and set to true. |
| "formats" | "pcap", "pcapng", "psv", "csv", "tsv", "parquet", "json", "avro", "maprdb", "image", "sequencefile", "httpd" | yes | One or more valid file formats for reading. Drill detects the formats of some files; others require configuration. The maprdb format is included in installations of the mapr-drill package. |
| "formats" . . . "type" | "pcap", "pcapng", "text", "parquet", "json", "maprdb", "avro", "image", "sequencefile", "[ltsv](https://drill.apache.org/docs/ltsv-format-plugin/)", "httpd", "[syslog](/docs/syslog-format-plugin/)" | yes | Format type. You can define two formats, csv and psv, as type "text" but with different delimiters. |
| "formats" . . . "extensions" | ["csv"], ["log"] | format-dependent | File name extensions that Drill can read. |
| "formats" . . . "delimiter" | "\n", "\r", "\t", "\r\n", "," | format-dependent | Sequence of one or more characters that signifies the end of a line of text and the start of a new line in a delimited text file, such as CSV. Drill treats \n as the standard line delimiter. As of Drill 1.8, Drill supports multi-byte delimiters, such as \r\n. Use the 4-digit hex code syntax \uXXXX for a non-printable delimiter. |
| "formats" . . . "quote" | """ | no | A single character that starts/ends a value in a delimited text file. |
| "formats" . . . "escape" | "\`" | no | A single character that escapes a quotation mark inside a value. |
| "formats" . . . "comment" | "#" | no | The line decoration that starts a comment line in a delimited text file. |
| "formats" . . . "skipFirstLine" | true | no | Whether to include or omit the header when reading a delimited text file. Set to true to avoid reading headers as data. |
| "formats" . . . "extractHeader" | true | no | Set to true to extract and use headers as column names when reading a delimited text file, false otherwise. Ensure skipFirstLine is not true when extractHeader=false. |

Using the Formats Attributes

You set the formats attributes, such as skipFirstLine, in the formats area of the storage plugin configuration. When setting attributes for text files, such as CSV, you also need to set the sys.options property exec.storage.enable_new_text_reader to true (the default). For more information and examples of using formats for text files, see “Text Files: CSV, TSV, PSV”.
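
For example, a format entry in the formats area that treats the first line of a file as column headers might look like the following sketch (the format name csvh and its extension are illustrative, not required names):

```json
"formats": {
  "csvh": {
    "type": "text",
    "extensions": ["csvh"],
    "delimiter": ",",
    "extractHeader": true
  }
}
```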

Table Function Parameters

Using the Formats Attributes as Table Function Parameters

In Drill version 1.4 and later, you can also set the formats attributes defined above on a per query basis. To pass parameters to the format plugin, use the table function syntax:

   select a, b from table({table function name}(parameters))

The table function name is the table name, the type parameter is the format name, and the other parameters are the fields that the format plugin configuration accepts, as defined in the table above (except for extensions which do not apply in this context).

For example, to read a CSV file and parse the header:

   select a, b from table(dfs.`path/to/data.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true))

For more information about format plugin configuration see “Text Files: CSV, TSV, PSV”.

Specifying the Schema as Table Function Parameter

Table schemas normally reside in the root folder of each table. You can also specify a schema for an individual query by using a table function with the SCHEMA property, and you can combine the schema with format plugin properties. The syntax is similar to that of CREATE OR REPLACE SCHEMA:

   SELECT a, b FROM TABLE (table_name(
   SCHEMA => 'inline=(column_name data_type [nullability] [format] [default] [properties {prop='val', ...}])'))

You can specify the schema inline within the query. For example:

select * from table(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))

Alternatively, you can also specify the path to a schema file. For example:

select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))

The following example demonstrates applying a provided schema along with format plugin table function parameters. Suppose you have a CSV file with headers and a custom extension, csvh-test. You can combine the schema with format plugin properties:

select * from table(dfs.tmp.`cars.csvh-test`(type => 'text',
fieldDelimiter => ',', extractHeader => true,
schema => 'inline=(col1 date)'))

Using Other Attributes

The configuration of other attributes, such as size.calculator.enabled in the hbase plugin and configProps in the hive plugin, are implementation-dependent and beyond the scope of this document.
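
As a hedged illustration only (the ZooKeeper host and port below are placeholders; consult the HBase storage plugin documentation for the supported settings), such implementation-specific attributes sit alongside the common ones in the plugin configuration:

```json
{
  "type": "hbase",
  "enabled": true,
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "size.calculator.enabled": false
}
```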

Case-Sensitivity

Starting in Drill 1.15, storage plugin names and workspaces (schemas) are case-insensitive. For example, the following query uses a storage plugin named dfs and a workspace named clicks. You can reference dfs.clicks in an SQL statement in uppercase or lowercase, as shown:

   USE dfs.clicks;
   USE DFS.CLICKs;
   USE dfs.CLICKS;

Refer to Case-Sensitivity for more information about case-sensitivity in Drill.