Querying a File System Introduction
Files and directories are like standard SQL tables to Drill. You can specify a
file system “database” as a prefix in queries when you refer to objects across
databases. In Drill, a file system database consists of a storage plugin name
followed by an optional workspace name, for example
The following example shows a query on a file system database in a Hadoop distributed file system.
SELECT * FROM hdfs.logs.`AppServerLogs/20104/Jan/01/part0001.txt`;
The path component in backticks is parsed by Drill as a Java string which
means that the backslash \
is the escape character that can be used to
include special characters in the path. The next example shows another query
against a filesystem storage plugin, but now one which is configured to
access the local filesystem on an MS Windows machine (which may, of course,
included mounted network drives). Note the doubled backslashes needed due
to its status as the escape character.
SELECT * FROM dfs.`C:\\Users\\Alice\\Downloads\\sample-dataset.csv`;
The default dfs
storage plugin configuration registered with Drill has a
default
workspace. If you query data in the default
workspace, you do not
need to include the workspace in the query. Refer to
Workspaces for
more information.
Drill supports the following file types:
- Plain text files, including:
- Comma-separated values (CSV, type: text)
- Tab-separated values (TSV, type: text)
- Pipe-separated values (PSV, type: text)
- Structured data files:
The extensions for these file types must match the configuration settings for
your registered storage plugins. For example, PSV files may be defined with a
.tbl
extension, while CSV files are defined with a .csv
extension.
Implicit Columns
Drill 1.8 introduces implicit columns. Implicit columns provide file information, such as the directory path to a file and the file extension. You can query implicit columns in files, directories, nested directories, and files.
The following table lists the implicit columns available and their descriptions:
Implicit Column Name | Description |
---|---|
FQN | The fully qualified name. Contains the full path to the file, including the file name. |
FILEPATH | The full path to the file, without the file name. |
FILENAME | The file name with the file extension. Does not include the path to the file. |
SUFFIX | The file suffix without the dot (.) at the beginning. |
LMT | The last modification time of the file as recorded by the containing file system |
To access implicit columns, you must explicitly include the columns in a query, as shown in the following example:
0: jdbc:drill:zk=local> SELECT fqn, filepath, filename, suffix FROM dfs.`/dev/data/files/test.csvh` LIMIT 1;
|-------------------------------------|--------------------------|---------------|----------------|
| fqn | filepath | filename | suffix |
|-------------------------------------|--------------------------|---------------|----------------|
| /dev/data/files/test.csvh | /dev/data/files | test.csvh | csvh |
|-------------------------------------|--------------------------|---------------|----------------|
Note
If a table has a column with the same name as an implicit column, such as “suffix,” the implicit column overrides the table column.
If a column name has the same name as an implicit column, you can change the default implicit column name using the [ALTER SYSTEM | SESSION SET](/docs/alter-system/) command with the appropriate parameter, as shown in the following example: |
ALTER SYSTEM SET `drill.exec.storage.implicit.suffix.column.label` = appendix;
Use the following configuration options to change the default implicit column names:
drill.exec.storage.implicit.fqn.column.label
drill.exec.storage.implicit.filepath.column.label
drill.exec.storage.implicit.filename.column.label
drill.exec.storage.implicit.suffix.column.label