Query Data Introduction

You can query local and distributed file systems, Hive, HBase data, complex data, INFORMATION SCHEMA, and system tables as described in the subtopics of this section.

The query specifies the data source location and includes data casting.

Specifying the Data Source Location

The optional USE statement runs subsequent queries against a particular storage plugin. The USE statement typically saves typing some of the storage plugin information in the FROM statement. If you omit the USE statement, specify a storage plugin, such as dfs, and optionally a workspace, such as default, and a path to the data source using dot notation and back ticks. For example:

dfs.`default`.`/Users/drill-user/apache-drill-1.1.0/log/sqlline_queries.json`;

Casting Data

In some cases, Drill converts schema-less data to correctly-typed data implicitly. In this case, you do not need to cast the data to another type. The file format of the data and the nature of your query determines the requirement for casting or converting. Differences in casting depend on the data source.

For example, you have to cast a string "100" in a JSON file to an integer in order to apply a math function or an aggregate function.

To query HBase data using Drill, convert every column of an HBase table to/from byte arrays from/to an SQL data type as described in the section "Querying HBase". Use CONVERT_TO or CONVERT_FROM functions to perform conversions of HBase data.

Troubleshooting Queries

In addition to testing queries interactively in the Drill shell, and examining error messages, use the EXPLAIN command to analyze errors and troubleshoot queries that do not run. For example, if you run into a casting error, the query plan text may help you isolate the problem.

0: jdbc:drill:zk=local> !set maxwidth 10000
0: jdbc:drill:zk=local> explain plan for select ... ;

Drill shell commands include the !set <set variable> <value> to increase the default text display (number of characters). By default, most of the plan output is hidden.

Query Syntax Tips

Remember the following tips when querying data with Drill:

  • Include a semicolon at the end of SQL statements, except when you issue a Drill shell command.
    Example:!set maxwidth 10000`
  • Use backticks around keywords, special characters, and identifiers that SQL cannot parse, such as the keyword default and a path that contains a forward slash character: Example: SELECT * FROM dfs.`default`.`/Users/drilluser/apache-drill-1.1.0/sample-data/nation.parquet`;
  • When selecting all (SELECT *) schema-less data, the order of returned columns might differ from the stored order and might vary from query to query.