EXPLAIN

EXPLAIN is a useful tool for examining the steps that a query goes through when it is executed. You can use the EXPLAIN output to gain a deeper understanding of the parallel processing that Drill queries exploit. You can also look at costing information, troubleshoot performance issues, and diagnose routine errors that may occur when you run queries.

Drill provides two variations on the EXPLAIN command, one that returns the physical plan and one that returns the logical plan. A logical plan takes the SQL query (as written by the user and accepted by the parser) and translates it into a logical series of operations that correspond to SQL language constructs (without defining the specific algorithms that will be implemented to run the query). A physical plan translates the logical plan into a specific series of steps that will be used when the query runs. For example, a logical plan may indicate a join step in general and classify it as inner or outer, but the corresponding physical plan will indicate the specific type of join operator that will run, such as a merge join or a hash join. The physical plan is operational and reveals the specific access methods that will be used for the query.

An EXPLAIN command for a query that is run repeatedly under the exact same conditions against the same data will return the same plan. However, if you change a configuration option, for example, or update the tables or files that you are selecting from, you are likely to see plan changes.

EXPLAIN Syntax

The EXPLAIN command supports the following syntax:

explain plan [ including all attributes ] [ with implementation | without implementation ] for <query> ;

where query is any valid SELECT statement supported by Drill.

INCLUDING ALL ATTRIBUTES

This option returns costing information. You can use this option for both physical and logical plans.

WITH IMPLEMENTATION | WITHOUT IMPLEMENTATION

These options return the physical and logical plan information, respectively. The default is physical (WITH IMPLEMENTATION).

EXPLAIN for Physical Plans

The EXPLAIN PLAN FOR command returns the chosen physical execution plan for a query statement without running the query. You can use this command to see what kind of execution operators Drill implements. For example, you can find out what kind of join algorithm is chosen when tables or files are joined. You can also use this command to analyze errors and troubleshoot queries that do not run. For example, if you run into a casting error, the query plan text may help you isolate the problem.

Use the following syntax:

explain plan for <query> ;

The following set command increases the default text display (number of characters). By default, most of the plan output is not displayed.

0: jdbc:drill:zk=local> !set maxwidth 10000

Do not use a semicolon to terminate set commands.

For example, here is the top portion of the explain output for a COUNT(DISTINCT) query on a JSON file:

0: jdbc:drill:zk=local> !set maxwidth 10000
0: jdbc:drill:zk=local> explain plan for select type t, count(distinct id) from dfs.`/home/donuts/donuts.json` where type='donut' group by type;
+------------+------------+
|   text    |   json    |
+------------+------------+
| 00-00 Screen
00-01   Project(t=[$0], EXPR$1=[$1])
00-02       Project(t=[$0], EXPR$1=[$1])
00-03       HashAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-04           HashAgg(group=[{0, 1}])
00-05           SelectionVectorRemover
00-06               Filter(condition=[=($0, 'donut')])
00-07               Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`type`, `id`], files=[file:/home/donuts/donuts.json]]])...
...

Read the text output from bottom to top to understand the sequence of operators that will execute the query. Note that the physical plan starts with a scan of the JSON file that is being queried. The selected columns are projected and filtered, then the aggregate function is applied.

The EXPLAIN text output is followed by detailed JSON output, which is reusable for submitting the query via Drill APIs.

| {
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "resultMode" : "EXEC"
  },
....

Costing Information

Add the INCLUDING ALL ATTRIBUTES option to the EXPLAIN command to see cost estimates for the query plan. For example:

0: jdbc:drill:zk=local> !set maxwidth 10000
0: jdbc:drill:zk=local> explain plan including all attributes for select * from dfs.`/home/donuts/donuts.json` where type='donut';
+------------+------------+
|   text    |   json    |
+------------+------------+
| 00-00 Screen: rowcount = 1.0, cumulative cost = {5.1 rows, 21.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 889
00-01   Project(*=[$0]): rowcount = 1.0, cumulative cost = {5.0 rows, 21.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 888
00-02       Project(T1¦¦*=[$0]): rowcount = 1.0, cumulative cost = {4.0 rows, 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 887
00-03       SelectionVectorRemover: rowcount = 1.0, cumulative cost = {3.0 rows, 13.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 886
00-04           Filter(condition=[=($1, 'donut')]): rowcount = 1.0, cumulative cost = {2.0 rows, 12.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 885
00-05           Project(T1¦¦*=[$0], type=[$1]): rowcount = 1.0, cumulative cost = {1.0 rows, 8.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 884
00-06               Scan(groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`*`], files=[file:/home/donuts/donuts.json]]]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 883

EXPLAIN for Logical Plans

To return the logical plan for a query (again, without actually running the query), use the EXPLAIN PLAN WITHOUT IMPLEMENTATION syntax:

explain plan without implementation for <query> ;

For example:

0: jdbc:drill:zk=local> explain plan without implementation for select type t, count(distinct id) from dfs.`/home/donuts/donuts.json` where type='donut' group by type;
+------------+------------+
|   text    |   json    |
+------------+------------+
| DrillScreenRel
  DrillProjectRel(t=[$0], EXPR$1=[$1])
    DrillAggregateRel(group=[{0}], EXPR$1=[COUNT($1)])
    DrillAggregateRel(group=[{0, 1}])
        DrillFilterRel(condition=[=($0, 'donut')])
        DrillScanRel(table=[[dfs, /home/donuts/donuts.json]], groupscan=[EasyGroupScan [selectionRoot=/home/donuts/donuts.json, numFiles=1, columns=[`type`, `id`], files=[file:/home/donuts/donuts.json]]]) | {
  | {
  "head" : {
    "version" : 1,
    "generator" : {
    "type" : "org.apache.drill.exec.planner.logical.DrillImplementor",
    "info" : ""
    },
    "type" : "APACHE_DRILL_LOGICAL",
    "options" : null,
    "queue" : 0,
    "resultMode" : "LOGICAL"
  },...