Configuring Index Planning
Certain options control how Drill uses indexes when planning and executing queries. You can set the query planning and execution options, at the system or session level, using the ALTER SYSTEM | SET commands, as shown: |
ALTER SYSTEM SET `planner.enable_index_planning` = 'true'
SET `planner.enable_index_planning` = 'false'
Options set at the session level only apply to queries that you run during the current Drill connection. Options set at the system level affect the entire system and persist between restarts. Session level settings override system level settings. Typically, you set the options at the session level unless you want the setting to persist across all sessions.
The following table lists the index planning and execution options that you can enable, disable, or modify:
Note: The planning option names are prepended with planner, for example planner.enable_index_planning
. The execution options are prepended with exec, for example exec.query.rowkeyjoin_batchsize
.
Option | Description | Default Value | Possible Values | |
---|---|---|---|---|
planner.enable_index_planning | Enables or disables index planning | TRUE | true | false |
planner.index.force_sort_noncovering | Forces Drill to sort for non-covering indexes. If the query has an ORDER-BY on index columns and a non-covering index is chosen, by default Drill leverages the sortedness of the index columns and does not sort. Fast changing primary table data may produce a partial sort. This option forces a sort within Drill. | FALSE | true | false |
planner.enable_rowkeyjoin_conversion | Drill can push down the rowkey filter to the data source during runtime. For a query to qualify for runtime filter pushdown, the join condition must filter on a rowkey. For example, SELECT t.mscIdentities FROM dfs.root./user/ted/MixTable t WHERE t.row_key IN (SELECT max(convert_fromutf8(i.KeyA.ENTRY_KEY)) FROM dfs.root./user/ted/TableIMSI i WHERE i.row_key=’460021050005636’) Drill evaluates the results of the subquery at runtime. The subquery yields a list of rowkeys from the TableIMSI table. Drill pushes down the list of rowkeys to the data source. The data source uses the rowkeys to locate the corresponding documents in the MixTable table and sends the results to Drill. Note: Currently, Drill does not support runtime filters for queries with equality conditions. The query planner in Drill converts an equality condition to a left join. As a workaround, use the IN operator instead of the equality (=) operator for queries in which you want Drill to push down the rowkey filter to the data source. Drill does not perform runtime filter pushdown for queries that filter on rowkeys in small fact tables when the rowcount is generated from the right side of the join. |
|||
planner.rowkeyjoin_conversion_selectivity_threshold | Sets the selectivity (as a percentage) under which Drill uses a rowkey join for eligible queries. | 0.01 | Range: 0.0-1.0 | |
planner.rowkeyjoin_conversion_using_hashjoin | When enabled, Drill uses the hash join operator instead of a rowkey join. | FALSE | true | false |
planner.index.covering_selectivity_threshold | For covering indexes, this option specifies the filter selectivity that corresponds to the leading prefix of the index below which the index is considered for planning. For example, for the filter ‘a > 10 AND b < 20’ if an index has indexed columns (a, b, c) and the combined selectivity of the above condition is less than the threshold, the index is considered for the query plan. | 0.75 | 0 - 1.0 | |
planner.index.noncovering_selectivity_threshold | For non-covering indexes, this option specifies the filter selectivity that corresponds to the leading prefix of the index below which the index is considered for planning. | 0.025 | 0 - 1.0 | |
planner.index.max_chosen_indexes_per_table | The maximum number of “chosen” indexes for a table after index costing and ranking. | 5 | 0 - 100 | |
planner.index.rowkeyjoin_cost_factor | The cost factor that provides some control over the I/O cost for non-covering indexes when the rowkey join back to the primary table causes random I/O from the primary table. | 0.1 | 0 - max_double | |
planner.enable_statistics | Enable or disable statistics for the filter conditions on indexed columns. | TRUE | true | false |
exec.query.rowkeyjoin_batchsize | For batch GET operations, this option specifies the batch size in terms of the number of rowkeys. Used for non-covering index plans when doing joins back to primary table. | 128 | 0 - Long.MAX_VALUE |