Configuration Reference

The Creating a Basic Drill Cluster topic presented the minimum configuration needed to launch Drill under YARN. Additional configuration options are available for specialized cases. Refer to the drill-on-yarn-example.conf for information about the other options.

Application Name

The application name appears when starting or stopping the Drill cluster and in the Drill-on-YARN web UI. Choose a name helpful to you:

   app-name: "My Drill Cluster"

Drill Distribution Archive

The Drill distribution archive is assumed to expand to create a folder that has the same name as the archive itself (minus the .tar.gz suffix). That is, the archive apache-drill-x.y.z.tar.gz is assumed to expand to a directory named apache-drill-x.y.z. Apache Drill archives follow this pattern. In specialized cases, you may have to create your own archive. If you do, it is most convenient if you follow the same pattern. However, if cannot follow the pattern, you can configure Drill-on-YARN to follow a custom pattern using the drill-install.dir-name option:

          clientpath: "/path/to/ your-custom-archive.tar.gz"
          dirname: "your-drill-directory"


/path/to/ your-custom-archive.tar.gz is the location of your archive. your-drill-directory is the name of your Drill directory within the archive.

Customize Web UI Port

If you run multiple Drill clusters per YARN cluster, then YARN may choose to place two Drill AM processes on the same node. To avoid port conflicts, change the HTTP port for one or both of the Drill clusters:

          http: {
           port: 12345

Customize Application Master Settings

The following settings apply to the Application Master. All are prefixed with

Name Description Default
memorymb Memory, in MB, to allocate to the AM. 512
vcores Number of CPUS to allocate to the AM. 1
heap Java heap for the AM. 450M
node-label-expr YARN node label expression to use to select nodes to run the AM. None

Drillbit Customization

The following Drill-on-YARN configuration options control the Drillbit processes. All properties start with drill.yarn.drillbit.

Name Description Default
memory-mb Memory, in MB, to allocate to the Drillbit. 13000
vcores Number of CPUS to allocatet to the AM. 4
disks Number of disk equivalents consumed by Drill (on versions of YARN that support disk resources.) 1
heap Java heap memory. 4G
max-direct-memory Direct (offheap) memory for the Drillbit. 8G
log-gc Enables Java garbage collector logging. FALSE
class-path Additional classpath entries. blank

Note that the Drillbit node expression is set in the labeled pool below.

Cluster Groups

YARN was originally designed for MapReduce jobs that can run on any node, and that often can be combined onto a single node. Compared to the traditional MapReduce jobs, Drill has additional constraints:

  • Only one Drillbit (per Drill cluster) can run per host (to avoid port conflict.)
  • Drillbits work best when launched on the same host as the data that the Drillbit is to scan.

Basic Cluster

A basic cluster launches n Drillbits on distinct nodes anywhere in your YARN cluster. The basic cluster is great for testing and other informal tasks: just configure the desired vcores and memory, along with a number of nodes, then launch Drill. YARN will locate a set of suitable hosts anywhere on the YARN cluster.

Labeled Hosts

Drill-on-YARN can handle node placement directly without the use of labeled queues. You use the “labeled” pool type. Then, set the drillbit-label-expr property to a YARN label expression that matches the nodes on which Drill should run. You will most often care only about Drillbit placement. Finally, indicate the number of Drillbits to run on the selected nodes.

Named Hosts

You can configure Drill-on-YARN to run on a specific set of hosts. However, you must keep the list synchronized with your YARN cluster. If you list a host that is not available to YARN, then Drill cannot start a Drillbit on that host. Also, if the host does not have sufficient resources available, the Drillbit will not run.