Monitoring Metrics
The Metrics page in the Drill Web UI (http(s)://<drillbit-ip-address>:8047/metrics) lists JVM, operating system, and certain Drill-specific metrics. You can use these metrics to debug the state of the cluster. The Drill-specific metrics are prefixed with drill, for example drill.fragments.running.
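For a quick look from the command line, the page URL above can be assembled per Drillbit and fetched with a tool such as curl. This is only a sketch: the helper name drill_metrics_url and the host localhost are assumptions for illustration, and the format of the response is not described here, so inspect the output before scripting against it.

```shell
# Hypothetical helper: build the Metrics page URL for one Drillbit.
# Arguments: host, optional port (default 8047), optional scheme (default http).
drill_metrics_url() {
  printf '%s://%s:%s/metrics\n' "${3:-http}" "$1" "${2:-8047}"
}

# Example usage (needs a running Drillbit and curl, so it is left commented out):
#   curl -s "$(drill_metrics_url localhost)"
#   curl -s "$(drill_metrics_url 10.0.0.5 8047 https)"
```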
Drill uses JMX (Java Management Extensions) to expose metrics at runtime. JMX provides the architecture to dynamically manage and monitor applications. JMX collects Drill system-level metrics that you can see in the Metrics tab in the Drill Web UI or in a remote JMX monitoring tool, such as JConsole or VisualVM with the MBeans plugin.
Metrics collected by JMX are divided into the following categories on the Metrics page in the Drill Web UI:
- Gauges: A gauge is an instantaneous measure of a value. See Gauges.
- Counters: A counter is a snapshot of the count of metrics at a particular point in time. (A gauge for an AtomicLong instance.) See Counters.
- Histograms: A histogram measures the statistical distribution of values in a stream of data. See Histograms.
- Meters: A meter measures the rate of events over time, for example requests per second. Drill currently does not use meters to report system-level metrics.
- Timers: A timer measures the rate at which a particular piece of code is called and the distribution of its duration. See Timers.
Remote Monitoring
You can enable Java's remote JMX feature to monitor a specific JVM from a remote location. Remote JMX can be enabled with or without authentication. See the Java documentation.
In $DRILL_HOME/conf/drill-env.sh, use the DRILLBIT_JAVA_OPTS variable to pass the relevant parameters. For example, to add remote monitoring on port 8048 without authentication:
export DRILLBIT_JAVA_OPTS="$DRILLBIT_JAVA_OPTS -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8048"
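For comparison, here is a sketch of the same setting with password authentication turned on. The com.sun.management.jmxremote.* properties are standard JVM options, but the two file locations are hypothetical, and the password and access files must be created and secured as the Java documentation describes.

```shell
# Sketch for $DRILL_HOME/conf/drill-env.sh: remote JMX on port 8048 with authentication.
# Both file locations below are hypothetical; create and protect the files yourself.
JMX_PASSWORD_FILE="${DRILL_HOME:-}/conf/jmxremote.password"
JMX_ACCESS_FILE="${DRILL_HOME:-}/conf/jmxremote.access"

# With ssl=false the credentials cross the network unencrypted,
# so enable SSL for anything beyond a test setup.
export DRILLBIT_JAVA_OPTS="${DRILLBIT_JAVA_OPTS:-} \
-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8048 \
-Dcom.sun.management.jmxremote.authenticate=true \
-Dcom.sun.management.jmxremote.password.file=$JMX_PASSWORD_FILE \
-Dcom.sun.management.jmxremote.access.file=$JMX_ACCESS_FILE \
-Dcom.sun.management.jmxremote.ssl=false"
```

A remote tool such as JConsole can then be pointed at <drillbit-ip-address>:8048 with the configured credentials.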
Disabling Drill Metrics
JMX metric collection is enabled by default. You can disable it if needed.
In $DRILL_HOME/conf/drill-env.sh, set the drill.metrics.jmx.enabled option to false through the DRILLBIT_JAVA_OPTS variable. Add the variable to drill-env.sh if it does not exist:
export DRILLBIT_JAVA_OPTS="$DRILLBIT_JAVA_OPTS -Ddrill.metrics.jmx.enabled=false"
Metrics with Prefix
Starting with Drill 2.0.0, all metrics are grouped into the following six categories:
- drill
- jvm
- memory
- threads
- cached-threads
- class
The category name is prepended to each Drill metric name for easier maintenance.
For example, the metric names:
- count
- direct.count
- blocked.count
- new.count
now have the following, more easily recognised names:
- cached-threads.count
- jvm.direct.count
- threads.blocked.count
- threads.new.count
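The prefix scheme makes it easy to sort metric names by category in a script. The function below is a minimal sketch; the name metric_category is hypothetical, and it looks only at the prefix, so any name that does not start with one of the six categories is reported as unknown.

```shell
# Hypothetical helper: print the category a metric name belongs to, judged by prefix alone.
metric_category() {
  case "$1" in
    drill.*)          echo drill ;;
    jvm.*)            echo jvm ;;
    memory.*)         echo memory ;;
    cached-threads.*) echo cached-threads ;;
    threads.*)        echo threads ;;
    class.*)          echo class ;;
    *)                echo unknown ;;   # unprefixed name, e.g. a pre-prefix name such as "count"
  esac
}

# Example usage:
#   metric_category threads.blocked.count
```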
Gauges
The following table lists the Drill-specific metrics in the Gauges section of the Metrics page:
Metric | Description |
---|---|
cached-threads.blocked.count | The number of threads that are blocked because they are waiting on a monitor lock. This metric is useful for debugging Drill issues. |
drill.fragments.running | The number of query fragments currently running in the drillbit. |
drill.allocator.root.used | The amount of memory (in bytes) used by the internal memory allocator. |
drill.allocator.root.peak | The peak amount of memory (in bytes) used by the internal memory allocator. |
drill.allocator.rpc.bit.control.peak | The maximum number of bytes used across all outgoing and incoming control connections for this Drillbit up to the current time. |
drill.allocator.rpc.bit.control.used | The total number of bytes currently used across all outgoing and incoming control connections for this Drillbit. |
drill.allocator.rpc.bit.data.peak | The maximum amount of memory used across all outgoing and incoming data connections for this Drillbit up to the current time. |
drill.allocator.rpc.bit.data.used | The total amount of memory used across all outgoing and incoming data connections for this Drillbit. |
drill.allocator.rpc.bit.user.peak | The maximum amount of memory used across all incoming Drill client connections to this Drillbit up to the current time. |
drill.allocator.rpc.bit.user.used | The total amount of memory used across all incoming Drill client connections to this Drillbit. |
drill.allocator.huge.size | Total size in bytes of huge (greater than 16MB) direct buffers currently allocated. |
drill.allocator.huge.count | The number of allocations of direct buffers larger than 16MB. Each of these allocations is served by the OS rather than by Netty’s buffer pool, which adds overhead. |
drill.allocator.normal.count | The number of allocations of direct buffers of 16MB or smaller. Each of these allocations is served from Netty’s buffer pool. This counter is updated only in debug environments with asserts enabled, to avoid per-allocation overhead during normal execution. |
drill.allocator.normal.size | The total size in bytes of normal (less than or equal to 16MB) direct buffers currently allocated. This counter is updated only in debug environments with asserts enabled, to avoid per-allocation overhead during normal execution. |
cached-threads.count | The number of live threads, including daemon and non-daemon threads. |
memory.heap.used | The amount of heap memory (in bytes) used by the JVM. |
memory.non-heap.used | The amount of non-heap memory (in bytes) used by the JVM. |
fd.usage | The ratio of used file descriptors to total file descriptors on *nix systems. |
jvm.direct.used | The amount of direct memory (in bytes) used by the JVM. This metric is useful for debugging Drill issues. |
cached-threads.runnable.count | The number of threads executing an action in the JVM. This metric is useful for debugging Drill issues. |
threads.waiting.count | The number of threads waiting to execute. Typically, threads waiting on other threads to perform an action. This metric is useful for debugging Drill issues. |
drillbit.load.avg | The “recent CPU usage” for the Drillbit process, as a double in the [0.0,1.0] interval. A value of 0.0 means that none of the CPUs were running threads from the Drillbit process during the recent period observed, while a value of 1.0 means that all CPUs were actively running threads from the Drillbit process 100% of the time during that period. Threads from the Drillbit process include the application threads as well as the JVM internal threads. All values between 0.0 and 1.0 are possible, depending on the activity in the Drillbit process and the whole system. If the recent CPU usage is not available, the value is negative. See getProcessCpuLoad(). |
drillbit.uptime | Total uptime of the Drillbit JVM in milliseconds. See getUptime(). |
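The raw values of drillbit.uptime (milliseconds) and drillbit.load.avg (a fraction, or a negative number when unavailable) are easier to read after a small conversion. The two helpers below are a sketch; their names are hypothetical, and they expect the values to be passed in as plain numbers that you have already read from the Metrics page.

```shell
# Hypothetical helper: turn a drillbit.uptime reading (milliseconds) into "Dd HH:MM:SS".
format_uptime_ms() {
  secs=$(( $1 / 1000 ))   # whole seconds; leftover milliseconds are dropped
  printf '%dd %02d:%02d:%02d\n' \
    "$(( secs / 86400 ))" "$(( secs % 86400 / 3600 ))" \
    "$(( secs % 3600 / 60 ))" "$(( secs % 60 ))"
}

# Hypothetical helper: turn a drillbit.load.avg reading into a percentage.
# A negative reading means the recent CPU usage was not available.
load_percent() {
  awk -v v="$1" 'BEGIN {
    if (v < 0) print "unavailable"
    else printf "%.1f%%\n", v * 100
  }'
}

# Example usage:
#   format_uptime_ms 93784000
#   load_percent 0.25
```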
Counters
The following table lists the Drill-specific metrics in the Counters section of the Metrics page:
Metric | Description |
---|---|
drill.connections.rpc.control.encrypted | The total number of encrypted incoming and outgoing control connections to and from this Drillbit. This includes both the control client and server connections. |
drill.connections.rpc.control.unencrypted | The total number of unencrypted incoming and outgoing control connections to and from this Drillbit. This includes both the control client and server connections. |
drill.connections.rpc.data.encrypted | The total number of encrypted incoming and outgoing data connections to and from this Drillbit. This includes both the data client and data server connections. |
drill.connections.rpc.data.unencrypted | The total number of unencrypted incoming and outgoing data connections to and from this Drillbit. This includes both the data client and data server connections. |
drill.connections.rpc.user.encrypted | The total number of encrypted connections from all the Drill clients to this Drillbit. |
drill.connections.rpc.user.unencrypted | The total number of unencrypted connections from all the Drill clients to this Drillbit. |
drill.queries.canceled | The number of canceled queries for which this Drillbit was the Foreman. |
drill.queries.completed | The number of queries completed, canceled, or failed for which this Drillbit was the Foreman. |
drill.queries.enqueued | The number of waiting queries across all of the configured queues for which this Drillbit is the Foreman. |
drill.queries.failed | The number of failed queries for which this Drillbit was the Foreman. |
drill.queries.planning | The number of queries in the planning stage for which this Drillbit is the Foreman. |
drill.queries.running | The number of queries running for which this Drillbit is the Foreman. |
drill.queries.succeeded | The number of successful queries for which this Drillbit was the Foreman. |
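The query counters can be combined into simple health indicators. As one illustration, the sketch below computes the share of finished queries that failed. It assumes, following the descriptions in the table above, that drill.queries.completed already includes failed and canceled queries. The helper name failed_share is hypothetical, and both readings are passed in by hand.

```shell
# Hypothetical helper: percentage of finished queries that failed.
# Arguments: a drill.queries.failed reading, then a drill.queries.completed reading.
failed_share() {
  awk -v failed="$1" -v completed="$2" 'BEGIN {
    if (completed <= 0) print "n/a"   # nothing has finished yet
    else printf "%.1f%%\n", 100 * failed / completed
  }'
}

# Example usage:
#   failed_share 5 200
```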
Histograms
The following table lists the Drill-specific metrics in the Histograms section of the Metrics page:
Reporting Class | Description |
---|---|
drill.allocator.huge.hist | The distribution of huge buffer allocation sizes up to the current time. Count specifies the number of huge buffer allocations completed so far. Max and Min specify the maximum and minimum size in bytes of the huge buffers allocated. Mean and the percentiles show the distribution of the huge buffer allocation size in bytes. |
drill.allocator.normal.hist | The distribution of normal buffer allocation sizes up to the current time. Count specifies the number of normal buffer allocations completed so far. Max and Min specify the maximum and minimum size in bytes of the normal buffers allocated. Mean and the percentiles show the distribution of the normal buffer allocation size in bytes. |
Meters
Not available.
Timers
Reporting Class | Description |
---|---|
org.apache.drill.exec.cache.VectorAccessibleSerializable.writerTime | Measures the distribution of the time taken to serialize a record batch to the output stream. Mainly used to measure the time taken to spill a record batch. |
org.apache.drill.exec.store.schedule.BlockMapBuilder.blockMapBuilderTimer | Measures the distribution of the time taken to build a mapping of block locations for a given file byte range. Mainly used during the planning phase to determine a set of endpoints where all the data is located. |