Class ZKRegistry
- All Implemented Interfaces:
DrillbitStatusListener
,Pollable
,TaskLifecycleListener
Locking strategy: This class receives events from both ZK and the cluster controller, both of which must be synchronized. To prevent deadlocks, this class NEVER calls into the cluster controller while holding a lock. The rule allows the first two call paths below and rules out the third:
ClusterController --> ZKRegistry (OK)
ZK --> ZKRegistry (OK)
ZK --> ZKRegistry --> Cluster Controller (bad)
In the case of registration, ZK calls the registry which must alert the cluster controller. Cluster controller alerting is handled outside the ZK update critical section.
Because ZK events occur relatively infrequently, any deadlock would occur once in a blue moon, which would make it very hard to reproduce. So, extreme caution is needed at design time to prevent the problem.
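The locking rule above can be illustrated with a minimal sketch. It is not the ZKRegistry implementation: `ControllerSketch`, `RegistryLockingSketch`, `registrationSeen`, and the use of plain strings as endpoint keys are all hypothetical names chosen for illustration. The sketch updates the registry inside a critical section, collects the notifications that are due, and calls the controller only after the lock has been released.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the cluster controller; not the Drill API.
interface ControllerSketch {
  void registrationSeen(String endpointKey);
}

// Hypothetical registry showing only the locking pattern.
class RegistryLockingSketch {
  private final Map<String, Boolean> registry = new HashMap<>();
  private final ControllerSketch controller;

  RegistryLockingSketch(ControllerSketch controller) {
    this.controller = controller;
  }

  // Assumed to be called on a ZK thread.
  void drillbitRegistered(Set<String> endpointKeys) {
    List<String> toNotify = new ArrayList<>();
    synchronized (this) {
      // Critical section: touch only the registry's own state.
      for (String key : endpointKeys) {
        if (registry.put(key, Boolean.TRUE) == null) {
          toNotify.add(key); // newly seen endpoint; notify later
        }
      }
    }
    // Lock released: calling into the controller can no longer
    // produce the ZK --> registry --> controller lock chain.
    for (String key : toNotify) {
      controller.registrationSeen(key);
    }
  }

  synchronized int size() {
    return registry.size();
  }
}
```

Collecting the pending notifications in a local list keeps the critical section short and means the controller never observes this object's monitor being held.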
-
Nested Class Summary
Modifier and Type, Class, Description
static class
protected static class
State of each Drillbit that we've discovered through ZK or launched via the AM.
Nested classes/interfaces inherited from interface org.apache.drill.yarn.appMaster.TaskLifecycleListener
TaskLifecycleListener.Event
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void drillbitRegistered(Set<CoordinationProtos.DrillbitEndpoint> registeredDrillbits): Callback from ZK to indicate that one or more drillbits have become registered.
void drillbitUnregistered(Set<CoordinationProtos.DrillbitEndpoint> unregisteredDrillbits): Callback from ZK to indicate that one or more drillbits have become deregistered from ZK.
void finish(RegistryHandler handler)
protected Map<String, ZKRegistry.DrillbitTracker> getRegistryForTesting(): Get the current registry for testing.
boolean isRegistered(Task task): Report whether the given task is still registered in ZK.
void start(RegistryHandler controller): Called during AM startup to initialize ZK.
void stateChange(TaskLifecycleListener.Event event, EventContext context): Listen for selected YARN task state changes.
void tick(long curTime): Periodically check ZK status.
-
Field Details
-
CONTROLLER_PROPERTY
- See Also:
-
UPDATE_PERIOD_MS
public static final int UPDATE_PERIOD_MS
- See Also:
-
ENDPOINT_PROPERTY
- See Also:
-
-
Constructor Details
-
ZKRegistry
-
-
Method Details
-
start
Called during AM startup to initialize ZK. Checks if any Drillbits are already running. These are "unmanaged" because the AM could not have started them (they predate the AM).
-
drillbitRegistered
Callback from ZK to indicate that one or more drillbits have become registered. We handle registrations in a critical section, then alert the cluster controller outside the critical section.
- Specified by:
drillbitRegistered
in interfaceDrillbitStatusListener
- Parameters:
registeredDrillbits
- the set of newly registered drillbits. Note: the complete set of currently registered bits could be different.
-
drillbitUnregistered
Callback from ZK to indicate that one or more drillbits have become deregistered from ZK. We handle the deregistrations in a critical section, but send updates to the cluster controller outside of the critical section.
- Specified by:
drillbitUnregistered
in interfaceDrillbitStatusListener
- Parameters:
unregisteredDrillbits
- the set of newly unregistered drillbits.
-
stateChange
Listen for selected YARN task state changes. Called from within the cluster controller's critical section.
- Specified by:
stateChange
in interfaceTaskLifecycleListener
-
isRegistered
Report whether the given task is still registered in ZK. Called while waiting for a deregistration event to catch possible cases where the message is lost. The message should never be lost, but we've seen cases where tasks hang in this state. This is a potential workaround.
- Parameters:
task
-
- Returns:
- True if the given task is registered, false otherwise.
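A caller might use this check as a fallback while it waits for the deregistration event. The sketch below is only an illustration under assumptions: `DeregistrationCheckSketch`, `deregistrationComplete`, and the string task ids are hypothetical, and the real method takes a `Task` rather than a string. It treats a task as deregistered either when the event has arrived or when the registry no longer lists the task.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; the real ZKRegistry tracks Drillbits differently.
class DeregistrationCheckSketch {
  // Stand-in for the registry's view of ZK, keyed by a task id.
  private final Set<String> registeredInZk = ConcurrentHashMap.newKeySet();

  void zkRegistered(String taskId) {
    registeredInZk.add(taskId);
  }

  void zkUnregistered(String taskId) {
    registeredInZk.remove(taskId);
  }

  boolean isRegistered(String taskId) {
    return registeredInZk.contains(taskId);
  }

  // Caller-side check: done if the event arrived, or if the event was
  // lost but the registry shows the task is no longer registered.
  static boolean deregistrationComplete(
      DeregistrationCheckSketch registry, String taskId, boolean eventReceived) {
    return eventReceived || !registry.isRegistered(taskId);
  }
}
```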
-
tick
public void tick(long curTime)
Periodically check ZK status. If the ZK connection has timed out, something is very seriously wrong. Shut the whole Drill cluster down since Drill cannot operate without ZooKeeper.
This method should not be synchronized. It checks only the ZK state, not internal state. Further, if we do reconnect to ZK, then a ZK thread may attempt to update this registry, which will acquire a synchronization lock.
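One way to picture an unsynchronized periodic check is the sketch below. It is a sketch under assumptions, not the actual method body: `ZkTickSketch`, `FailureHandler`, `connectionLost`, `connectionRestored`, and the timeout parameter are hypothetical. The connection state is held in a `volatile` field, so the polling thread can read it without taking the lock that a ZK thread may need when it updates the registry.

```java
// Hypothetical sketch of a lock-free periodic connection check.
class ZkTickSketch {
  // Hypothetical callback standing in for whatever shuts the cluster down.
  interface FailureHandler {
    void zkFailed();
  }

  private final long timeoutMs;
  private final FailureHandler handler;
  // Written by a ZK thread, read by the polling thread; volatile gives
  // visibility without a lock. A negative value means "connected".
  private volatile long disconnectedSince = -1;
  // Touched only by the polling thread.
  private boolean failed;

  ZkTickSketch(long timeoutMs, FailureHandler handler) {
    this.timeoutMs = timeoutMs;
    this.handler = handler;
  }

  void connectionLost(long now) {
    disconnectedSince = now;
  }

  void connectionRestored() {
    disconnectedSince = -1;
  }

  // Deliberately NOT synchronized: it reads only the connection state.
  void tick(long curTime) {
    long since = disconnectedSince;
    if (failed || since < 0) {
      return;
    }
    if (curTime - since >= timeoutMs) {
      failed = true; // report the failure only once
      handler.zkFailed();
    }
  }
}
```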
-
finish
-
listUnmanagedDrillits
-
getRegistryForTesting
Get the current registry for testing. Why for testing? Because this is unsynchronized. In production code, the map may change out from under you.
- Returns:
- The current registry.
-