Package org.apache.drill.yarn.appMaster
Class ClusterControllerImpl
java.lang.Object
org.apache.drill.yarn.appMaster.ClusterControllerImpl
- All Implemented Interfaces:
ClusterController
,RegistryHandler
Controls the Drill cluster by representing the current cluster state with a
desired state, taking corrective action to keep the cluster in the desired
state. The cluster as a whole has a state, as do each task (node) within the
cluster.
This class is designed to allow unit tests. In general, testing the controller on a live cluster is tedious. This class encapsulates the controller algorithm so it can be driven by a simulated cluster.
This object is shared between threads, thus synchronized.
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic enum
Controller lifecycle state. -
Field Summary
Modifier and TypeFieldDescriptionprotected int
Maximum number of retries for each task launch. -
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionboolean
cancelTask
(int id) Cancels the given task, reducing the target task count.void
completionAck
(Task task, String propertyKey) void
containerAllocated
(Task task) void
containerReleased
(Task task) void
containersAllocated
(List<org.apache.hadoop.yarn.api.records.Container> containers) The RM has allocated one or more containers in response to container requests submitted to the RM.void
containersCompleted
(List<org.apache.hadoop.yarn.api.records.ContainerStatus> statuses) The Resource Manager reports that containers have completed with the given statuses.void
containerStarted
(org.apache.hadoop.yarn.api.records.ContainerId containerId) The NM reports that a container has successfully started.void
containerStopped
(org.apache.hadoop.yarn.api.records.ContainerId containerId) The Node Manager reports that a container has stopped.void
enableFailureCheck
(boolean flag) void
fireLifecycleChange
(TaskLifecycleListener.Event event, EventContext context) int
Get the approximate number of free YARN nodes (those that can accept a task request.) Starts with the number of nodes from the node inventory, then subtracts any in-flight requests (which do not, by definition, have node allocated.)int
getPools()
float
getProperty
(String key) getState()
int
int
Return the target number of tasks that the controller seeks to maintain.getYarn()
boolean
isLive()
boolean
isTaskLive
(int id) void
void
registerScheduler
(Scheduler scheduler) Define a task type.void
void
releaseHost
(String hostName) void
reserveHost
(String hostName) void
resizeDelta
(int delta) Request to resize the Drill cluster by a relative amount.int
resizeTo
(int n) Request to resize the Drill cluster to the given size.void
setMaxRetries
(int value) void
setProperty
(String key, Object value) void
shutDown()
Indicates a request to gracefully shut down the cluster.void
void
started()
Called when the caller has completed start-up and the controller should become live.void
stopTaskFailed
(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t) The Node Manager API reports that a request sent to the NM to stop a task has failed.boolean
boolean
Whether this distribution of YARN supports disk resources.void
void
taskGroupCompleted
(SchedulerStateActions taskGroup) void
taskRetried
(Task task) void
taskStartFailed
(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t) The RM API reports that an attempt to start a container has failed locally.void
tick
(long curTime) Called by the timer ("pulse") thread to trigger time-based events.void
Get an update from YARN on available resources.void
visit
(ControllerVisitor visitor) Allow an observer to see a consistent view of the controller's state by performing the visit in a synchronized block.void
visitTasks
(TaskVisitor visitor) Allow an observer to see a consistent view of the controller's task state by performing the visit in a synchronized block.boolean
Called by the main thread to wait for the normal shutdown of the controller.
-
Field Details
-
maxRetries
protected int maxRetriesMaximum number of retries for each task launch.
-
-
Constructor Details
-
ClusterControllerImpl
-
-
Method Details
-
enableFailureCheck
public void enableFailureCheck(boolean flag) - Specified by:
enableFailureCheck
in interfaceClusterController
-
registerScheduler
Define a task type. Registration order is important: the controller starts task in the order that they are registered. Must happen before the YARN callbacks start.- Specified by:
registerScheduler
in interfaceClusterController
- Parameters:
scheduler
-
-
started
Called when the caller has completed start-up and the controller should become live.- Specified by:
started
in interfaceClusterController
- Throws:
YarnFacadeException
AMException
-
tick
public void tick(long curTime) Description copied from interface:ClusterController
Called by the timer ("pulse") thread to trigger time-based events.- Specified by:
tick
in interfaceClusterController
-
getFreeNodeCount
public int getFreeNodeCount()Get the approximate number of free YARN nodes (those that can accept a task request.) Starts with the number of nodes from the node inventory, then subtracts any in-flight requests (which do not, by definition, have node allocated.)This approximation does not consider whether the node has sufficient resources to run a task; only whether the node itself exists.
- Specified by:
getFreeNodeCount
in interfaceClusterController
- Returns:
- The approximate number of free YARN nodes.
-
updateRMStatus
public void updateRMStatus()Get an update from YARN on available resources.- Specified by:
updateRMStatus
in interfaceClusterController
-
containersAllocated
Description copied from interface:ClusterController
The RM has allocated one or more containers in response to container requests submitted to the RM.- Specified by:
containersAllocated
in interfaceClusterController
- Parameters:
containers
- the set of containers provided by YARN
-
containerStarted
public void containerStarted(org.apache.hadoop.yarn.api.records.ContainerId containerId) Description copied from interface:ClusterController
The NM reports that a container has successfully started.- Specified by:
containerStarted
in interfaceClusterController
- Parameters:
containerId
- the container which started
-
taskStartFailed
public void taskStartFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t) Description copied from interface:ClusterController
The RM API reports that an attempt to start a container has failed locally.- Specified by:
taskStartFailed
in interfaceClusterController
- Parameters:
containerId
- the container that failed to launcht
- the error that occurred
-
containerStopped
public void containerStopped(org.apache.hadoop.yarn.api.records.ContainerId containerId) Description copied from interface:ClusterController
The Node Manager reports that a container has stopped.- Specified by:
containerStopped
in interfaceClusterController
-
containersCompleted
Description copied from interface:ClusterController
The Resource Manager reports that containers have completed with the given statuses. Find the task for each container and mark them as completed.- Specified by:
containersCompleted
in interfaceClusterController
-
getProgress
public float getProgress()- Specified by:
getProgress
in interfaceClusterController
-
stopTaskFailed
Description copied from interface:ClusterController
The Node Manager API reports that a request sent to the NM to stop a task has failed.- Specified by:
stopTaskFailed
in interfaceClusterController
- Parameters:
containerId
- the container that failed to stopt
- the reason that the stop request failed
-
resizeDelta
public void resizeDelta(int delta) Description copied from interface:ClusterController
Request to resize the Drill cluster by a relative amount.- Specified by:
resizeDelta
in interfaceClusterController
- Parameters:
delta
- the amount of change. Can be positive (to grow) or negative (to shrink the cluster)
-
resizeTo
public int resizeTo(int n) Description copied from interface:ClusterController
Request to resize the Drill cluster to the given size.- Specified by:
resizeTo
in interfaceClusterController
- Parameters:
n
- the desired cluster size
-
shutDown
public void shutDown()Description copied from interface:ClusterController
Indicates a request to gracefully shut down the cluster.- Specified by:
shutDown
in interfaceClusterController
-
waitForCompletion
public boolean waitForCompletion()Description copied from interface:ClusterController
Called by the main thread to wait for the normal shutdown of the controller. Such shutdown occurs when the admin sends a sutdown command from the UI or REST API.- Specified by:
waitForCompletion
in interfaceClusterController
-
isLive
public boolean isLive() -
succeeded
public boolean succeeded() -
containerAllocated
-
getYarn
-
containerReleased
-
taskEnded
-
taskRetried
-
taskGroupCompleted
-
getMaxRetries
public int getMaxRetries() -
getStopTimeoutMs
public int getStopTimeoutMs() -
reserveHost
- Specified by:
reserveHost
in interfaceRegistryHandler
-
releaseHost
- Specified by:
releaseHost
in interfaceRegistryHandler
-
getNodeInventory
-
setProperty
- Specified by:
setProperty
in interfaceClusterController
-
getProperty
- Specified by:
getProperty
in interfaceClusterController
-
registerLifecycleListener
- Specified by:
registerLifecycleListener
in interfaceClusterController
-
fireLifecycleChange
-
setMaxRetries
public void setMaxRetries(int value) - Specified by:
setMaxRetries
in interfaceClusterController
-
getTargetCount
public int getTargetCount()Description copied from interface:ClusterController
Return the target number of tasks that the controller seeks to maintain. This is the sum across all pools.- Specified by:
getTargetCount
in interfaceClusterController
-
getState
-
visit
Description copied from interface:ClusterController
Allow an observer to see a consistent view of the controller's state by performing the visit in a synchronized block.- Specified by:
visit
in interfaceClusterController
-
getPools
-
visitTasks
Description copied from interface:ClusterController
Allow an observer to see a consistent view of the controller's task state by performing the visit in a synchronized block.- Specified by:
visitTasks
in interfaceClusterController
-
getHistory
-
isTaskLive
public boolean isTaskLive(int id) - Specified by:
isTaskLive
in interfaceClusterController
-
cancelTask
public boolean cancelTask(int id) Description copied from interface:ClusterController
Cancels the given task, reducing the target task count. Called from the UI to allow the user to select the specific task to end when reducing cluster size.- Specified by:
cancelTask
in interfaceClusterController
-
completionAck
- Specified by:
completionAck
in interfaceRegistryHandler
-
startAck
- Specified by:
startAck
in interfaceRegistryHandler
-
supportsDiskResource
public boolean supportsDiskResource()Description copied from interface:ClusterController
Whether this distribution of YARN supports disk resources.- Specified by:
supportsDiskResource
in interfaceClusterController
- Returns:
- True if this distribution of YARN supports disk resources. False otherwise.
-
registryDown
public void registryDown()- Specified by:
registryDown
in interfaceRegistryHandler
-