Class ClusterControllerImpl

java.lang.Object
org.apache.drill.yarn.appMaster.ClusterControllerImpl
All Implemented Interfaces:
ClusterController, RegistryHandler

public class ClusterControllerImpl extends Object implements ClusterController
Controls the Drill cluster by representing the current cluster state with a desired state, taking corrective action to keep the cluster in the desired state. The cluster as a whole has a state, as do each task (node) within the cluster.

This class is designed to allow unit tests. In general, testing the controller on a live cluster is tedious. This class encapsulates the controller algorithm so it can be driven by a simulated cluster.

This object is shared between threads, thus synchronized.

  • Field Details

    • maxRetries

      protected int maxRetries
      Maximum number of retries for each task launch.
  • Constructor Details

    • ClusterControllerImpl

      public ClusterControllerImpl(AMYarnFacade yarn)
  • Method Details

    • enableFailureCheck

      public void enableFailureCheck(boolean flag)
      Specified by:
      enableFailureCheck in interface ClusterController
    • registerScheduler

      public void registerScheduler(Scheduler scheduler)
      Define a task type. Registration order is important: the controller starts task in the order that they are registered. Must happen before the YARN callbacks start.
      Specified by:
      registerScheduler in interface ClusterController
      Parameters:
      scheduler -
    • started

      public void started() throws YarnFacadeException, AMException
      Called when the caller has completed start-up and the controller should become live.
      Specified by:
      started in interface ClusterController
      Throws:
      YarnFacadeException
      AMException
    • tick

      public void tick(long curTime)
      Description copied from interface: ClusterController
      Called by the timer ("pulse") thread to trigger time-based events.
      Specified by:
      tick in interface ClusterController
    • getFreeNodeCount

      public int getFreeNodeCount()
      Get the approximate number of free YARN nodes (those that can accept a task request.) Starts with the number of nodes from the node inventory, then subtracts any in-flight requests (which do not, by definition, have node allocated.)

      This approximation does not consider whether the node has sufficient resources to run a task; only whether the node itself exists.

      Specified by:
      getFreeNodeCount in interface ClusterController
      Returns:
      The approximate number of free YARN nodes.
    • updateRMStatus

      public void updateRMStatus()
      Get an update from YARN on available resources.
      Specified by:
      updateRMStatus in interface ClusterController
    • containersAllocated

      public void containersAllocated(List<org.apache.hadoop.yarn.api.records.Container> containers)
      Description copied from interface: ClusterController
      The RM has allocated one or more containers in response to container requests submitted to the RM.
      Specified by:
      containersAllocated in interface ClusterController
      Parameters:
      containers - the set of containers provided by YARN
    • containerStarted

      public void containerStarted(org.apache.hadoop.yarn.api.records.ContainerId containerId)
      Description copied from interface: ClusterController
      The NM reports that a container has successfully started.
      Specified by:
      containerStarted in interface ClusterController
      Parameters:
      containerId - the container which started
    • taskStartFailed

      public void taskStartFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t)
      Description copied from interface: ClusterController
      The RM API reports that an attempt to start a container has failed locally.
      Specified by:
      taskStartFailed in interface ClusterController
      Parameters:
      containerId - the container that failed to launch
      t - the error that occurred
    • containerStopped

      public void containerStopped(org.apache.hadoop.yarn.api.records.ContainerId containerId)
      Description copied from interface: ClusterController
      The Node Manager reports that a container has stopped.
      Specified by:
      containerStopped in interface ClusterController
    • containersCompleted

      public void containersCompleted(List<org.apache.hadoop.yarn.api.records.ContainerStatus> statuses)
      Description copied from interface: ClusterController
      The Resource Manager reports that containers have completed with the given statuses. Find the task for each container and mark them as completed.
      Specified by:
      containersCompleted in interface ClusterController
    • getProgress

      public float getProgress()
      Specified by:
      getProgress in interface ClusterController
    • stopTaskFailed

      public void stopTaskFailed(org.apache.hadoop.yarn.api.records.ContainerId containerId, Throwable t)
      Description copied from interface: ClusterController
      The Node Manager API reports that a request sent to the NM to stop a task has failed.
      Specified by:
      stopTaskFailed in interface ClusterController
      Parameters:
      containerId - the container that failed to stop
      t - the reason that the stop request failed
    • resizeDelta

      public void resizeDelta(int delta)
      Description copied from interface: ClusterController
      Request to resize the Drill cluster by a relative amount.
      Specified by:
      resizeDelta in interface ClusterController
      Parameters:
      delta - the amount of change. Can be positive (to grow) or negative (to shrink the cluster)
    • resizeTo

      public int resizeTo(int n)
      Description copied from interface: ClusterController
      Request to resize the Drill cluster to the given size.
      Specified by:
      resizeTo in interface ClusterController
      Parameters:
      n - the desired cluster size
    • shutDown

      public void shutDown()
      Description copied from interface: ClusterController
      Indicates a request to gracefully shut down the cluster.
      Specified by:
      shutDown in interface ClusterController
    • waitForCompletion

      public boolean waitForCompletion()
      Description copied from interface: ClusterController
      Called by the main thread to wait for the normal shutdown of the controller. Such shutdown occurs when the admin sends a sutdown command from the UI or REST API.
      Specified by:
      waitForCompletion in interface ClusterController
    • isLive

      public boolean isLive()
    • succeeded

      public boolean succeeded()
    • containerAllocated

      public void containerAllocated(Task task)
    • getYarn

      public AMYarnFacade getYarn()
    • containerReleased

      public void containerReleased(Task task)
    • taskEnded

      public void taskEnded(Task task)
    • taskRetried

      public void taskRetried(Task task)
    • taskGroupCompleted

      public void taskGroupCompleted(SchedulerStateActions taskGroup)
    • getMaxRetries

      public int getMaxRetries()
    • getStopTimeoutMs

      public int getStopTimeoutMs()
    • reserveHost

      public void reserveHost(String hostName)
      Specified by:
      reserveHost in interface RegistryHandler
    • releaseHost

      public void releaseHost(String hostName)
      Specified by:
      releaseHost in interface RegistryHandler
    • getNodeInventory

      public NodeInventory getNodeInventory()
    • setProperty

      public void setProperty(String key, Object value)
      Specified by:
      setProperty in interface ClusterController
    • getProperty

      public Object getProperty(String key)
      Specified by:
      getProperty in interface ClusterController
    • registerLifecycleListener

      public void registerLifecycleListener(TaskLifecycleListener listener)
      Specified by:
      registerLifecycleListener in interface ClusterController
    • fireLifecycleChange

      public void fireLifecycleChange(TaskLifecycleListener.Event event, EventContext context)
    • setMaxRetries

      public void setMaxRetries(int value)
      Specified by:
      setMaxRetries in interface ClusterController
    • getTargetCount

      public int getTargetCount()
      Description copied from interface: ClusterController
      Return the target number of tasks that the controller seeks to maintain. This is the sum across all pools.
      Specified by:
      getTargetCount in interface ClusterController
    • getState

      public ClusterControllerImpl.State getState()
    • visit

      public void visit(ControllerVisitor visitor)
      Description copied from interface: ClusterController
      Allow an observer to see a consistent view of the controller's state by performing the visit in a synchronized block.
      Specified by:
      visit in interface ClusterController
    • getPools

      public List<SchedulerStateActions> getPools()
    • visitTasks

      public void visitTasks(TaskVisitor visitor)
      Description copied from interface: ClusterController
      Allow an observer to see a consistent view of the controller's task state by performing the visit in a synchronized block.
      Specified by:
      visitTasks in interface ClusterController
    • getHistory

      public List<Task> getHistory()
    • isTaskLive

      public boolean isTaskLive(int id)
      Specified by:
      isTaskLive in interface ClusterController
    • cancelTask

      public boolean cancelTask(int id)
      Description copied from interface: ClusterController
      Cancels the given task, reducing the target task count. Called from the UI to allow the user to select the specific task to end when reducing cluster size.
      Specified by:
      cancelTask in interface ClusterController
    • completionAck

      public void completionAck(Task task, String propertyKey)
      Specified by:
      completionAck in interface RegistryHandler
    • startAck

      public void startAck(Task task, String propertyKey, Object value)
      Specified by:
      startAck in interface RegistryHandler
    • supportsDiskResource

      public boolean supportsDiskResource()
      Description copied from interface: ClusterController
      Whether this distribution of YARN supports disk resources.
      Specified by:
      supportsDiskResource in interface ClusterController
      Returns:
      True if this distribution of YARN supports disk resources. False otherwise.
    • registryDown

      public void registryDown()
      Specified by:
      registryDown in interface RegistryHandler