Interface BatchSizePredictor

All Known Implementing Classes:
BatchSizePredictorImpl

public interface BatchSizePredictor
Predicts the sizes of batches given an input batch.

Invariants

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Interface
    Description
    static interface 
    A factory for creating BatchSizePredictors.
  • Method Summary

    Modifier and Type
    Method
    Description
    long
    getBatchSize()
    Gets the batch size computed in the last call to updateStats().
    int
    getNumRecords()
    Gets the number of records computed in the last call to updateStats().
    boolean
    hadDataLastTime()
    True if the input batch had records in the last call to updateStats().
    long
    predictBatchSize(int desiredNumRecords, boolean reserveHash)
    Predicts the size of a batch using the currently collected stats.
    void
    updateStats()
    Collects stats about the latest data in the provided record batch; can be called multiple times.
  • Method Details

    • getBatchSize

      long getBatchSize()
      Gets the batch size computed in the last call to updateStats().
      Returns:
      The batch size computed in the last call to updateStats(), or 0 if hadDataLastTime() is false.
      Throws:
      IllegalStateException - if updateStats() was never called.
    • getNumRecords

      int getNumRecords()
      Gets the number of records computed in the last call to updateStats().
      Returns:
      The number of records computed in the last call to updateStats(), or 0 if hadDataLastTime() is false.
      Throws:
      IllegalStateException - if updateStats() was never called.
    • hadDataLastTime

      boolean hadDataLastTime()
      Reports whether the input batch had records in the last call to updateStats().
      Returns:
      True if the input batch had records in the last call to updateStats(), false otherwise.
    • updateStats

      void updateStats()
      Collects stats about the latest data in the provided record batch; these stats are used to predict batch sizes. This method can be called multiple times. If the batch currently has no data, it is a no-op. It must be called at least once before predictBatchSize(int, boolean).
    • predictBatchSize

      long predictBatchSize(int desiredNumRecords, boolean reserveHash)
      Predicts the size of a batch using the current collected stats.
      Parameters:
      desiredNumRecords - The number of records contained in the batch whose size we want to predict.
      reserveHash - Whether or not to include a column containing hash values.
      Returns:
      The size of the predicted batch.
      Throws:
      IllegalStateException - if hadDataLastTime() is false or updateStats() was not called.
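
The calling contract described above (call updateStats() first, check hadDataLastTime(), then predict) can be illustrated with a minimal sketch. Everything other than the five documented method signatures is an assumption made for illustration: the local stand-in interface, the ToyBatch and ToyPredictor classes, the 4-byte hash width, and the linear bytes-per-record scaling are hypothetical and are not taken from BatchSizePredictorImpl.

```java
// Local stand-in that mirrors the method signatures documented above.
interface BatchSizePredictor {
  long getBatchSize();
  int getNumRecords();
  boolean hadDataLastTime();
  void updateStats();
  long predictBatchSize(int desiredNumRecords, boolean reserveHash);
}

// Hypothetical input batch: a record count plus a fixed width per record.
final class ToyBatch {
  int numRecords;
  long bytesPerRecord;
}

// Hypothetical predictor: scales the observed bytes-per-record linearly.
final class ToyPredictor implements BatchSizePredictor {
  // Assumed width of one hash value; not taken from the documentation.
  static final long HASH_BYTES_PER_RECORD = 4;

  private final ToyBatch batch;
  private boolean updated;
  private boolean hadData;
  private long batchSize;
  private int numRecords;

  ToyPredictor(ToyBatch batch) {
    this.batch = batch;
  }

  @Override
  public void updateStats() {
    updated = true;
    hadData = batch.numRecords > 0;
    if (hadData) {
      // Only overwrite the stored stats when the batch actually has records.
      numRecords = batch.numRecords;
      batchSize = numRecords * batch.bytesPerRecord;
    }
  }

  @Override
  public long getBatchSize() {
    requireUpdated();
    return hadData ? batchSize : 0;
  }

  @Override
  public int getNumRecords() {
    requireUpdated();
    return hadData ? numRecords : 0;
  }

  @Override
  public boolean hadDataLastTime() {
    return hadData;
  }

  @Override
  public long predictBatchSize(int desiredNumRecords, boolean reserveHash) {
    requireUpdated();
    if (!hadData) {
      throw new IllegalStateException("last updateStats() call saw no records");
    }
    long perRecord = batchSize / numRecords;
    long hashBytes = reserveHash ? HASH_BYTES_PER_RECORD * desiredNumRecords : 0;
    return perRecord * desiredNumRecords + hashBytes;
  }

  private void requireUpdated() {
    if (!updated) {
      throw new IllegalStateException("updateStats() was never called");
    }
  }
}

class PredictorDemo {
  public static void main(String[] args) {
    ToyBatch batch = new ToyBatch();
    BatchSizePredictor predictor = new ToyPredictor(batch);

    batch.numRecords = 100;
    batch.bytesPerRecord = 16;
    predictor.updateStats();

    // Guard the prediction: it throws if the last update saw an empty batch.
    if (predictor.hadDataLastTime()) {
      System.out.println(predictor.predictBatchSize(1000, true));
    }
  }
}
```

Checking hadDataLastTime() before predictBatchSize(int, boolean) avoids the documented IllegalStateException when the most recent batch was empty. A caller that cannot predict in that case would need some fallback of its own.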