Interface BatchSizePredictor
- All Known Implementing Classes: BatchSizePredictorImpl
public interface BatchSizePredictor
This interface predicts the sizes of batches given an input batch.
Invariants
- The BatchSizePredictor assumes that a RecordBatch is in a state where it can return a valid record count.
Nested Class Summary
- static interface: A factory for creating BatchSizePredictor instances.
Method Summary
- long getBatchSize(): Gets the batch size computed in the last call to updateStats().
- int getNumRecords(): Gets the number of records computed in the last call to updateStats().
- boolean hadDataLastTime(): True if the input batch had records in the last call to updateStats().
- long predictBatchSize(int desiredNumRecords, boolean reserveHash): Predicts the size of a batch using the current collected stats.
- void updateStats(): This method can be called multiple times to collect stats about the latest data in the provided record batch.
Method Details
getBatchSize
long getBatchSize()
Gets the batch size computed in the last call to updateStats(). Returns 0 if hadDataLastTime() is false.
- Returns: the batch size computed in the last call to updateStats(), or 0 if hadDataLastTime() is false.
- Throws: IllegalStateException if updateStats() was never called.
getNumRecords
int getNumRecords()
Gets the number of records computed in the last call to updateStats(). Returns 0 if hadDataLastTime() is false.
- Returns: the number of records computed in the last call to updateStats(), or 0 if hadDataLastTime() is false.
- Throws: IllegalStateException if updateStats() was never called.
hadDataLastTime
boolean hadDataLastTime()
True if the input batch had records in the last call to updateStats(). False otherwise.
- Returns: true if the input batch had records in the last call to updateStats(), false otherwise.
updateStats
void updateStats()
This method can be called multiple times to collect stats about the latest data in the provided record batch. These stats are used to predict batch sizes. If the batch currently has no data, this method is a no-op. This method must be called at least once before predictBatchSize(int, boolean).
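The state rules described for updateStats() and the getters can be illustrated with a toy class. This is only a sketch under stated assumptions, not the real BatchSizePredictorImpl: the class name ToyBatchSizePredictor, its constructor, the IntSupplier that stands in for the record batch, the fixed record width, and the 4-unit hash width are all hypothetical. It also assumes that a call to updateStats() on an empty batch still counts as "called" and only clears the had-data flag.

```java
import java.util.function.IntSupplier;

// Toy stand-in, NOT the real BatchSizePredictorImpl. It mirrors the documented
// method contracts; the sizing formula and constructor are assumptions.
final class ToyBatchSizePredictor {
  private static final long HASH_WIDTH = 4; // assumed size of one hash value

  private final IntSupplier recordCount; // hypothetical stand-in for the record batch
  private final long recordWidth;        // assumed fixed size of one record
  private boolean updated;               // true once updateStats() has run
  private boolean hadData;               // true if the last update saw records
  private int numRecords;
  private long batchSize;

  ToyBatchSizePredictor(IntSupplier recordCount, long recordWidth) {
    this.recordCount = recordCount;
    this.recordWidth = recordWidth;
  }

  public void updateStats() {
    int count = recordCount.getAsInt();
    updated = true;
    hadData = count > 0;
    if (!hadData) {
      return; // empty batch: no stats are collected
    }
    numRecords = count;
    batchSize = count * recordWidth;
  }

  public boolean hadDataLastTime() {
    return hadData;
  }

  public long getBatchSize() {
    requireUpdated();
    return hadData ? batchSize : 0;
  }

  public int getNumRecords() {
    requireUpdated();
    return hadData ? numRecords : 0;
  }

  public long predictBatchSize(int desiredNumRecords, boolean reserveHash) {
    requireUpdated();
    if (!hadData) {
      throw new IllegalStateException("no records in the last updateStats() call");
    }
    long perRecord = recordWidth + (reserveHash ? HASH_WIDTH : 0);
    return perRecord * desiredNumRecords;
  }

  private void requireUpdated() {
    if (!updated) {
      throw new IllegalStateException("updateStats() was never called");
    }
  }
}
```

With an assumed record width of 8, this toy predicts 80 for 10 desired records, or 120 when reserveHash is true. The real implementation may size records quite differently.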
predictBatchSize
long predictBatchSize(int desiredNumRecords, boolean reserveHash)
Predicts the size of a batch using the current collected stats.
- Parameters:
  desiredNumRecords - The number of records contained in the batch whose size we want to predict.
  reserveHash - Whether or not to include a column containing hash values.
- Returns: The size of the predicted batch.
- Throws: IllegalStateException if hadDataLastTime() is false or updateStats() was not called.
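Because predictBatchSize throws when no stats are available, a caller may want to guard the call. The following is a minimal caller-side sketch. The interface is redeclared locally from the signatures documented on this page so that the example compiles on its own, and the helper class PredictorCalls, its method predictOrFallback, and the fallbackSize parameter are hypothetical names that are not part of the documented API.

```java
// Local copy of the documented method signatures so the sketch compiles on its own.
interface BatchSizePredictor {
  long getBatchSize();
  int getNumRecords();
  boolean hadDataLastTime();
  void updateStats();
  long predictBatchSize(int desiredNumRecords, boolean reserveHash);
}

// Hypothetical helper class, not part of the documented API.
final class PredictorCalls {
  private PredictorCalls() {}

  /**
   * Refreshes the stats, then predicts only when the last update saw records.
   * Otherwise returns the caller-supplied fallback instead of letting
   * predictBatchSize throw IllegalStateException.
   */
  static long predictOrFallback(BatchSizePredictor predictor,
                                int desiredNumRecords,
                                boolean reserveHash,
                                long fallbackSize) {
    predictor.updateStats();            // must run at least once before predicting
    if (!predictor.hadDataLastTime()) { // predictBatchSize would throw in this state
      return fallbackSize;
    }
    return predictor.predictBatchSize(desiredNumRecords, reserveHash);
  }
}
```

Whether a fallback value is appropriate depends on the caller. Some callers may prefer to let the exception propagate or to retry once the batch has data.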