Class RecordBatchSizerManager

java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager

public final class RecordBatchSizerManager extends Object
This class manages all aspects of record batch sizing for the flat Parquet reader. Currently, the size of a record batch is constrained by two parameters: the number of rows and memory usage.
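As a rough illustration of how two such limits can interact, the sketch below picks the largest row count that satisfies both a row limit and a memory limit. The class name `BatchSizingSketch`, the method `effectiveRecordsPerBatch`, and the single `estimatedRowBytes` estimate are all hypothetical; this is not the sizing logic used by RecordBatchSizerManager, only a minimal model of the idea.

```java
// Hypothetical sketch; not Drill's implementation.
final class BatchSizingSketch {

  /**
   * Returns the largest row count that satisfies both the configured row
   * limit and the configured memory limit, given an assumed row width.
   */
  static int effectiveRecordsPerBatch(int configRecords,
                                      long configMemoryBytes,
                                      long estimatedRowBytes) {
    if (configRecords <= 0 || configMemoryBytes <= 0 || estimatedRowBytes <= 0) {
      throw new IllegalArgumentException("all arguments must be positive");
    }
    // Number of rows of the assumed width that fit in the memory budget.
    long rowsThatFit = configMemoryBytes / estimatedRowBytes;
    // Take the tighter of the two limits, but always allow at least one row
    // so that a batch can make progress.
    long bounded = Math.max(1L, Math.min((long) configRecords, rowsThatFit));
    return (int) bounded;
  }
}
```

Under this model, the enforced row count can be lower than the configured one whenever the memory budget is the tighter constraint, which is one way the "configured" and "current" getters below could report different values.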
  • Constructor Details

  • Method Details

    • setup

      public void setup()
      Tunes record batch parameters based on configuration and schema.
    • getSchema

      public ParquetSchema getSchema()
      Returns:
      the schema
    • getBatchStatsContext

      public RecordBatchStats.RecordBatchStatsContext getBatchStatsContext()
      Returns:
      batch statistics context
    • allocate

      public void allocate(Map<String,ValueVector> vectorMap) throws OutOfMemoryException
      Allocates value vectors for the current batch.
      Parameters:
      vectorMap - a collection of value vectors keyed by their field names
      Throws:
      OutOfMemoryException
    • getFieldOverflowMap

      Returns:
      the field overflow state map
    • getFieldOverflowContainer

      public RecordBatchSizerManager.FieldOverflowStateContainer getFieldOverflowContainer(String field)
      Parameters:
      field - materialized field
      Returns:
      field overflow state container
    • releaseFieldOverflowContainer

      public boolean releaseFieldOverflowContainer(String field)
      Releases the overflow data resources associated with this field; also removes the overflow container from the overflow containers map.
      Parameters:
      field - materialized field
      Returns:
      true if this field's overflow container was removed from the overflow containers map
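The release contract described above (free the resources, remove the map entry, report whether an entry was removed) can be sketched with a plain map. Everything in this example is a hypothetical stand-in: `OverflowRegistrySketch`, `OverflowState`, and the `released` flag are invented for illustration and do not reflect the internals of FieldOverflowStateContainer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an overflow container registry; not Drill's code.
final class OverflowRegistrySketch {

  /** Hypothetical holder for data that did not fit in the previous batch. */
  static final class OverflowState implements AutoCloseable {
    boolean released;

    @Override
    public void close() {
      released = true; // a real container would free its buffers here
    }
  }

  private final Map<String, OverflowState> containers = new HashMap<>();

  void put(String field, OverflowState state) {
    containers.put(field, state);
  }

  OverflowState get(String field) {
    return containers.get(field);
  }

  /** Frees the field's overflow state and reports whether an entry was removed. */
  boolean release(String field) {
    OverflowState removed = containers.remove(field);
    if (removed == null) {
      return false; // nothing was registered for this field
    }
    removed.close();
    return true;
  }
}
```

In this sketch, a second release of the same field returns false because the entry is already gone.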
    • getCurrentFieldBatchMemory

      public RecordBatchSizerManager.ColumnMemoryQuota getCurrentFieldBatchMemory(String field)
      Parameters:
      field - materialized field
      Returns:
      field batch memory quota
    • getCurrentRecordsPerBatch

      public int getCurrentRecordsPerBatch()
      Returns:
      current number of records per batch (may change across batches)
    • getCurrentMemorySizePerBatch

      public long getCurrentMemorySizePerBatch()
      Returns:
      current total memory per batch (may change across batches)
    • getConfigRecordsPerBatch

      public int getConfigRecordsPerBatch()
      Returns:
      configured number of records per batch (may be different from the enforced one)
    • getConfigMemorySizePerBatch

      public long getConfigMemorySizePerBatch()
      Returns:
      configured memory size per batch (may be different from the enforced one)
    • onEndOfBatch

      public void onEndOfBatch(int batchNumRecords, List<RecordBatchSizerManager.VarLenColumnBatchStats> batchStats)
      Enables this object to reduce the impact of overflows by computing more accurate variable-length (VL) column precision from the statistics of the batch that just ended.
      Parameters:
      batchNumRecords - number of records in this batch
      batchStats - variable-length column statistics for this batch
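One simple way to refine a variable-length column's precision from per-batch statistics is a running average of bytes per value. The sketch below assumes that "precision" means the average value width in bytes and that each batch reports a byte count and a value count. Both assumptions, along with the names `VarLenPrecisionSketch` and `averagePrecision`, are hypothetical and are not taken from VarLenColumnBatchStats.

```java
// Hypothetical sketch of refining a VL column's average width across batches.
final class VarLenPrecisionSketch {
  private long totalBytes;
  private long totalValues;

  /** Folds one batch's observed byte and value counts into the running totals. */
  void onEndOfBatch(long batchBytes, int batchValues) {
    if (batchBytes < 0 || batchValues < 0) {
      throw new IllegalArgumentException("counts must be non-negative");
    }
    totalBytes += batchBytes;
    totalValues += batchValues;
  }

  /**
   * Returns the average bytes per value seen so far, rounded up, or the
   * supplied default when no values have been observed yet.
   */
  int averagePrecision(int defaultPrecision) {
    if (totalValues == 0) {
      return defaultPrecision;
    }
    // Ceiling division, so the estimate does not understate the average width.
    return (int) ((totalBytes + totalValues - 1) / totalValues);
  }
}
```

A better width estimate lets the next batch reserve memory closer to what the column actually needs, which is how more accurate precision could reduce overflows.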
    • close

      public void close()
      Closes all resources managed by this object.