Class RecordBatchSizerManager
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager
This class manages all aspects of the flat Parquet reader's record batch sizing logic.
Currently, the size of a record batch is constrained by two parameters: the number of rows and memory usage.
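To illustrate how two such constraints can interact, the sketch below derives an effective row limit from a configured row count and a memory budget, given an average row width. The class and method names (`BatchLimits`, `effectiveRecordsPerBatch`), the average-width input, and the "at least one row" floor are hypothetical assumptions for this example, not part of the Drill API.

```java
// Hypothetical illustration of a row-count limit combined with a memory limit;
// this is not Drill's implementation.
final class BatchLimits {
    private BatchLimits() {}

    /**
     * Returns the number of rows per batch that satisfies both limits.
     *
     * @param configRecordsPerBatch configured maximum number of rows
     * @param configMemoryPerBatch  configured memory budget in bytes
     * @param avgRowBytes           assumed average row width in bytes (must be > 0)
     */
    static int effectiveRecordsPerBatch(int configRecordsPerBatch,
                                        long configMemoryPerBatch,
                                        long avgRowBytes) {
        if (avgRowBytes <= 0) {
            throw new IllegalArgumentException("avgRowBytes must be positive");
        }
        // Rows that fit in the memory budget (integer division rounds down).
        long rowsByMemory = configMemoryPerBatch / avgRowBytes;
        // The tighter of the two constraints wins.
        long rows = Math.min(configRecordsPerBatch, rowsByMemory);
        // Assumption of this sketch: always allow at least one row per batch.
        return (int) Math.max(1, rows);
    }
}
```

For example, with 4096 configured rows, a 16 MiB budget and 8 KiB rows, memory allows only 2048 rows, so the memory constraint is the one enforced; with 100-byte rows the configured row count becomes the binding limit.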

Nested Class Summary
- static final class
  Field memory quota.
- static interface
  An abstraction that allows column readers to attach custom field overflow state.
- static final class
  Container object that holds the current field overflow state.
- static final class RecordBatchSizerManager.VarLenColumnBatchStats
  Container object that supplies variable-length column statistics to the batch sizer.

Constructor Summary
- RecordBatchSizerManager(OptionManager options, ParquetSchema schema, long totalRecordsToRead, RecordBatchStats.RecordBatchStatsContext batchStatsContext)
  Constructor.
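
Method Summary
- void allocate(Map<String, ValueVector> vectorMap)
  Allocates value vectors for the current batch.
- void close()
  Closes all resources managed by this object.
- long getConfigMemorySizePerBatch()
  Returns the configured memory size per batch.
- int getConfigRecordsPerBatch()
  Returns the configured number of records per batch.
- getCurrentFieldBatchMemory(String field)
  Returns the field batch memory quota.
- long getCurrentMemorySizePerBatch()
  Returns the current total memory per batch.
- int getCurrentRecordsPerBatch()
  Returns the current number of records per batch.
- getFieldOverflowContainer(String field)
  Returns the field overflow state container.
- void onEndOfBatch(int batchNumRecords, List<RecordBatchSizerManager.VarLenColumnBatchStats> batchStats)
  Enables this object to reduce the impact of overflows by computing a more accurate precision for variable-length (VL) columns.
- boolean releaseFieldOverflowContainer(String field)
  Releases the overflow data resources associated with this field; also removes the overflow container from the overflow containers map.
- void setup()
  Tunes record batch parameters based on configuration and schema.
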
Constructor Details
RecordBatchSizerManager
public RecordBatchSizerManager(OptionManager options, ParquetSchema schema, long totalRecordsToRead, RecordBatchStats.RecordBatchStatsContext batchStatsContext)
Constructor.
- Parameters:
options
- Drill options
schema
- current reader schema
totalRecordsToRead
- total number of rows to read

Method Details
setup
public void setup()
Tunes record batch parameters based on configuration and schema.

getSchema
- Returns:
- the schema

getBatchStatsContext
- Returns:
- batch statistics context

allocate
Allocates value vectors for the current batch.
- Parameters:
vectorMap
- a collection of value vectors keyed by their field names
- Throws:
OutOfMemoryException

getFieldOverflowMap
- Returns:
- the field overflow state map

getFieldOverflowContainer
- Parameters:
field
- materialized field
- Returns:
- field overflow state container

releaseFieldOverflowContainer
Releases the overflow data resources associated with this field; also removes the overflow container from the overflow containers map.
- Parameters:
field
- materialized field
- Returns:
- true if this field's overflow container was removed from the overflow containers map
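The overflow bookkeeping described here (a per-field container holding reader-specific overflow state, looked up by field name and released on demand) can be modeled as a map from field name to container. The sketch below assumes that releasing a container means freeing its state and removing the map entry, and that the boolean result reports whether an entry was present. The types `OverflowState`, `OverflowContainer` and `OverflowRegistry` are hypothetical stand-ins, not Drill's nested classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for reader-specific overflow state.
interface OverflowState {
    /** Frees any resources (e.g. buffered overflow data) held by this state. */
    void release();
}

// Hypothetical container pairing a field with its current overflow state.
final class OverflowContainer {
    final String field;
    final OverflowState state;

    OverflowContainer(String field, OverflowState state) {
        this.field = field;
        this.state = state;
    }
}

// Hypothetical registry keyed by field name.
final class OverflowRegistry {
    private final Map<String, OverflowContainer> containers = new HashMap<>();

    void put(String field, OverflowState state) {
        containers.put(field, new OverflowContainer(field, state));
    }

    /** Returns the container for this field, or null if none is registered. */
    OverflowContainer get(String field) {
        return containers.get(field);
    }

    /**
     * Releases the field's overflow resources and removes its container.
     *
     * @return true if a container for this field was present and removed
     */
    boolean release(String field) {
        OverflowContainer removed = containers.remove(field);
        if (removed == null) {
            return false;
        }
        removed.state.release();
        return true;
    }
}
```

Relying on `Map.remove` returning the previous value keeps the lookup, removal and "was it there?" check in one step.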

getCurrentFieldBatchMemory
- Parameters:
field
- materialized field
- Returns:
- field batch memory quota
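To picture what a per-field memory quota lookup might involve, the sketch below keeps a map from field name to a byte budget, where each budget is simply the batch row limit times an assumed per-value width. The class name `FieldQuotas`, the width map and this allocation rule are assumptions for illustration only; they do not describe how Drill actually computes its quotas.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-field memory quotas; not Drill's actual logic.
final class FieldQuotas {
    private final Map<String, Long> quotaByField = new HashMap<>();

    /**
     * @param recordsPerBatch      rows allowed in the current batch
     * @param bytesPerValueByField assumed width in bytes of one value for each field
     */
    FieldQuotas(int recordsPerBatch, Map<String, Integer> bytesPerValueByField) {
        for (Map.Entry<String, Integer> e : bytesPerValueByField.entrySet()) {
            // Quota = rows in the batch * assumed bytes per value (computed as long).
            quotaByField.put(e.getKey(), (long) recordsPerBatch * e.getValue());
        }
    }

    /** Returns the byte quota for a field, or throws if the field is unknown. */
    long quotaFor(String field) {
        Long quota = quotaByField.get(field);
        if (quota == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        return quota;
    }
}
```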

getCurrentRecordsPerBatch
public int getCurrentRecordsPerBatch()
- Returns:
- current number of records per batch (may change across batches)

getCurrentMemorySizePerBatch
public long getCurrentMemorySizePerBatch()
- Returns:
- current total memory per batch (may change across batches)

getConfigRecordsPerBatch
public int getConfigRecordsPerBatch()
- Returns:
- configured number of records per batch (may differ from the enforced one)

getConfigMemorySizePerBatch
public long getConfigMemorySizePerBatch()
- Returns:
- configured memory size per batch (may differ from the enforced one)

onEndOfBatch
public void onEndOfBatch(int batchNumRecords, List<RecordBatchSizerManager.VarLenColumnBatchStats> batchStats)
Enables this object to reduce the impact of overflows by computing a more accurate precision for variable-length (VL) columns.
- Parameters:
batchNumRecords
- number of records in this batch
batchStats
- column statistics
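One plausible way to refine a variable-length column's precision from per-batch statistics is to keep a running average of bytes per value. The sketch below assumes each batch reports the number of values and the total bytes for a single column; the class `VarLenPrecisionEstimator`, its fields and the running-average rule are assumptions for illustration, and the real `VarLenColumnBatchStats` may carry different data.

```java
// Hypothetical running-average precision estimator for one variable-length column.
final class VarLenPrecisionEstimator {
    private long totalValues;
    private long totalBytes;
    private final int defaultPrecision;

    VarLenPrecisionEstimator(int defaultPrecision) {
        this.defaultPrecision = defaultPrecision;
    }

    /** Accumulates the statistics reported for this column at the end of a batch. */
    void onEndOfBatch(long numValues, long numBytes) {
        totalValues += numValues;
        totalBytes += numBytes;
    }

    /**
     * Returns the estimated bytes per value: the default until data is seen,
     * then the observed average rounded up.
     */
    int precision() {
        if (totalValues == 0) {
            return defaultPrecision;
        }
        // Ceiling division so the estimate errs toward overestimation.
        return (int) ((totalBytes + totalValues - 1) / totalValues);
    }
}
```

Rounding up is a conservative choice: a memory quota derived from the estimate is less likely to undershoot the actual data size, which is what would otherwise trigger an overflow.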

close
public void close()
Closes all resources managed by this object.