Class RecordBatchSizerManager
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.RecordBatchSizerManager
This class manages all aspects of record batch sizing for the flat Parquet reader.
Currently, a record batch size is constrained by two parameters: number of rows and memory usage.
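As a rough illustration of how a row limit and a memory limit can jointly bound a batch, the sketch below derives an effective row count from both. The class name, the method name, and the single per-row byte estimate are hypothetical and are not taken from Drill; the actual sizing logic in this class is more involved.

```java
/** Hypothetical illustration only; not part of Drill. */
final class BatchLimitSketch {

  /** Returns the largest row count that satisfies both the row cap and the memory cap. */
  static int effectiveRows(int maxRows, long maxMemoryBytes, long estimatedBytesPerRow) {
    if (maxRows <= 0 || maxMemoryBytes <= 0 || estimatedBytesPerRow <= 0) {
      throw new IllegalArgumentException("all limits must be positive");
    }
    // Number of rows that fit within the memory budget (assumed fixed row width).
    long rowsThatFit = maxMemoryBytes / estimatedBytesPerRow;
    // Always allow at least one row so that a reader can make progress.
    long bounded = Math.max(1L, Math.min((long) maxRows, rowsThatFit));
    return (int) bounded;
  }
}
```

Whichever constraint is tighter wins: a small memory budget lowers the row count below the configured row cap, and a generous budget leaves the row cap in charge.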
-
Nested Class Summary
Nested Classes
static final class - Field memory quota
static interface - An abstraction that allows column readers to attach custom field overflow state
static final class - Container object to hold the current field overflow state
static final class - Container object to supply variable-length column statistics to the batch sizer
Constructor Summary
Constructors
RecordBatchSizerManager(OptionManager options, ParquetSchema schema, long totalRecordsToRead, RecordBatchStats.RecordBatchStatsContext batchStatsContext) - Constructor.
Method Summary
void allocate(Map<String, ValueVector> vectorMap) - Allocates value vectors for the current batch.
void close() - Closes all resources managed by this object.
long getConfigMemorySizePerBatch()
int getConfigRecordsPerBatch()
getCurrentFieldBatchMemory(String field)
long getCurrentMemorySizePerBatch()
int getCurrentRecordsPerBatch()
getFieldOverflowContainer(String field)
void onEndOfBatch(int batchNumRecords, List<RecordBatchSizerManager.VarLenColumnBatchStats> batchStats) - Enables this object to reduce the impact of overflows by computing a more accurate variable-length (VL) column precision.
boolean releaseFieldOverflowContainer(String field) - Releases the overflow data resources associated with this field; also removes the overflow container from the overflow containers map.
void setup() - Tunes record batch parameters based on configuration and schema.
-
Constructor Details
-
RecordBatchSizerManager
public RecordBatchSizerManager(OptionManager options, ParquetSchema schema, long totalRecordsToRead, RecordBatchStats.RecordBatchStatsContext batchStatsContext)
Constructor.
Parameters:
options - Drill options
schema - current reader schema
totalRecordsToRead - total number of rows to read
-
-
Method Details
-
setup
public void setup()
Tunes record batch parameters based on configuration and schema.
getSchema
Returns:
the schema
-
getBatchStatsContext
Returns:
the batch statistics context
-
allocate
Allocates value vectors for the current batch.
Parameters:
vectorMap - a collection of value vectors keyed by their field names
Throws:
OutOfMemoryException
-
getFieldOverflowMap
Returns:
the field overflow state map
-
getFieldOverflowContainer
Parameters:
field - materialized field
Returns:
the field overflow state container
-
releaseFieldOverflowContainer
Releases the overflow data resources associated with this field; also removes the overflow container from the overflow containers map.
Parameters:
field - materialized field
Returns:
true if this field's overflow container was removed from the overflow containers map
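The release-and-remove contract described above can be sketched with a plain map. This is a minimal illustration, not Drill's implementation: `OverflowRegistrySketch`, its nested `Container` type, and the `put`/`get`/`release` method names are all hypothetical stand-ins, and the real container holds overflow data rather than a flag.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Drill. */
final class OverflowRegistrySketch {

  /** Stand-in for an overflow container that owns releasable resources. */
  static final class Container {
    boolean released;

    void release() {
      released = true;
    }
  }

  // Overflow containers keyed by field name.
  private final Map<String, Container> containers = new HashMap<>();

  void put(String field, Container container) {
    containers.put(field, container);
  }

  Container get(String field) {
    return containers.get(field);
  }

  /** Releases the field's container and removes it; returns true only if an entry existed. */
  boolean release(String field) {
    Container removed = containers.remove(field);
    if (removed == null) {
      return false;
    }
    removed.release();
    return true;
  }
}
```

In this sketch a second call for the same field returns false, because the entry is already gone from the map.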
-
getCurrentFieldBatchMemory
Parameters:
field - materialized field
Returns:
the field batch memory quota
-
getCurrentRecordsPerBatch
public int getCurrentRecordsPerBatch()
Returns:
the current number of records per batch (may change across batches)
-
getCurrentMemorySizePerBatch
public long getCurrentMemorySizePerBatch()
Returns:
the current total memory per batch (may change across batches)
-
getConfigRecordsPerBatch
public int getConfigRecordsPerBatch()
Returns:
the configured number of records per batch (may differ from the enforced one)
-
getConfigMemorySizePerBatch
public long getConfigMemorySizePerBatch()
Returns:
the configured memory size per batch (may differ from the enforced one)
-
onEndOfBatch
public void onEndOfBatch(int batchNumRecords, List<RecordBatchSizerManager.VarLenColumnBatchStats> batchStats)
Enables this object to reduce the impact of overflows by computing a more accurate variable-length (VL) column precision.
Parameters:
batchNumRecords - number of records in this batch
batchStats - column statistics
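One way to picture "more accurate VL column precision" is as an average value width per variable-length column, recomputed from the statistics of the batch that just ended. The sketch below assumes that interpretation. `VarLenPrecisionSketch`, `ColumnStats`, and its fields are hypothetical and do not mirror the members of RecordBatchSizerManager.VarLenColumnBatchStats.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Drill. */
final class VarLenPrecisionSketch {

  /** Assumed shape of per-column statistics for one batch. */
  static final class ColumnStats {
    final String column;
    final long dataBytes;
    final int numValues;

    ColumnStats(String column, long dataBytes, int numValues) {
      this.column = column;
      this.dataBytes = dataBytes;
      this.numValues = numValues;
    }
  }

  /** Average bytes per value (rounded up) for each column; columns with no values are skipped. */
  static Map<String, Integer> averageWidths(List<ColumnStats> stats) {
    Map<String, Integer> widths = new HashMap<>();
    for (ColumnStats s : stats) {
      if (s.numValues <= 0) {
        continue;
      }
      // Ceiling division so that the estimate never undershoots the observed average.
      long average = (s.dataBytes + s.numValues - 1) / s.numValues;
      widths.put(s.column, (int) Math.min(Integer.MAX_VALUE, average));
    }
    return widths;
  }
}
```

An estimate of this kind could feed the next batch's per-field memory planning, so that fewer values spill into overflow.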
-
close
public void close()
Closes all resources managed by this object.
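Taken together, the methods on this page suggest a lifecycle of setup, then allocate and onEndOfBatch for each batch, then close. The sketch below shows one plausible call order under that assumption. The `Sizer` interface is a hypothetical stand-in with simplified signatures, not the real class, and the ordering is inferred from the method descriptions rather than confirmed by the source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Drill. */
final class SizerLifecycleSketch {

  /** Stand-in exposing only the lifecycle calls named on this page, with simplified signatures. */
  interface Sizer {
    void setup();
    void allocate();
    void onEndOfBatch(int batchNumRecords);
    void close();
  }

  /** Records each call so that the order can be inspected. */
  static final class RecordingSizer implements Sizer {
    final List<String> calls = new ArrayList<>();

    public void setup() { calls.add("setup"); }
    public void allocate() { calls.add("allocate"); }
    public void onEndOfBatch(int batchNumRecords) { calls.add("onEndOfBatch(" + batchNumRecords + ")"); }
    public void close() { calls.add("close"); }
  }

  /** Drives the sizer through every batch and closes it even if a batch fails. */
  static void readAll(Sizer sizer, int[] batchRowCounts) {
    sizer.setup();
    try {
      for (int rows : batchRowCounts) {
        sizer.allocate();
        // Column readers would fill the allocated vectors here.
        sizer.onEndOfBatch(rows);
      }
    } finally {
      sizer.close();
    }
  }
}
```

Placing close in a finally block keeps resource release independent of whether any batch throws.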
-