Class BatchSizingMemoryUtil
java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.BatchSizingMemoryUtil
Helper class that assists the Flat Parquet reader in building batches that adhere to memory sizing constraints.
-
Nested Class Summary
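Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final class | BatchSizingMemoryUtil.ColumnMemoryUsageInfo | A container class that holds a column batch's memory usage information. |
| static final class | BatchSizingMemoryUtil.VectorMemoryUsageInfo | Container class that holds memory usage information about a variable-length ValueVector; all values are in bytes. |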
Field Summary
Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final int | BYTE_VALUE_WIDTH | BYTE in-memory width |
| static final int | DEFAULT_VL_COLUMN_AVG_PRECISION | Default variable length column average precision; computed in such a way that 64k values will fit within one MB to minimize internal fragmentation |
| static final int | INT_VALUE_WIDTH | INT in-memory width |
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static boolean | canAddNewData(BatchSizingMemoryUtil.ColumnMemoryUsageInfo columnMemoryUsage, long newBitsMemory, long newOffsetsMemory, long newDataMemory) | This method will also load detailed information about this column's current memory usage (with regard to the value vectors). |
| static long | computeFixedLengthVectorMemory(ParquetColumnMetadata column, int valueCount) | |
| static long | computeVariableLengthVectorMemory(ParquetColumnMetadata column, long averagePrecision, int valueCount) | |
| static int | getAvgVariableLengthColumnTypePrecision | This method will return a default value for variable columns; it aims at minimizing internal fragmentation. |
| static int | getFixedColumnTypePrecision | |
| static void | getMemoryUsage(ValueVector sourceVector, int currValueCount, BatchSizingMemoryUtil.VectorMemoryUsageInfo vectorMemoryUsage) | Load memory usage information for a variable length value vector |
-
Field Details
-
BYTE_VALUE_WIDTH
public static final int BYTE_VALUE_WIDTH
BYTE in-memory width
-
INT_VALUE_WIDTH
public static final int INT_VALUE_WIDTH
INT in-memory width
-
DEFAULT_VL_COLUMN_AVG_PRECISION
public static final int DEFAULT_VL_COLUMN_AVG_PRECISION
Default average precision for variable-length columns, chosen so that 64k values fit within one MB, which minimizes internal fragmentation.
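The sizing rationale can be checked with simple arithmetic. The sketch below assumes that "64k" means 65,536 values, that "one MB" means 1,048,576 bytes, and that only the data bytes count toward the budget. The class and method names are hypothetical, and this page does not state the constant's actual value.

```java
// Hypothetical helper, not part of Drill: works out the per-value byte budget
// implied by "64k values fit within one MB".
final class AvgPrecisionBudget {
    static final int VALUES_PER_BATCH = 64 * 1024; // assumption: 64k = 65,536 values
    static final int TARGET_BYTES = 1024 * 1024;   // assumption: one MB = 1,048,576 bytes

    /** Largest average value size (in bytes) that lets valueCount values fit in targetBytes. */
    static int bytesPerValue(int targetBytes, int valueCount) {
        return targetBytes / valueCount;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerValue(TARGET_BYTES, VALUES_PER_BATCH)); // prints 16
    }
}
```

Under these assumptions, 64k values averaging 16 bytes each fill a one-MB data buffer exactly, leaving no unused tail.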
-
-
Method Details
-
canAddNewData
public static boolean canAddNewData(BatchSizingMemoryUtil.ColumnMemoryUsageInfo columnMemoryUsage, long newBitsMemory, long newOffsetsMemory, long newDataMemory)
Checks whether new data can be added to a column without exceeding the allowed memory limit. This method also loads detailed information about the column's current memory usage (with regard to the value vectors).
Parameters:
columnMemoryUsage - container that holds the column's memory usage information (this method updates the usage information automatically)
newBitsMemory - new nullable data that might be inserted when processing a new input chunk
newOffsetsMemory - new offsets data that might be inserted when processing a new input chunk
newDataMemory - new data that might be inserted when processing a new input chunk
Returns:
true if adding the new data will not cause this column's value vector to exceed the allowed limit; false otherwise
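The check described here can be illustrated with a small stand-alone model. This is only a sketch: ColumnUsageSketch, its fields, and the single per-column limit are assumptions made for illustration and do not reflect the actual ColumnMemoryUsageInfo layout or how Drill derives the limit.

```java
// Hypothetical stand-in for a column usage container; not Drill's ColumnMemoryUsageInfo.
final class ColumnUsageSketch {
    long bitsBytes;        // bytes currently used by nullability bits
    long offsetsBytes;     // bytes currently used by offsets
    long dataBytes;        // bytes currently used by data
    final long limitBytes; // assumption: one memory limit per column

    ColumnUsageSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    long totalBytes() {
        return bitsBytes + offsetsBytes + dataBytes;
    }

    /** Returns true if the extra bytes still fit within the limit; does not change the usage. */
    boolean canAdd(long newBits, long newOffsets, long newData) {
        return totalBytes() + newBits + newOffsets + newData <= limitBytes;
    }

    /** Records a chunk that the caller decided to accept. */
    void add(long newBits, long newOffsets, long newData) {
        bitsBytes += newBits;
        offsetsBytes += newOffsets;
        dataBytes += newData;
    }
}
```

A reader loop built on this model would call canAdd before copying each input chunk and close the current batch as soon as it returns false.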
-
getMemoryUsage
public static void getMemoryUsage(ValueVector sourceVector, int currValueCount, BatchSizingMemoryUtil.VectorMemoryUsageInfo vectorMemoryUsage)
Load memory usage information for a variable length value vector
Parameters:
sourceVector - source value vector
currValueCount - current value count
vectorMemoryUsage - result object which contains source vector memory usage information
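To make the result-object pattern concrete, the sketch below models a nullable variable-length vector as a plain offsets array and fills a reusable holder with byte counts. Everything here is hypothetical: the class names, the one-byte nullability flag, and the four-byte offsets are assumptions (suggested by BYTE_VALUE_WIDTH and INT_VALUE_WIDTH) rather than the documented behavior of getMemoryUsage.

```java
// Hypothetical model of a nullable variable-length vector; not Drill's ValueVector API.
final class VarLenUsageSketch {
    /** Reusable result holder; all values are in bytes. */
    static final class Usage {
        long bitsBytes;
        long offsetsBytes;
        long dataBytes;
    }

    /**
     * Fills result for the first valueCount values.
     * offsets[i] is where value i starts and offsets[i + 1] is where it ends,
     * so valueCount values need valueCount + 1 offset entries.
     */
    static void loadUsage(int[] offsets, int valueCount, Usage result) {
        result.bitsBytes = valueCount;                                 // assumption: 1 byte per flag
        result.offsetsBytes = (long) Integer.BYTES * (valueCount + 1); // assumption: 4-byte offsets
        result.dataBytes = offsets[valueCount];                        // end of the last value
    }
}
```

Passing the holder in, rather than returning a new object, lets a caller reuse one instance across many vectors and avoid per-call allocation.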
-
getFixedColumnTypePrecision
Parameters:
column - fixed column's metadata
Returns:
column byte precision
-
getAvgVariableLengthColumnTypePrecision
Returns a default value for variable-length columns; the value is chosen to minimize internal fragmentation. Note that TypeHelper uses a large default value, which might not always be appropriate.
Parameters:
column - variable-length column's metadata
Returns:
column byte precision
-
computeFixedLengthVectorMemory
public static long computeFixedLengthVectorMemory(ParquetColumnMetadata column, int valueCount)
Parameters:
column - column's metadata
valueCount - number of column values
Returns:
memory size required to store valueCount values within a value vector
-
computeVariableLengthVectorMemory
public static long computeVariableLengthVectorMemory(ParquetColumnMetadata column, long averagePrecision, int valueCount)
Parameters:
column - variable-length column's metadata
averagePrecision - VL column average precision
valueCount - number of column values
Returns:
memory size required to store valueCount values within a value vector
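This page does not give the formula behind the estimate. One plausible accounting, sketched below under stated assumptions, sums the data bytes, the offsets, and (for nullable columns) the nullability flags. The estimator class, the explicit nullable parameter, and the widths of four bytes per offset and one byte per flag are all hypothetical; the real method may also round allocations or account for memory differently.

```java
// Hypothetical estimator; the real method's formula is not given on this page.
final class VarLenEstimateSketch {
    static final int OFFSET_WIDTH = 4; // assumption: 4-byte offsets
    static final int BITS_WIDTH = 1;   // assumption: 1 byte per nullability flag

    /** Estimated bytes needed for valueCount values of the given average size. */
    static long estimate(long averagePrecision, int valueCount, boolean nullable) {
        long data = averagePrecision * valueCount;
        long offsets = (long) OFFSET_WIDTH * (valueCount + 1); // one extra entry marks the end
        long bits = nullable ? (long) BITS_WIDTH * valueCount : 0L;
        return data + offsets + bits;
    }

    public static void main(String[] args) {
        System.out.println(estimate(10, 4, true)); // prints 64
    }
}
```

In this model a fixed-length column would drop the offsets term, since every value has the same width and its position can be computed directly.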
-