java.lang.Object
org.apache.drill.exec.store.parquet.columnreaders.batchsizing.BatchSizingMemoryUtil

public final class BatchSizingMemoryUtil extends Object
Helper class that assists the Flat Parquet reader in building batches that adhere to memory sizing constraints.
  • Field Details

    • BYTE_VALUE_WIDTH

      public static final int BYTE_VALUE_WIDTH
      BYTE in-memory width
    • INT_VALUE_WIDTH

      public static final int INT_VALUE_WIDTH
      INT in-memory width
    • DEFAULT_VL_COLUMN_AVG_PRECISION

      public static final int DEFAULT_VL_COLUMN_AVG_PRECISION
      Default average precision (bytes per value) for variable length columns; chosen so that 64k values fit within one MB, which minimizes internal fragmentation
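      The sizing rule described for DEFAULT_VL_COLUMN_AVG_PRECISION can be worked through as plain arithmetic. The sketch below assumes that "64k values" means 64 * 1024 values and that "one MB" means 1024 * 1024 bytes. The class, constant, and method names are hypothetical, and the result is derived from the description above rather than read from the Drill source.

```java
final class AvgPrecisionSketch {
    // Assumption: "64k values" means 64 * 1024 values per batch.
    static final int VALUES_PER_BATCH = 64 * 1024;
    // Assumption: "one MB" means 1024 * 1024 bytes.
    static final int TARGET_BYTES = 1024 * 1024;

    // Bytes available per value if the whole batch must fit in the target size.
    static int bytesPerValue() {
        return TARGET_BYTES / VALUES_PER_BATCH;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerValue()); // prints 16
    }
}
```

      Under these assumptions, a batch of 64k values at the default average precision fills a one MB buffer exactly, leaving no unused tail.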
  • Method Details

    • canAddNewData

      public static boolean canAddNewData(BatchSizingMemoryUtil.ColumnMemoryUsageInfo columnMemoryUsage, long newBitsMemory, long newOffsetsMemory, long newDataMemory)
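      Checks whether new data can be added to this column without exceeding its allowed memory limit. As a side effect, this method loads detailed information about the column's current memory usage (with regard to the value vectors).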
      Parameters:
      columnMemoryUsage - container holding the column's memory usage information (this method updates the usage information automatically)
      newBitsMemory - New nullability (bits) data which might be inserted when processing a new input chunk
      newOffsetsMemory - New offsets data which might be inserted when processing a new input chunk
      newDataMemory - New data which might be inserted when processing a new input chunk
      Returns:
      true if adding the new data will not cause this column's value vector to exceed the allowed limit; false otherwise
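      The check described for canAddNewData can be illustrated with a minimal sketch. It assumes the limit is a single byte budget compared against the sum of the bits, offsets, and data buffers. The ColumnUsage class and canAdd method are hypothetical stand-ins, not Drill's ColumnMemoryUsageInfo or its actual logic.

```java
final class CanAddNewDataSketch {
    /** Hypothetical stand-in for a column's memory usage record. */
    static final class ColumnUsage {
        final long bitsBytes;     // bytes currently used by the nullability (bits) buffer
        final long offsetsBytes;  // bytes currently used by the offsets buffer
        final long dataBytes;     // bytes currently used by the data buffer
        final long limitBytes;    // assumed single memory budget for this column

        ColumnUsage(long bitsBytes, long offsetsBytes, long dataBytes, long limitBytes) {
            this.bitsBytes = bitsBytes;
            this.offsetsBytes = offsetsBytes;
            this.dataBytes = dataBytes;
            this.limitBytes = limitBytes;
        }
    }

    // Returns true when current usage plus the candidate additions stays within the budget.
    static boolean canAdd(ColumnUsage usage, long newBits, long newOffsets, long newData) {
        long total = usage.bitsBytes + newBits
                   + usage.offsetsBytes + newOffsets
                   + usage.dataBytes + newData;
        return total <= usage.limitBytes;
    }

    public static void main(String[] args) {
        ColumnUsage usage = new ColumnUsage(10, 20, 30, 100);
        System.out.println(canAdd(usage, 5, 5, 30)); // total is exactly 100
        System.out.println(canAdd(usage, 5, 5, 31)); // total is 101
    }
}
```

      A caller would run such a check before copying each input chunk and close the current batch as soon as it returns false.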
    • getMemoryUsage

      public static void getMemoryUsage(ValueVector sourceVector, int currValueCount, BatchSizingMemoryUtil.VectorMemoryUsageInfo vectorMemoryUsage)
      Load memory usage information for a variable length value vector
      Parameters:
      sourceVector - source value vector
      currValueCount - current value count
      vectorMemoryUsage - result object which contains source vector memory usage information
    • getFixedColumnTypePrecision

      public static int getFixedColumnTypePrecision(ParquetColumnMetadata column)
      Parameters:
      column - fixed column's metadata
      Returns:
      column byte precision
    • getAvgVariableLengthColumnTypePrecision

      public static int getAvgVariableLengthColumnTypePrecision(ParquetColumnMetadata column)
      Returns a default average precision for variable length columns; the default aims to minimize internal fragmentation.

      Note that the TypeHelper uses a large default value which might not always be appropriate.

      Parameters:
      column - variable length column's metadata
      Returns:
      column byte precision
    • computeFixedLengthVectorMemory

      public static long computeFixedLengthVectorMemory(ParquetColumnMetadata column, int valueCount)
      Parameters:
      column - column's metadata
      valueCount - number of column values
      Returns:
      memory size required to store "valueCount" values within a value vector
    • computeVariableLengthVectorMemory

      public static long computeVariableLengthVectorMemory(ParquetColumnMetadata column, long averagePrecision, int valueCount)
      Parameters:
      column - variable length column's metadata
      averagePrecision - VL column average precision
      valueCount - number of column values
      Returns:
      memory size required to store "valueCount" values within a value vector
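      The two compute methods above estimate how much memory a value vector needs for a given number of values. The sketch below shows one plausible way to make such an estimate. It assumes that each buffer is rounded up to the next power of two, that a nullability flag takes one byte (BYTE_VALUE_WIDTH = 1), that an offset takes four bytes (INT_VALUE_WIDTH = 4), and that a variable length vector stores valueCount + 1 offsets. All names and formulas here are illustrative assumptions, not Drill's implementation.

```java
final class VectorMemorySketch {
    static final int BYTE_VALUE_WIDTH = 1; // assumption: one byte per nullability flag
    static final int INT_VALUE_WIDTH = 4;  // assumption: four bytes per offset entry

    // Rounds a non-negative size up to the next power of two (0 stays 0).
    static long nextPowerOfTwo(long v) {
        if (v <= 0) {
            return 0;
        }
        long highest = Long.highestOneBit(v);
        return highest == v ? v : highest << 1;
    }

    // Fixed length: data buffer, plus a bits buffer when the column is nullable.
    static long fixedLengthMemory(int bytesPerValue, boolean nullable, int valueCount) {
        long memory = nextPowerOfTwo((long) bytesPerValue * valueCount);
        if (nullable) {
            memory += nextPowerOfTwo((long) BYTE_VALUE_WIDTH * valueCount);
        }
        return memory;
    }

    // Variable length: data buffer sized by average precision, an offsets buffer
    // with one extra entry, plus a bits buffer when the column is nullable.
    static long variableLengthMemory(long averagePrecision, boolean nullable, int valueCount) {
        long memory = nextPowerOfTwo(averagePrecision * valueCount);
        memory += nextPowerOfTwo((long) INT_VALUE_WIDTH * (valueCount + 1));
        if (nullable) {
            memory += nextPowerOfTwo((long) BYTE_VALUE_WIDTH * valueCount);
        }
        return memory;
    }

    public static void main(String[] args) {
        System.out.println(fixedLengthMemory(4, false, 1000));      // 4096
        System.out.println(fixedLengthMemory(4, true, 1000));       // 5120
        System.out.println(variableLengthMemory(16, false, 65536)); // 1572864
    }
}
```

      Rounding each buffer separately is what makes the choice of average precision matter: a data buffer whose size is already a power of two wastes nothing, while one slightly over a power of two can nearly double its footprint.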