Class RecordBatchSizer

java.lang.Object
org.apache.drill.exec.record.RecordBatchSizer

public class RecordBatchSizer extends Object
Given a record batch or vector container, determines the actual memory consumed by each column, the average row, and the entire record batch.
  • Field Details

    • sv2

      public SelectionVector2 sv2
    • maxSize

      public int maxSize
      Maximum width of a column; used to estimate memory for Varchar columns.
    • nullableCount

      public int nullableCount
      Number of nullable columns; used for memory estimation.
  • Constructor Details

    • RecordBatchSizer

      public RecordBatchSizer(RecordBatch batch)
    • RecordBatchSizer

      public RecordBatchSizer(VectorAccessible va)
      Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch).
      Parameters:
      va - iterator over the batch's vectors
    • RecordBatchSizer

      public RecordBatchSizer(VectorAccessible va, SelectionVector2 sv2)
      Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch) along with a selection vector for those records. The selection vector is used to pad the estimated row width with the extra two bytes needed per record. The selection vector memory is added to the total memory consumed by this batch.
      Parameters:
      va - iterator over the batch's vectors
      sv2 - selection vector associated with this batch
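      The effect of the selection vector on the estimates can be illustrated with a small standalone sketch. It is not Drill's implementation: the class and method names are hypothetical, and it assumes the selection vector's memory is exactly two bytes per record, whereas the real figure depends on the buffer actually allocated.

```java
// Illustrative sketch only; hypothetical names, not part of Drill.
final class Sv2OverheadSketch {
    // An SV2 stores one 2-byte entry per record.
    static final int SV2_ENTRY_BYTES = 2;

    // Pad the estimated row width with the SV2 entry for that row.
    static int rowWidthWithSv2(int rowWidth) {
        return rowWidth + SV2_ENTRY_BYTES;
    }

    // Add the SV2 memory to the memory held by the value vectors.
    // Assumption: SV2 memory is exactly rowCount * 2 bytes.
    static long batchSizeWithSv2(long vectorBytes, int rowCount) {
        return vectorBytes + (long) rowCount * SV2_ENTRY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(rowWidthWithSv2(40));
        System.out.println(batchSizeWithSv2(40_000L, 1_000));
    }
}
```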
  • Method Details

    • multiplyByFactors

      public static long multiplyByFactors(long size, double... factors)
    • multiplyByFactor

      public static long multiplyByFactor(long size, double factor)
    • getStdNetSizePerEntryCommon

      public static int getStdNetSizePerEntryCommon(TypeProtos.MajorType majorType, boolean isOptional, boolean isRepeated, boolean isRepeatedList, Map<String,RecordBatchSizer.ColumnSize> children)
    • getColumn

      public RecordBatchSizer.ColumnSize getColumn(String name)
    • applySv2

      public void applySv2()
    • safeDivide

      public static int safeDivide(long num, long denom)
    • safeDivide

      public static int safeDivide(int num, int denom)
    • safeDivide

      public static int safeDivide(int num, float denom)
    • safeDivide

      public static int safeDivide(int num, double denom)
    • rowCount

      public int rowCount()
    • getStdRowWidth

      public int getStdRowWidth()
    • getRowAllocWidth

      public int getRowAllocWidth()
    • getActualSize

      public long getActualSize()
    • getGrossRowWidth

      public int getGrossRowWidth()
    • getAvgDensity

      public int getAvgDensity()
    • getNetRowWidth

      public int getNetRowWidth()
    • columns

    • columnsList

      public List<RecordBatchSizer.ColumnSize> columnsList()
      Convenience method that returns the column sizes in the same order that the corresponding value vectors are stored within a VectorAccessible.
      Returns:
      The sizes of columns in the same order that the corresponding value vectors are stored within a VectorAccessible.
    • getNetRowWidthCap50

      public int getNetRowWidthCap50()
      Compute the "real" width of the row, taking into account the size of each varchar column (historically capped at 50 and rounded up to a power of 2 to match DrillBuf allocation) and the null-marking columns.
      Returns:
      "real" width of the row
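      The capping and rounding described above can be sketched as follows. This is a standalone illustration, not the method's actual code: the class and method names are hypothetical, and it assumes that the cap is applied before the rounding and that each nullable column contributes one byte of null marking per row.

```java
// Illustrative sketch only; hypothetical names, not part of Drill.
final class NetRowWidthCap50Sketch {
    static final int VARCHAR_CAP = 50;

    // Smallest power of two that is >= n (returns 1 for n <= 1).
    static int roundUpToPowerOf2(int n) {
        if (n <= 1) {
            return 1;
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    // Assumption: cap the varchar width first, then round up.
    static int cappedVarcharWidth(int avgWidth) {
        return roundUpToPowerOf2(Math.min(avgWidth, VARCHAR_CAP));
    }

    // Assumption: one byte of null marking per nullable column per row.
    static int netRowWidth(int fixedWidthBytes, int[] varcharAvgWidths, int nullableCount) {
        int width = fixedWidthBytes + nullableCount;
        for (int w : varcharAvgWidths) {
            width += cappedVarcharWidth(w);
        }
        return width;
    }

    public static void main(String[] args) {
        // 12 fixed bytes, varchars averaging 10 and 200 bytes, 2 nullable columns.
        System.out.println(netRowWidth(12, new int[] {10, 200}, 2));
    }
}
```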
    • hasSv2

      public boolean hasSv2()
    • getNetBatchSize

      public long getNetBatchSize()
    • getMaxAvgColumnSize

      public int getMaxAvgColumnSize()
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • buildVectorInitializer

      public VectorInitializer buildVectorInitializer()
      The column size information gathered here represents empirically-derived schema metadata. Use that metadata to create an instance of a class that allocates memory for new vectors based on the observed size information. The caller provides the row count; the size information here provides column widths and the number of elements in each array.
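      The division of labor described above (observed widths and array cardinalities on one side, a caller-supplied row count on the other) can be sketched in isolation. This is a hypothetical model and does not use or reproduce the VectorInitializer API: every name below is invented for illustration, and it estimates only data bytes, ignoring offset and null-marking vectors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not part of Drill.
final class AllocationHintSketch {
    // Observed size information for one column.
    static final class Hint {
        final int avgWidthBytes;
        final float avgElementsPerRow; // 1.0 for non-array columns

        Hint(int avgWidthBytes, float avgElementsPerRow) {
            this.avgWidthBytes = avgWidthBytes;
            this.avgElementsPerRow = avgElementsPerRow;
        }
    }

    private final Map<String, Hint> hints = new LinkedHashMap<>();

    void record(String column, int avgWidthBytes, float avgElementsPerRow) {
        hints.put(column, new Hint(avgWidthBytes, avgElementsPerRow));
    }

    // The caller supplies the row count; the hints supply width and cardinality.
    long dataBytesFor(String column, int rowCount) {
        Hint h = hints.get(column);
        if (h == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        long valueCount = (long) Math.ceil(rowCount * (double) h.avgElementsPerRow);
        return valueCount * h.avgWidthBytes;
    }

    public static void main(String[] args) {
        AllocationHintSketch sketch = new AllocationHintSketch();
        sketch.record("name", 20, 1.0f);
        sketch.record("tags", 8, 2.5f);
        System.out.println(sketch.dataBytesFor("name", 1_000));
        System.out.println(sketch.dataBytesFor("tags", 1_000));
    }
}
```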
    • allocateVectors

      public void allocateVectors(VectorContainer container, int recordCount)