Class RecordBatchSizer.ColumnSize

java.lang.Object
org.apache.drill.exec.record.RecordBatchSizer.ColumnSize
Enclosing class:
RecordBatchSizer

public class RecordBatchSizer.ColumnSize extends Object
Column size information.
  • Field Details

  • Constructor Details

  • Method Details

    • hasStdDataSize

      public boolean hasStdDataSize()
      Returns whether an accurate standard (std) data size is available for this column.
      Returns:
      true if there is an accurate std size; false otherwise.
    • getStdDataSizePerEntry

      public int getStdDataSizePerEntry()
      Standard (std) pure data size per entry, derived from Drill metadata based on the column type. Does not include the overhead of the metadata vectors we add for cardinality, variable length, etc. For variable-width columns, we use 50 bytes as the std entry width. For repeated columns, we assume a repetition count of 10.
    • getStdNetSizePerEntry

      public int getStdNetSizePerEntry()
      Standard (std) net size per entry, taking into account the additional metadata vectors we add on top for variable length, cardinality, etc. For variable-width columns, we use 50 bytes as the std data size per entry. For repeated columns, we assume a repetition count of 10.
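The following is a minimal sketch of how "std" data and net sizes could be derived from the rules above (50 bytes per variable-width value, 10 elements per repeated entry). It is not Drill's implementation: the class `StdSizeSketch`, its `Mode` enum, and the overhead constants (4 bytes per offset-vector slot, 1 byte per nullability flag) are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch of "standard" per-entry sizes. The 50-byte width and the
// repetition of 10 come from the description above; the overhead constants
// (4-byte offsets, 1-byte null flags) are assumptions, not taken from Drill.
final class StdSizeSketch {
    static final int STD_VAR_WIDTH = 50;   // assumed width of one variable-width value
    static final int STD_REPETITION = 10;  // assumed elements per repeated entry
    static final int OFFSET_WIDTH = 4;     // assumed bytes per offset-vector slot
    static final int BITS_WIDTH = 1;       // assumed bytes per nullability flag

    enum Mode { REQUIRED, OPTIONAL, REPEATED }

    /** Pure data bytes per entry; fixedWidth <= 0 means the type is variable width. */
    static int stdDataSizePerEntry(int fixedWidth, Mode mode) {
        int valueWidth = fixedWidth > 0 ? fixedWidth : STD_VAR_WIDTH;
        return mode == Mode.REPEATED ? valueWidth * STD_REPETITION : valueWidth;
    }

    /** Data bytes plus the metadata vectors layered on top of the data vector. */
    static int stdNetSizePerEntry(int fixedWidth, Mode mode) {
        int net = stdDataSizePerEntry(fixedWidth, mode);
        if (fixedWidth <= 0) {
            // Each variable-width value needs an offset; a repeated column has one per element.
            net += (mode == Mode.REPEATED ? STD_REPETITION : 1) * OFFSET_WIDTH;
        }
        if (mode == Mode.OPTIONAL) {
            net += BITS_WIDTH;    // one nullability flag per entry
        }
        if (mode == Mode.REPEATED) {
            net += OFFSET_WIDTH;  // one offset per entry into the element array
        }
        return net;
    }

    public static void main(String[] args) {
        System.out.println(stdDataSizePerEntry(0, Mode.OPTIONAL)); // nullable VARCHAR data
        System.out.println(stdNetSizePerEntry(0, Mode.OPTIONAL));  // plus offsets and flags
        System.out.println(stdNetSizePerEntry(4, Mode.REPEATED));  // repeated INT
    }
}
```

Under these assumptions a nullable VARCHAR gets 50 bytes of std data but 55 bytes of std net size, which is why the two methods are reported separately.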
    • getDataSizePerEntry

      public int getDataSizePerEntry()
      Average actual data size per entry, in bytes. Does not include any overhead of metadata vectors. For repeated columns, this is the average size of the whole repeated array, not of an individual element within the array.
    • getNetSizePerEntry

      public int getNetSizePerEntry()
      Average size per entry of the pure data plus the overhead of the additional vectors we add on top, such as the bits vector and the offset vector. This size is larger than the actual data size because it includes the per-column overhead of the vectors we add for cardinality, variable length, etc.
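To make the difference between the data and net averages concrete, here is a hedged sketch that derives both from observed byte totals. The class `PerEntrySketch` and the sample numbers are hypothetical; the metadata byte count assumes 4-byte offsets (one more offset than rows) and 1-byte null flags, which is an illustration rather than a statement of Drill's exact accounting.

```java
// Hypothetical: derives per-entry averages from observed byte totals,
// in the spirit of getDataSizePerEntry() and getNetSizePerEntry().
final class PerEntrySketch {
    /** Average pure-data bytes per entry (per whole array for repeated columns). */
    static int dataSizePerEntry(long totalDataBytes, int valueCount) {
        return valueCount == 0 ? 0 : (int) (totalDataBytes / valueCount);
    }

    /** Average bytes per entry once metadata vectors (bits, offsets) are included. */
    static int netSizePerEntry(long totalDataBytes, long metadataBytes, int valueCount) {
        return valueCount == 0 ? 0 : (int) ((totalDataBytes + metadataBytes) / valueCount);
    }

    public static void main(String[] args) {
        // 1,000 rows of a nullable VARCHAR holding 12,000 bytes of character data.
        // Assumed metadata: 1,001 four-byte offsets plus 1,000 one-byte null flags.
        long data = 12_000;
        long meta = 1_001L * 4 + 1_000;
        System.out.println(dataSizePerEntry(data, 1_000));
        System.out.println(netSizePerEntry(data, meta, 1_000));

        // Repeated INT: 1,000 rows with 30,000 four-byte elements in total.
        // The average is per array (120 bytes), not per element (4 bytes).
        System.out.println(dataSizePerEntry(30_000L * 4, 1_000));
    }
}
```

Here the VARCHAR column averages 12 data bytes but 17 net bytes per entry, and the repeated column shows that the per-entry figure describes the whole array.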
    • getAllocSizePerEntry

      public int getAllocSizePerEntry()
      Returns the actual entry size if rowCount > 0, or the allocation size otherwise. Use this when you may receive empty batches that carry only a schema but still need to do memory calculations based on that schema alone.
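A minimal sketch of the fallback described above, assuming that the "allocation size" for a schema-only batch is some type-based estimate (here a fixed value of 55 bytes, matching a nullable VARCHAR under the std-size assumptions). The class `AllocSizeSketch` and the 4096-row target are hypothetical.

```java
// Hypothetical sketch of the fallback described for getAllocSizePerEntry().
// Assumption: the "allocation size" for an empty batch is a type-based estimate;
// the real method may compute it differently.
final class AllocSizeSketch {
    static int allocSizePerEntry(int rowCount, int observedSizePerEntry, int allocEstimate) {
        // Observed sizes carry no information for a schema-only batch (rowCount == 0).
        return rowCount > 0 ? observedSizePerEntry : allocEstimate;
    }

    public static void main(String[] args) {
        int targetRows = 4096;
        // Empty batch: only the schema is known, so the type-based estimate is used.
        long emptyBudget = (long) targetRows * allocSizePerEntry(0, 0, 55);
        // Non-empty batch: the observed average (17 bytes) is used instead.
        long fullBudget = (long) targetRows * allocSizePerEntry(1_000, 17, 55);
        System.out.println(emptyBudget);
        System.out.println(fullBudget);
    }
}
```

The point of the design is that memory planning never sees a zero per-entry size just because the first batch was empty.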
    • getStdNetOrNetSizePerEntry

      public int getStdNetOrNetSizePerEntry()
      If there is an accurate std net size, that is returned. Otherwise the net size is returned.
      Returns:
      If there is an accurate std net size, that is returned. Otherwise the net size is returned.
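One plausible use of this method is estimating a row width by summing per-column sizes. The sketch below is hypothetical: `RowWidthSketch` and its `Col` holder are illustrative, and the sample values (an INT whose std size is treated as accurate, a VARCHAR whose std size is only a guess) are made up.

```java
// Hypothetical: sums per-column widths into an estimated row width, preferring
// the std net size only when it is flagged as accurate.
final class RowWidthSketch {
    static final class Col {
        final boolean hasStdDataSize;  // is the type-based estimate exact?
        final int stdNetSize;          // type-based net size per entry
        final int netSize;             // observed net size per entry

        Col(boolean hasStdDataSize, int stdNetSize, int netSize) {
            this.hasStdDataSize = hasStdDataSize;
            this.stdNetSize = stdNetSize;
            this.netSize = netSize;
        }

        int stdNetOrNetSizePerEntry() {
            return hasStdDataSize ? stdNetSize : netSize;
        }
    }

    static int rowWidth(Col... columns) {
        int width = 0;
        for (Col c : columns) {
            width += c.stdNetOrNetSizePerEntry();
        }
        return width;
    }

    public static void main(String[] args) {
        Col intCol = new Col(true, 4, 4);       // fixed width: std size assumed exact
        Col varcharCol = new Col(false, 55, 17); // std size is a guess: use observed size
        System.out.println(rowWidth(intCol, varcharCol));
    }
}
```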
    • getTotalDataSize

      public int getTotalDataSize()
      This is the total data size for the column, including children for map columns. Does not include any overhead of metadata vectors.
    • getTotalNetSize

      public int getTotalNetSize()
      This is the total net size for the column, including children for map columns. Includes overhead of metadata vectors.
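For map columns, both totals roll up the children. The following hedged sketch models that recursion with a hypothetical `Col` tree (not Drill's `ColumnSize`); the byte counts are invented, with the VARCHAR child carrying 4-byte offsets as its assumed metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column tree: a map column's totals include all of its children.
final class MapTotalsSketch {
    static final class Col {
        final String name;
        final int dataBytes;      // bytes in this column's own data vector (0 for a map)
        final int metadataBytes;  // bits/offset vector bytes owned by this column
        final List<Col> children = new ArrayList<>();

        Col(String name, int dataBytes, int metadataBytes) {
            this.name = name;
            this.dataBytes = dataBytes;
            this.metadataBytes = metadataBytes;
        }

        Col add(Col child) {
            children.add(child);
            return this;
        }

        /** Data bytes only, summed over this column and its descendants. */
        int totalDataSize() {
            int total = dataBytes;
            for (Col c : children) {
                total += c.totalDataSize();
            }
            return total;
        }

        /** Data plus metadata-vector bytes, summed over this column and its descendants. */
        int totalNetSize() {
            int total = dataBytes + metadataBytes;
            for (Col c : children) {
                total += c.totalNetSize();
            }
            return total;
        }
    }

    public static void main(String[] args) {
        // A required map "addr" with a VARCHAR child and a required INT child (1,000 rows).
        Col addr = new Col("addr", 0, 0)
            .add(new Col("city", 8_000, 1_001 * 4))
            .add(new Col("zip", 4_000, 0));
        System.out.println(addr.totalDataSize());
        System.out.println(addr.totalNetSize());
    }
}
```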
    • getValueCount

      public int getValueCount()
    • getElementCount

      public int getElementCount()
    • getCardinality

      public float getCardinality()
    • isVariableWidth

      public boolean isVariableWidth()
    • getChildren

      public Map<String,RecordBatchSizer.ColumnSize> getChildren()
    • isComplex

      public boolean isComplex()
    • isRepeatedList

      public boolean isRepeatedList()
    • allocateVector

      public void allocateVector(ValueVector vector, int recordCount)
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • buildVectorInitializer

      public void buildVectorInitializer(VectorInitializer initializer)
      Adds a single vector initializer to a collection for the entire batch. Uses the observed column size information to predict the size needed when allocating a new vector for the same data. Adds a hint only for variable-width or repeated types; no extra information is needed for fixed-width, non-repeated columns.
      Parameters:
      initializer - the vector initializer to hold the hints for this column
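The hinting rule can be sketched as follows. This is an illustration under stated assumptions, not Drill's code: the `Hint` class and the `Map` standing in for the initializer are hypothetical replacements for `VectorInitializer`, and the per-element width for repeated variable-width columns is assumed to be the per-array data size divided by the cardinality.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: Hint and the Map-based "initializer" stand in for
// Drill's VectorInitializer; they are not its real API.
final class InitializerSketch {
    static final class Hint {
        final int valueWidth;     // expected bytes per value; 0 means "use the type's fixed width"
        final float cardinality;  // expected elements per entry; 1 for non-repeated columns

        Hint(int valueWidth, float cardinality) {
            this.valueWidth = valueWidth;
            this.cardinality = cardinality;
        }
    }

    static void buildVectorInitializer(Map<String, Hint> initializer, String name,
                                       boolean variableWidth, boolean repeated,
                                       int dataSizePerEntry, float cardinality) {
        if (!variableWidth && !repeated) {
            return; // fixed-width, non-repeated: the type alone determines the allocation
        }
        float card = repeated ? cardinality : 1f;
        int width = 0;
        if (variableWidth) {
            // For repeated columns the per-entry size covers the whole array,
            // so divide by the cardinality to get a per-element width.
            width = card > 0 ? Math.round(dataSizePerEntry / card) : dataSizePerEntry;
        }
        initializer.put(name, new Hint(width, card));
    }

    public static void main(String[] args) {
        Map<String, Hint> hints = new LinkedHashMap<>();
        buildVectorInitializer(hints, "id", false, false, 4, 1f);      // INT: no hint
        buildVectorInitializer(hints, "name", true, false, 12, 1f);    // VARCHAR
        buildVectorInitializer(hints, "tags", true, true, 120, 4f);    // repeated VARCHAR
        buildVectorInitializer(hints, "scores", false, true, 40, 10f); // repeated INT
        for (Map.Entry<String, Hint> e : hints.entrySet()) {
            System.out.println(e.getKey() + ": width=" + e.getValue().valueWidth
                + ", cardinality=" + e.getValue().cardinality);
        }
    }
}
```

Skipping fixed-width, non-repeated columns keeps the hint collection small: their allocation size follows directly from the type and the row count.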