Package org.apache.drill.exec.record
Class RecordBatchSizer.ColumnSize
java.lang.Object
org.apache.drill.exec.record.RecordBatchSizer.ColumnSize
Enclosing class: RecordBatchSizer
Column size information.
Field Summary
Constructor Summary
Method Summary
Modifier and Type | Method | Description
void | allocateVector(ValueVector vector, int recordCount) |
void | buildVectorInitializer(VectorInitializer initializer) | Add a single vector initializer to a collection for the entire batch.
int | getAllocSizePerEntry() | This returns actual entry size if rowCount > 0 or allocation size otherwise.
float | getCardinality() |
int | getDataSizePerEntry() | This is the average actual per entry data size in bytes.
int | getElementCount() |
int | getNetSizePerEntry() | This is the average per entry size of just pure data plus overhead of additional vectors we add on top like bits vector, offset vector etc.
int | getStdDataSizePerEntry() | std pure data size per entry from Drill metadata, based on type.
int | getStdNetOrNetSizePerEntry() | If there is an accurate std net size, that is returned.
int | getStdNetSizePerEntry() | std net size per entry taking into account additional metadata vectors we add on top for variable length, cardinality etc.
int | getTotalDataSize() | This is the total data size for the column, including children for map columns.
int | getTotalNetSize() | This is the total net size for the column, including children for map columns.
int | getValueCount() |
boolean | hasStdDataSize() | Returns true if there is an accurate std size.
boolean | isComplex() |
boolean | isRepeatedList() |
boolean | isVariableWidth() |
String | toString() |
Field Details
prefix
metadata
Constructor Details
ColumnSize
Method Details
hasStdDataSize
public boolean hasStdDataSize()
Returns true if there is an accurate std size. Otherwise it returns false.
Returns: true if there is an accurate std size, false otherwise.
getStdDataSizePerEntry
public int getStdDataSizePerEntry()
Standard (std) pure data size per entry from Drill metadata, based on type. Does not include the overhead of the metadata vectors added for cardinality, variable length, etc. For variable-width columns, 50 is used as the std entry width. For repeated columns, a repetition of 10 is assumed.
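The defaults quoted above can be illustrated with a minimal sketch. It is not Drill's implementation: the class and method names are hypothetical, and multiplying the entry width by the repetition factor is an assumption about how the two defaults combine.

```java
// Illustrative sketch only; names are hypothetical and this is not Drill code.
final class StdSizeSketch {
    // Figures quoted in the description above.
    static final int STD_VARIABLE_WIDTH = 50; // assumed width of a variable-width entry
    static final int STD_REPETITION = 10;     // assumed element count of a repeated entry

    /**
     * Estimates the pure data size per entry from type information alone.
     * fixedWidth is ignored when variableWidth is true.
     * Assumption: a repeated entry costs width times the repetition factor.
     */
    static int stdDataSizePerEntry(boolean variableWidth, int fixedWidth, boolean repeated) {
        int width = variableWidth ? STD_VARIABLE_WIDTH : fixedWidth;
        return repeated ? width * STD_REPETITION : width;
    }
}
```

Under these assumptions a non-repeated 4-byte column is estimated at 4, a variable-width column at 50, and a repeated variable-width column at 500.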
getStdNetSizePerEntry
public int getStdNetSizePerEntry()
Standard (std) net size per entry, taking into account the additional metadata vectors added on top for variable length, cardinality, etc. For variable-width columns, 50 is used as the std data size for the entry width. For repeated columns, a repetition of 10 is assumed.
getDataSizePerEntry
public int getDataSizePerEntry()
This is the average actual per-entry data size in bytes. Does not include any overhead of metadata vectors. For repeated columns, it is the average for the whole repeated array, not for an individual entry in the array.
getNetSizePerEntry
public int getNetSizePerEntry()
This is the average per-entry size of the pure data plus the overhead of the additional vectors added on top, such as the bits vector and offset vector. This size is larger than the actual data size because it includes the per-column overhead of the additional vectors added for cardinality, variable length, etc.
getAllocSizePerEntry
public int getAllocSizePerEntry()
Returns the actual entry size if rowCount > 0, or the allocation size otherwise. Use this when you might receive empty batches that carry a schema and still need to do memory calculations based on the schema alone.
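The selection rule described above can be sketched as follows. This is a simplified illustration rather than Drill's code: the class, method, and parameter names are hypothetical, and the observed and allocation sizes are passed in as plain integers.

```java
// Illustrative sketch only; names and parameters are hypothetical.
final class AllocSizeSketch {
    /** Uses the observed size when the batch has rows, else the schema-based allocation size. */
    static int allocSizePerEntry(int rowCount, int observedSizePerEntry, int allocationSizePerEntry) {
        return rowCount > 0 ? observedSizePerEntry : allocationSizePerEntry;
    }

    /** Bytes to reserve for one column of an outgoing batch with outputRows rows. */
    static long columnBudget(int rowCount, int observed, int allocation, int outputRows) {
        // Widen to long before multiplying so large batches cannot overflow int.
        return (long) allocSizePerEntry(rowCount, observed, allocation) * outputRows;
    }
}
```

With an empty incoming batch (rowCount of 0) the budget falls back to the allocation size, so a memory estimate is still possible from the schema alone.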
getStdNetOrNetSizePerEntry
public int getStdNetOrNetSizePerEntry()
If there is an accurate std net size, that is returned. Otherwise the net size is returned.
Returns: the std net size if an accurate one is available, otherwise the net size.
getTotalDataSize
public int getTotalDataSize()
This is the total data size for the column, including children for map columns. Does not include any overhead of metadata vectors.
getTotalNetSize
public int getTotalNetSize()
This is the total net size for the column, including children for map columns. Includes overhead of metadata vectors.
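The difference between the two totals, and the way map columns fold in their children, can be sketched with a small tree. The class below is a hypothetical stand-in, not the real ColumnSize, and the byte figures in the usage note are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; a hypothetical stand-in for a column with children.
final class ColumnNodeSketch {
    final int ownDataBytes;     // pure data held by this column
    final int ownOverheadBytes; // metadata vector overhead (offsets, bits, etc.)
    final List<ColumnNodeSketch> children = new ArrayList<>();

    ColumnNodeSketch(int ownDataBytes, int ownOverheadBytes) {
        this.ownDataBytes = ownDataBytes;
        this.ownOverheadBytes = ownOverheadBytes;
    }

    ColumnNodeSketch add(ColumnNodeSketch child) {
        children.add(child);
        return this;
    }

    /** Data only, summed over this column and all of its descendants. */
    int totalDataSize() {
        int total = ownDataBytes;
        for (ColumnNodeSketch child : children) {
            total += child.totalDataSize();
        }
        return total;
    }

    /** Data plus metadata vector overhead, summed over this column and all of its descendants. */
    int totalNetSize() {
        int total = ownDataBytes + ownOverheadBytes;
        for (ColumnNodeSketch child : children) {
            total += child.totalNetSize();
        }
        return total;
    }
}
```

For example, a map with one child of 400 data bytes and no overhead, and another of 5000 data bytes with 404 bytes of overhead, has a total data size of 5400 and a total net size of 5804 in this sketch.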
getValueCount
public int getValueCount()
getElementCount
public int getElementCount()
getCardinality
public float getCardinality()
isVariableWidth
public boolean isVariableWidth()
getChildren
isComplex
public boolean isComplex()
isRepeatedList
public boolean isRepeatedList()
allocateVector
toString
buildVectorInitializer
Add a single vector initializer to a collection for the entire batch. Uses the observed column size information to predict the size needed when allocating a new vector for the same data. Adds a hint only for variable-width or repeated types; no extra information is needed for fixed-width, non-repeated columns.
Parameters:
initializer - the vector initializer to hold the hints for this column
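The rule that hints are added only for variable-width or repeated columns can be sketched as below. The Hints class is a hypothetical stand-in for the vector initializer and does not reflect the real VectorInitializer API; the observed sizes are passed in as plain integers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; Hints is a hypothetical stand-in, not Drill's VectorInitializer.
final class HintSketch {
    /** Maps a column name to an observed per-entry size hint in bytes. */
    static final class Hints {
        final Map<String, Integer> sizeByColumn = new LinkedHashMap<>();
    }

    /** Records a hint only when the column is variable-width or repeated. */
    static void addHint(Hints hints, String name, boolean variableWidth,
                        boolean repeated, int observedBytesPerEntry) {
        if (variableWidth || repeated) {
            hints.sizeByColumn.put(name, observedBytesPerEntry);
        }
        // Fixed-width, non-repeated columns need no hint: their size follows from the type.
    }
}
```

In this sketch a fixed-width, non-repeated column leaves the hint collection unchanged, while a variable-width or repeated column contributes its observed per-entry size.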