Package org.apache.drill.exec.record
Class RecordBatchSizer
java.lang.Object
org.apache.drill.exec.record.RecordBatchSizer
Given a record batch or vector container, determines the actual memory
consumed by each column, the average row, and the entire record batch.
-
Nested Class Summary
-
Field Summary
Modifier and Type   Field           Description
int                 maxSize         Maximum width of a column; used for memory estimation in case of Varchars
int                 nullableCount   Count the nullable columns; used for memory estimation
-
Constructor Summary
Constructor                           Description
RecordBatchSizer(RecordBatch batch)
RecordBatchSizer(va)                  Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch).
RecordBatchSizer(va, sv2)             Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch) along with a selection vector for those records.
-
Method Summary
Modifier and Type   Method                      Description
void                allocateVectors(VectorContainer container, int recordCount)
void                applySv2()
                    buildVectorInitializer      The column size information gathered here represents empirically-derived schema metadata.
                    columns()
                    columnsList                 This is a convenience method to get the sizes of columns in the same order that the corresponding value vectors are stored within a VectorAccessible.
long                getActualSize()
int                 getAvgDensity()
                    getColumn
int                 getGrossRowWidth()
int                 getMaxAvgColumnSize()
long                getNetBatchSize()
int                 getNetRowWidth()
int                 getNetRowWidthCap50()       Compute the "real" width of the row, taking into account each varchar column size (historically capped at 50, and rounded up to power of 2 to match drill buf allocation) and null marking columns.
int                 getRowAllocWidth()
static int          getStdNetSizePerEntryCommon(TypeProtos.MajorType majorType, boolean isOptional, boolean isRepeated, boolean isRepeatedList, Map<String, RecordBatchSizer.ColumnSize> children)
int                 getStdRowWidth()
boolean             hasSv2()
static long         multiplyByFactor(long size, double factor)
static long         multiplyByFactors(long size, double... factors)
int                 rowCount()
static int          safeDivide(int num, double denom)
static int          safeDivide(int num, float denom)
static int          safeDivide(int num, int denom)
static int          safeDivide(long num, long denom)
                    toString()
-
Field Details
-
sv2
-
maxSize
public int maxSize
Maximum width of a column; used for memory estimation in case of Varchars
-
nullableCount
public int nullableCount
Count the nullable columns; used for memory estimation
-
-
Constructor Details
-
RecordBatchSizer
-
RecordBatchSizer
Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch).
Parameters:
va - iterator over the batch's vectors
-
RecordBatchSizer
Create empirical metadata for a record batch given a vector accessible (basically, an iterator over the vectors in the batch) along with a selection vector for those records. The selection vector is used to pad the estimated row width with the extra two bytes needed per record. The selection vector memory is added to the total memory consumed by this batch.
Parameters:
va - iterator over the batch's vectors
sv2 - selection vector associated with this batch
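The padding arithmetic described for this constructor can be sketched as follows. This is a minimal illustration, not Drill's implementation: the class Sv2PaddingSketch, its method names, and the sample numbers are hypothetical, and the sketch assumes the selection vector holds exactly one two-byte entry per record. Only the two-bytes-per-record figure comes from the description above.

```java
/** Hypothetical sketch of how a two-byte selection vector affects size estimates. */
final class Sv2PaddingSketch {
    /** Bytes per selection-vector entry, as stated in the constructor description. */
    static final int SV2_ENTRY_BYTES = 2;

    /** Pads an estimated row width with the selection-vector entry for that row. */
    static int paddedRowWidth(int rowWidth, boolean hasSv2) {
        return hasSv2 ? rowWidth + SV2_ENTRY_BYTES : rowWidth;
    }

    /** Adds the memory held by the selection vector to the batch total. */
    static long totalBatchSize(long vectorBytes, long sv2Bytes) {
        return vectorBytes + sv2Bytes;
    }

    public static void main(String[] args) {
        int rowCount = 1_000;
        int rowWidth = 40; // assumed row width before the selection vector is applied
        long vectorBytes = (long) rowCount * rowWidth;
        // Assumption: one two-byte entry per record and no allocation slack.
        long sv2Bytes = (long) rowCount * SV2_ENTRY_BYTES;

        System.out.println(paddedRowWidth(rowWidth, true));        // 42
        System.out.println(totalBatchSize(vectorBytes, sv2Bytes)); // 42000
    }
}
```

In a real batch the selection vector's allocated memory may exceed two bytes times the record count, so the total would be taken from the buffer itself rather than computed this way.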
-
-
Method Details
-
multiplyByFactors
public static long multiplyByFactors(long size, double... factors)
-
multiplyByFactor
public static long multiplyByFactor(long size, double factor)
-
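This page gives no description for multiplyByFactors or multiplyByFactor. One plausible reading of the signatures is that a byte count is scaled by one or more fractional factors and converted back to a long. The sketch below illustrates that reading only; the class FactorSketch, the method name scale, and the truncating conversion are all assumptions, not Drill's documented behavior.

```java
/** Hypothetical sketch of scaling a size by several factors; not Drill's code. */
final class FactorSketch {
    /**
     * Multiplies size by every factor in turn, then truncates toward zero.
     * The truncation is an assumption; the page does not state the rounding rule.
     */
    static long scale(long size, double... factors) {
        double scaled = size;
        for (double factor : factors) {
            scaled *= factor;
        }
        return (long) scaled;
    }

    public static void main(String[] args) {
        System.out.println(scale(1000L, 1.5));      // 1500
        System.out.println(scale(1000L, 1.5, 2.0)); // 3000
        System.out.println(scale(1000L));           // 1000 (no factors supplied)
    }
}
```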
getStdNetSizePerEntryCommon
public static int getStdNetSizePerEntryCommon(TypeProtos.MajorType majorType, boolean isOptional, boolean isRepeated, boolean isRepeatedList, Map<String, RecordBatchSizer.ColumnSize> children)
-
getColumn
-
applySv2
public void applySv2()
-
safeDivide
public static int safeDivide(long num, long denom)
-
safeDivide
public static int safeDivide(int num, int denom)
-
safeDivide
public static int safeDivide(int num, float denom)
-
safeDivide
public static int safeDivide(int num, double denom)
-
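The safeDivide overloads carry no description on this page. The name suggests a division that guards against a zero denominator, which matters when a batch has no rows. The sketch below assumes the guard returns 0 in that case and rounds up otherwise; both choices, along with the class SafeDivideSketch and the method name divideOrZero, are hypothetical and not taken from Drill's documentation.

```java
/** Hypothetical sketch of a zero-guarded division; not Drill's safeDivide. */
final class SafeDivideSketch {
    /** Returns 0 for a zero denominator, otherwise the quotient rounded up. */
    static int divideOrZero(long num, long denom) {
        if (denom == 0) {
            return 0; // assumed fallback, e.g. for an empty batch
        }
        return (int) Math.ceil((double) num / denom);
    }

    /** Same guard for a fractional denominator. */
    static int divideOrZero(int num, double denom) {
        if (denom == 0.0) {
            return 0;
        }
        return (int) Math.ceil(num / denom);
    }

    public static void main(String[] args) {
        System.out.println(divideOrZero(10L, 4L)); // 3
        System.out.println(divideOrZero(10L, 0L)); // 0
        System.out.println(divideOrZero(10, 2.5)); // 4
    }
}
```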
rowCount
public int rowCount()
-
getStdRowWidth
public int getStdRowWidth()
-
getRowAllocWidth
public int getRowAllocWidth()
-
getActualSize
public long getActualSize()
-
getGrossRowWidth
public int getGrossRowWidth()
-
getAvgDensity
public int getAvgDensity()
-
getNetRowWidth
public int getNetRowWidth()
-
columns
-
columnsList
This is a convenience method to get the sizes of columns in the same order that the corresponding value vectors are stored within a VectorAccessible.
Returns:
The sizes of columns in the same order that the corresponding value vectors are stored within a VectorAccessible.
-
getNetRowWidthCap50
public int getNetRowWidthCap50()
Compute the "real" width of the row, taking into account each varchar column size (historically capped at 50, and rounded up to power of 2 to match drill buf allocation) and null marking columns.
Returns:
"real" width of the row
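The capping rule in this description can be illustrated with a short sketch. It is not Drill's implementation: the class RowWidthCapSketch and its Column descriptor are hypothetical, the one-byte cost per nullable column is an assumption, and any offset-vector overhead is ignored. Only the cap of 50 and the rounding up to a power of 2 come from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a capped row-width estimate; not Drill's code. */
final class RowWidthCapSketch {
    static final int VARCHAR_CAP = 50;      // cap named in the method description
    static final int NULL_MARKER_BYTES = 1; // assumption: one byte per nullable column

    /** Hypothetical column descriptor (not Drill's ColumnSize). */
    static final class Column {
        final int avgWidth;
        final boolean varchar;
        final boolean nullable;

        Column(int avgWidth, boolean varchar, boolean nullable) {
            this.avgWidth = avgWidth;
            this.varchar = varchar;
            this.nullable = nullable;
        }
    }

    /** Smallest power of two that is >= n, for n >= 0 (0 maps to 0). */
    static int roundUpToPowerOf2(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    /** Sums per-column widths, capping and rounding varchar columns. */
    static int cappedRowWidth(List<Column> columns) {
        int width = 0;
        for (Column c : columns) {
            width += c.varchar
                ? roundUpToPowerOf2(Math.min(c.avgWidth, VARCHAR_CAP))
                : c.avgWidth;
            if (c.nullable) {
                width += NULL_MARKER_BYTES;
            }
        }
        return width;
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
            new Column(4, false, false),   // fixed-width column: 4 bytes
            new Column(20, true, true),    // varchar: 20 rounds up to 32, plus 1 null byte
            new Column(200, true, false)); // varchar: capped at 50, rounds up to 64
        System.out.println(cappedRowWidth(columns)); // 101
    }
}
```

Note that a cap of 50 rounds up to 64, so under these assumptions no varchar column contributes more than 64 bytes to the estimate.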
-
hasSv2
public boolean hasSv2()
-
getNetBatchSize
public long getNetBatchSize()
-
getMaxAvgColumnSize
public int getMaxAvgColumnSize()
-
toString
-
buildVectorInitializer
The column size information gathered here represents empirically-derived schema metadata. Use that metadata to create an instance of a class that allocates memory for new vectors based on the observed size information. The caller provides the row count; the size information here provides column widths and the number of elements in each array.
-
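The idea of allocating from observed sizes can be sketched as below. This is an illustration under stated assumptions, not the class that buildVectorInitializer returns: VectorAllocationSketch, Hint, observe, bytesFor, and totalBytes are hypothetical names, and the estimate is simply row count times average array length times average value width.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of size-driven allocation hints; not Drill's code. */
final class VectorAllocationSketch {
    /** Observed per-column sizes: bytes per value and average elements per row. */
    static final class Hint {
        final int widthBytes;
        final double elementsPerRow; // 1.0 for non-array columns

        Hint(int widthBytes, double elementsPerRow) {
            this.widthBytes = widthBytes;
            this.elementsPerRow = elementsPerRow;
        }
    }

    // Insertion order is kept so columns are reported in the order observed.
    private final Map<String, Hint> hints = new LinkedHashMap<>();

    /** Records the observed size information for one column. */
    void observe(String column, int widthBytes, double elementsPerRow) {
        hints.put(column, new Hint(widthBytes, elementsPerRow));
    }

    /** Estimated bytes to allocate for one column, given a caller-supplied row count. */
    long bytesFor(String column, int rowCount) {
        Hint hint = hints.get(column);
        if (hint == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        long valueCount = (long) Math.ceil(rowCount * hint.elementsPerRow);
        return valueCount * hint.widthBytes;
    }

    /** Estimated bytes across all observed columns. */
    long totalBytes(int rowCount) {
        long total = 0;
        for (String column : hints.keySet()) {
            total += bytesFor(column, rowCount);
        }
        return total;
    }

    public static void main(String[] args) {
        VectorAllocationSketch sketch = new VectorAllocationSketch();
        sketch.observe("id", 4, 1.0);    // scalar column, 4 bytes per value
        sketch.observe("tags", 10, 2.5); // array column, 2.5 elements per row on average
        System.out.println(sketch.bytesFor("tags", 1000)); // 25000
        System.out.println(sketch.totalBytes(1000));       // 29000
    }
}
```

The split mirrors the description: the caller supplies only the row count, while widths and array cardinalities come from what was measured on an earlier batch.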
allocateVectors
-