Interface HashTable
- All Known Implementing Classes:
HashTableTemplate
public interface HashTable
-
Nested Class Summary
-
Field Summary
Modifier and Type | Field | Description
static final int | BATCH_MASK |
static final int | BATCH_SIZE | The batch size used for internal batch holders.
static final float | DEFAULT_LOAD_FACTOR | The default load factor of a hash table.
static final int | MAXIMUM_CAPACITY | The maximum capacity of the hash table (in terms of number of buckets).
static final TemplateClassDefinition<HashTable> | TEMPLATE_DEFINITION |
-
Method Summary
Modifier and Type | Method | Description
void | clear() | Frees all the direct memory consumed by the HashTable.
void | decreaseRecordNumForKey(int currentIndex) | Decrease the count of records for a specific key by one.
long | getActualSize() | The amount of direct memory consumed by the hash table.
int | getBuildHashCode(int incomingRowIdx) | Computes the hash code for the record at the given index in the build side batch.
int | getProbeHashCode(int incomingRowIdx) | Computes the hash code for the record at the given index in the probe side batch.
int | getRecordNumForKey(int currentIndex) |
void | getStats(HashTableStats stats) |
int | getTargetBatchRowCount() |
boolean | isEmpty() |
String | makeDebugString() | Returns a message containing memory usage statistics.
org.apache.commons.lang3.tuple.Pair<VectorContainer,Integer> | nextBatch() |
boolean | outputKeys(int batchIdx, VectorContainer outContainer, int numRecords) | Retrieves the key columns and transfers them to the output container.
int | probeForKey(int incomingRowIdx, int hashCode) |
HashTable.PutStatus | put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode, int batchSize) |
void | reset() | Clears all the memory used by the HashTable and re-initializes it.
void | setRecordNumForKey(int currentIndex, int num) | Set the count of records for a specific key to num.
void | setTargetBatchRowCount(int batchRowCount) |
void | setup(HashTableConfig htConfig, BufferAllocator allocator, VectorContainer incomingBuild, RecordBatch incomingProbe, RecordBatch outgoing, VectorContainer htContainerOrig, FragmentContext context, ClassGenerator<?> cg) | setup(HashTableConfig, BufferAllocator, VectorContainer, RecordBatch, RecordBatch, VectorContainer, FragmentContext, ClassGenerator<?>) must be called before anything can be done to the HashTable.
int | size() |
void | updateBatches() | Updates the incoming (build and probe side) value vector references in the HashTableTemplate.BatchHolders.
void | updateIncoming(VectorContainer newIncoming, RecordBatch newIncomingProbe) | Changes the incoming probe and build side batches, and then updates all the value vector references in the HashTableTemplate.BatchHolders.
void | updateInitialCapacity(int initialCapacity) | Update the initial capacity for the hash table.
-
Field Details
-
TEMPLATE_DEFINITION
static final TemplateClassDefinition<HashTable> TEMPLATE_DEFINITION
-
MAXIMUM_CAPACITY
static final int MAXIMUM_CAPACITY
The maximum capacity of the hash table (in terms of number of buckets).
-
DEFAULT_LOAD_FACTOR
static final float DEFAULT_LOAD_FACTOR
The default load factor of a hash table.
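As a rough illustration of how a load factor and a maximum bucket count usually interact in a hash table, the sketch below computes a resize threshold and the next capacity. The constant values (0.75 and 2^30) and the doubling policy are assumptions made for this example; this page does not state the actual values or the resizing strategy.

```java
// Sketch only. The constant values and the doubling policy are assumptions,
// not taken from the HashTable interface documentation.
final class LoadFactorSketch {
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // assumed value
    static final int MAXIMUM_CAPACITY = 1 << 30;    // assumed value, in buckets

    // Number of entries after which a table with `capacity` buckets would grow.
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Doubles the bucket count, but never exceeds MAXIMUM_CAPACITY.
    static int nextCapacity(int capacity) {
        return capacity >= MAXIMUM_CAPACITY / 2 ? MAXIMUM_CAPACITY : capacity * 2;
    }

    public static void main(String[] args) {
        System.out.println(resizeThreshold(16, DEFAULT_LOAD_FACTOR)); // prints 12
        System.out.println(nextCapacity(16));                         // prints 32
    }
}
```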
-
BATCH_SIZE
static final int BATCH_SIZE
The batch size used for internal batch holders.
-
BATCH_MASK
static final int BATCH_MASK
-
-
Method Details
-
setup
void setup(HashTableConfig htConfig, BufferAllocator allocator, VectorContainer incomingBuild, RecordBatch incomingProbe, RecordBatch outgoing, VectorContainer htContainerOrig, FragmentContext context, ClassGenerator<?> cg)
setup(HashTableConfig, BufferAllocator, VectorContainer, RecordBatch, RecordBatch, VectorContainer, FragmentContext, ClassGenerator<?>) must be called before anything can be done to the HashTable.
- Parameters:
htConfig -
allocator -
incomingBuild -
incomingProbe -
outgoing -
htContainerOrig -
context -
cg -
-
updateBatches
void updateBatches() throws SchemaChangeException
Updates the incoming (build and probe side) value vector references in the HashTableTemplate.BatchHolders. This is useful on OK_NEW_SCHEMA (need to verify).
- Throws:
SchemaChangeException
-
getBuildHashCode
int getBuildHashCode(int incomingRowIdx) throws SchemaChangeException
Computes the hash code for the record at the given index in the build side batch.
- Parameters:
incomingRowIdx - The index of the build side record of interest.
- Returns:
The hash code for the record at the given index in the build side batch.
- Throws:
SchemaChangeException
-
getProbeHashCode
int getProbeHashCode(int incomingRowIdx) throws SchemaChangeException
Computes the hash code for the record at the given index in the probe side batch.
- Parameters:
incomingRowIdx - The index of the probe side record of interest.
- Returns:
The hash code for the record at the given index in the probe side batch.
- Throws:
SchemaChangeException
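The build side and probe side hash codes only help if equal keys produce equal codes on both sides. The toy class below is a minimal sketch of that idea, not the Drill implementation: it uses plain String keys, a fixed bucket count, and hypothetical `put` and `probeForKey` signatures that differ from the ones on this page. The caller computes the hash code first and passes it in, which mirrors how the interface separates hashing from insertion and lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in with hypothetical signatures; keys are Strings instead of
// rows in value vectors, and the bucket count is fixed.
final class BuildProbeSketch {
    private static final int BUCKETS = 16;
    private final List<List<Integer>> buckets = new ArrayList<>();
    private final List<String> keys = new ArrayList<>();

    BuildProbeSketch() {
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The same function must be used for build side and probe side keys.
    static int hashCodeOf(String key) {
        return key.hashCode();
    }

    // Build side: inserts the key if absent and returns its index.
    int put(String key, int hashCode) {
        int existing = probeForKey(key, hashCode);
        if (existing != -1) {
            return existing;
        }
        keys.add(key);
        int idx = keys.size() - 1;
        buckets.get(Math.floorMod(hashCode, BUCKETS)).add(idx);
        return idx;
    }

    // Probe side: returns the index of the key, or -1 if it is not present.
    int probeForKey(String key, int hashCode) {
        for (int idx : buckets.get(Math.floorMod(hashCode, BUCKETS))) {
            if (keys.get(idx).equals(key)) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        BuildProbeSketch table = new BuildProbeSketch();
        table.put("a", hashCodeOf("a"));
        System.out.println(table.probeForKey("a", hashCodeOf("a"))); // prints 0
        System.out.println(table.probeForKey("z", hashCodeOf("z"))); // prints -1
    }
}
```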
-
put
HashTable.PutStatus put(int incomingRowIdx, IndexPointer htIdxHolder, int hashCode, int batchSize) throws SchemaChangeException, RetryAfterSpillException
-
probeForKey
int probeForKey(int incomingRowIdx, int hashCode) throws SchemaChangeException
- Parameters:
incomingRowIdx - The index of the key in the probe batch.
hashCode - The hash code of the key.
- Returns:
-1 if the key at the given incomingRowIdx in the probe batch is not in the hash table; otherwise the composite index of the key in the hash table (the index of the BatchHolder and the index of the record within that BatchHolder).
- Throws:
SchemaChangeException
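One common way to pack a batch holder index and a record index into a single int is to put them in the high and low bits. The sketch below assumes a 16-bit split, with BATCH_SIZE equal to 65536 and BATCH_MASK equal to 0xFFFF; those values and the helper names are assumptions for illustration, since this page does not give the constant values or the encoding. Under this scheme a caller has to check for -1 before decoding.

```java
// Sketch only. The 16-bit split, the constant values and the helper names
// are assumptions, not taken from this page.
final class CompositeIndexSketch {
    static final int BATCH_SIZE = 1 << 16;        // assumed value
    static final int BATCH_MASK = BATCH_SIZE - 1; // assumed value, 0xFFFF

    // Packs the batch holder index into the high bits and the record index
    // into the low 16 bits.
    static int compose(int batchIdx, int recordIdx) {
        return (batchIdx << 16) | (recordIdx & BATCH_MASK);
    }

    static int batchIndex(int compositeIdx) {
        return compositeIdx >>> 16;
    }

    static int recordIndex(int compositeIdx) {
        return compositeIdx & BATCH_MASK;
    }

    public static void main(String[] args) {
        int idx = compose(3, 42);
        // A "not found" result of -1 must be handled before decoding.
        if (idx != -1) {
            System.out.println(batchIndex(idx) + ":" + recordIndex(idx)); // prints 3:42
        }
    }
}
```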
-
getRecordNumForKey
int getRecordNumForKey(int currentIndex)
- Parameters:
currentIndex - The composite index of the key in the hash table (index of BatchHolder and record in BatchHolder).
- Returns:
-1 if the count of records for the key has not been computed; otherwise the count of records for the key.
-
setRecordNumForKey
void setRecordNumForKey(int currentIndex, int num)
Set the count of records for a specific key to num.
- Parameters:
currentIndex - The composite index of the key in the hash table (index of BatchHolder and record in BatchHolder).
num - The count of records to set for the key.
-
decreaseRecordNumForKey
void decreaseRecordNumForKey(int currentIndex)
Decrease the count of records for a specific key by one.
- Parameters:
currentIndex - The composite index of the key in the hash table (index of BatchHolder and record in BatchHolder).
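The three record count methods describe a small contract: a count reads as -1 until it is set, can be set to a value, and can then be decreased one at a time. The class below is a minimal sketch of that contract using a plain int array indexed by a simple key index. The class name and the flat index are assumptions for illustration; the real interface addresses keys by composite index.

```java
import java.util.Arrays;

// Toy model of the per-key record count contract. The flat int index is an
// assumption; the real interface uses a composite index.
final class RecordCountSketch {
    private final int[] counts;

    RecordCountSketch(int numKeys) {
        counts = new int[numKeys];
        Arrays.fill(counts, -1); // -1 means "not computed yet"
    }

    int getRecordNumForKey(int currentIndex) {
        return counts[currentIndex];
    }

    void setRecordNumForKey(int currentIndex, int num) {
        counts[currentIndex] = num;
    }

    void decreaseRecordNumForKey(int currentIndex) {
        counts[currentIndex]--;
    }

    public static void main(String[] args) {
        RecordCountSketch sketch = new RecordCountSketch(4);
        System.out.println(sketch.getRecordNumForKey(2)); // prints -1
        sketch.setRecordNumForKey(2, 3);
        sketch.decreaseRecordNumForKey(2);
        System.out.println(sketch.getRecordNumForKey(2)); // prints 2
    }
}
```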
-
getStats
void getStats(HashTableStats stats)
-
size
int size()
-
isEmpty
boolean isEmpty()
-
clear
void clear()
Frees all the direct memory consumed by the HashTable.
-
updateInitialCapacity
void updateInitialCapacity(int initialCapacity)
Update the initial capacity for the hash table. This method will be removed after the key vectors are removed from the hash table. It is used to allocate HashTableTemplate.BatchHolders of appropriate size when the final size of the HashTable is known. Warning! Only call this method before any elements have been inserted into the HashTable.
- Parameters:
initialCapacity - The new initial capacity to use.
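The warning above describes an ordering constraint that a caller can guard against. The wrapper below is a hypothetical sketch of such a guard: it rejects a capacity update once anything has been inserted. The class, its `insert` method, and the choice to throw IllegalStateException are all assumptions; this page does not say whether the real implementation enforces the rule or how.

```java
// Hypothetical guard around the "only before inserting" rule. Names and the
// exception choice are assumptions, not the documented behavior.
final class InitialCapacitySketch {
    private int initialCapacity;
    private int size;

    InitialCapacitySketch(int initialCapacity) {
        this.initialCapacity = initialCapacity;
    }

    // Allowed only while the table is still empty.
    void updateInitialCapacity(int newInitialCapacity) {
        if (size > 0) {
            throw new IllegalStateException(
                "initial capacity can only be updated before elements are inserted");
        }
        this.initialCapacity = newInitialCapacity;
    }

    // Stand-in for inserting one element.
    void insert() {
        size++;
    }

    int initialCapacity() {
        return initialCapacity;
    }

    public static void main(String[] args) {
        InitialCapacitySketch table = new InitialCapacitySketch(16);
        table.updateInitialCapacity(1024); // fine: nothing inserted yet
        table.insert();
        try {
            table.updateInitialCapacity(2048);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```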
-
updateIncoming
void updateIncoming(VectorContainer newIncoming, RecordBatch newIncomingProbe)
Changes the incoming probe and build side batches, and then updates all the value vector references in the HashTableTemplate.BatchHolders.
- Parameters:
newIncoming - The new build side batch.
newIncomingProbe - The new probe side batch.
-
reset
void reset()
Clears all the memory used by the HashTable and re-initializes it.
-
outputKeys
boolean outputKeys(int batchIdx, VectorContainer outContainer, int numRecords)
Retrieves the key columns and transfers them to the output container. Note that this operation removes the key columns from the HashTable.
- Parameters:
batchIdx - The index of a HashTableTemplate.BatchHolder in the HashTable.
outContainer - The destination container for the key columns.
numRecords - The number of key records to transfer.
-
makeDebugString
String makeDebugString()
Returns a message containing memory usage statistics. Intended to be used for printing debugging or error messages.
- Returns:
A debug string.
-
getActualSize
long getActualSize()
The amount of direct memory consumed by the hash table.
-
setTargetBatchRowCount
void setTargetBatchRowCount(int batchRowCount)
-
getTargetBatchRowCount
int getTargetBatchRowCount()
-
nextBatch
org.apache.commons.lang3.tuple.Pair<VectorContainer,Integer> nextBatch()
-