Class HashPartition

Overview

Represents an active partition for the Hash-Join operator. Active means that the partition is currently receiving data or that its data is being probed, as opposed to a fully spilled partition. After all the build (inner) data has been read for this partition, one of two things happens. If all of its data is in memory, a hash table and a helper are created, and the data is probed later. If all of its build (inner) data was spilled, the partition begins to work as an outer partition (see the flag "processingOuter"), reusing some of its fields (e.g., currentBatch, currHVVector, writer, spillFile, partitionBatchesCount) for the outer side.
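
The two outcomes described above can be illustrated with a toy model. This is a minimal sketch, not Drill's implementation: the class ToyPartition, its row budget, and the methods addBuildRow, finishBuild, and countMatches are all hypothetical names chosen for illustration, and a plain list stands in for the spill file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the partition lifecycle (hypothetical; not Drill's HashPartition). */
class ToyPartition {
    private final List<Integer> inMemoryKeys = new ArrayList<>(); // build rows held in memory
    private final List<Integer> spilledKeys = new ArrayList<>();  // stands in for a spill file
    private final int memoryLimit;                                // hypothetical row budget
    private Map<Integer, Integer> hashTable;                      // key -> number of build rows
    private boolean spilled;
    private boolean processingOuter;

    ToyPartition(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    /** Adds one build-side key; once the budget is exceeded, everything is spilled. */
    void addBuildRow(int key) {
        if (spilled) {
            spilledKeys.add(key);
            return;
        }
        inMemoryKeys.add(key);
        if (inMemoryKeys.size() > memoryLimit) {
            spilledKeys.addAll(inMemoryKeys);
            inMemoryKeys.clear();
            spilled = true;
        }
    }

    /** Called once after the last build row: build a hash table, or switch to the outer role. */
    void finishBuild() {
        if (spilled) {
            processingOuter = true; // the same object now serves the outer side
        } else {
            hashTable = new HashMap<>();
            for (int key : inMemoryKeys) {
                hashTable.merge(key, 1, Integer::sum);
            }
        }
    }

    /** Returns how many build rows match the key; valid only for an in-memory partition. */
    int countMatches(int key) {
        if (hashTable == null) {
            throw new IllegalStateException("no hash table: spilled, or build not finished");
        }
        return hashTable.getOrDefault(key, 0);
    }

    boolean isSpilled() { return spilled; }

    boolean isProcessingOuter() { return processingOuter; }

    public static void main(String[] args) {
        ToyPartition small = new ToyPartition(10);
        small.addBuildRow(7);
        small.addBuildRow(7);
        small.finishBuild();
        System.out.println("matches for 7: " + small.countMatches(7));
    }
}
```

In this sketch the decision is made once, in finishBuild, which mirrors the point in the overview where all build data has been read. The real class makes the same distinction with far more state (vector containers, a writer, and a spill file).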

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | HASH_VALUE_COLUMN_NAME | |
| static final TypeProtos.MajorType | HVtype | |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| HashPartition(FragmentContext context, BufferAllocator allocator, ChainedHashTable baseHashTable, RecordBatch buildBatch, RecordBatch probeBatch, boolean semiJoin, int recordsPerBatch, SpillSet spillSet, int partNum, int cycleNum, int numPartitions) | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | allocateNewCurrentBatchAndHV() | Allocate a new current Vector Container and current HV vector |
| void | appendBatch(VectorAccessible batch) | Append the incoming batch (actually only the vectors of that batch) into the tmp list |
| void | appendInnerRow(VectorContainer buildContainer, int ind, int hashCode, HashJoinMemoryCalculator.BuildSidePartitioning calc) | Spills if needed |
| void | appendOuterRow(int hashCode, int recordsProcessed) | Outer always spills when batch is full |
| void | buildContainersHashTableAndHelper() | Creates the hash table and join helper for this partition. |
| void | cleanup(boolean deleteFile) | Free all in-memory allocated structures. |
| void | close() | |
| void | closeWriter() | Close the writer without deleting the spill file |
| void | completeAnInnerBatch(boolean toInitialize, boolean needsSpill) | |
| void | completeAnOuterBatch(boolean toInitialize) | |
| void | decreaseRecordNumForKey(int currentIndex) | |
| int | getBuildHashCode(int ind) | |
| ArrayList<VectorContainer> | getContainers() | |
| List<HashJoinMemoryCalculator.BatchStat> | getInMemoryBatches() | |
| long | getInMemorySize() | |
| int | getNextIndex(int compositeIndex) | |
| com.carrotsearch.hppc.IntArrayList | getNextUnmatchedIndex() | |
| int | getNumInMemoryBatches() | |
| long | getNumInMemoryRecords() | |
| int | getPartitionBatchesCount() | |
| int | getPartitionNum() | |
| int | getProbeHashCode(int ind) | |
| int | getRecordNumForKey(int currentIndex) | |
| String | getSpillFile() | |
| org.apache.commons.lang3.tuple.Pair<Integer,Boolean> | getStartIndex(int probeIndex) | |
| void | getStats(HashTableStats newStats) | |
| boolean | isSpilled() | |
| String | makeDebugString() | Creates a debugging string containing information about memory usage. |
| org.apache.commons.lang3.tuple.Pair<VectorContainer,Integer> | nextBatch() | |
| int | probeForKey(int recordsProcessed, int hashCode) | |
| boolean | setRecordMatched(int compositeIndex) | |
| void | setRecordNumForKey(int currentIndex, int num) | |
| void | spillThisPartition() | |
| void | updateBatches() | |
| void | updateProbeRecordsPerBatch(int newRecordsPerBatch) | Configure a different temporary batch size when spilling probe batches. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- HASH_VALUE_COLUMN_NAME
  
  public static final String HASH_VALUE_COLUMN_NAME
  See Also:
  
  Constant Field Values
- HVtype
  
  public static final TypeProtos.MajorType HVtype

Constructor Details
- HashPartition
  
  public HashPartition(FragmentContext context, BufferAllocator allocator, ChainedHashTable baseHashTable, RecordBatch buildBatch, RecordBatch probeBatch, boolean semiJoin, int recordsPerBatch, SpillSet spillSet, int partNum, int cycleNum, int numPartitions)

Method Details
- updateProbeRecordsPerBatch
  
  public void updateProbeRecordsPerBatch(int newRecordsPerBatch)
  
  Configure a different temporary batch size when spilling probe batches.
  
  Parameters:
  
  newRecordsPerBatch - The new temporary batch size to use.
- allocateNewCurrentBatchAndHV
  
  public void allocateNewCurrentBatchAndHV()
  
  Allocates a new current vector container and a new current hash-value (HV) vector.
- appendInnerRow
  
  public void appendInnerRow(VectorContainer buildContainer, int ind, int hashCode, HashJoinMemoryCalculator.BuildSidePartitioning calc)
  
  Appends a row from the build (inner) side to this partition, spilling if needed.
- appendOuterRow
  
  public void appendOuterRow(int hashCode, int recordsProcessed)
  
  Appends a row from the outer side to this partition. An outer partition always spills when its batch is full.
- completeAnOuterBatch
  
  public void completeAnOuterBatch(boolean toInitialize)
- completeAnInnerBatch
  
  public void completeAnInnerBatch(boolean toInitialize, boolean needsSpill)
- appendBatch
  
  public void appendBatch(VectorAccessible batch)
  
  Appends the incoming batch (only the vectors of that batch) to the temporary list.
- spillThisPartition
  
  public void spillThisPartition()
- probeForKey
  
  public int probeForKey(int recordsProcessed, int hashCode) throws SchemaChangeException
  
  Throws:
  
  SchemaChangeException
- getRecordNumForKey
  
  public int getRecordNumForKey(int currentIndex)
- setRecordNumForKey
  
  public void setRecordNumForKey(int currentIndex, int num)
- decreaseRecordNumForKey
  
  public void decreaseRecordNumForKey(int currentIndex)
- getStartIndex
  
  public org.apache.commons.lang3.tuple.Pair<Integer,Boolean> getStartIndex(int probeIndex)
- getNextIndex
  
  public int getNextIndex(int compositeIndex)
- setRecordMatched
  
  public boolean setRecordMatched(int compositeIndex)
- getNextUnmatchedIndex
  
  public com.carrotsearch.hppc.IntArrayList getNextUnmatchedIndex()
- getBuildHashCode
  
  public int getBuildHashCode(int ind) throws SchemaChangeException
  
  Throws:
  
  SchemaChangeException
- getProbeHashCode
  
  public int getProbeHashCode(int ind) throws SchemaChangeException
  
  Throws:
  
  SchemaChangeException
- getContainers
  
  public ArrayList<VectorContainer> getContainers()
- updateBatches
  
  public void updateBatches() throws SchemaChangeException
  
  Throws:
  
  SchemaChangeException
- nextBatch
  
  public org.apache.commons.lang3.tuple.Pair<VectorContainer,Integer> nextBatch()
- getInMemoryBatches
  
  public List<HashJoinMemoryCalculator.BatchStat> getInMemoryBatches()
  
  Specified by:
  
  getInMemoryBatches in interface HashJoinMemoryCalculator.PartitionStat
- getNumInMemoryBatches
  
  public int getNumInMemoryBatches()
  
  Specified by:
  
  getNumInMemoryBatches in interface HashJoinMemoryCalculator.PartitionStat
- isSpilled
  
  public boolean isSpilled()
  
  Specified by:
  
  isSpilled in interface HashJoinMemoryCalculator.PartitionStat
- getNumInMemoryRecords
  
  public long getNumInMemoryRecords()
  
  Specified by:
  
  getNumInMemoryRecords in interface HashJoinMemoryCalculator.PartitionStat
- getInMemorySize
  
  public long getInMemorySize()
  
  Specified by:
  
  getInMemorySize in interface HashJoinMemoryCalculator.PartitionStat
- getSpillFile
  
  public String getSpillFile()
- getPartitionBatchesCount
  
  public int getPartitionBatchesCount()
- getPartitionNum
  
  public int getPartitionNum()
- closeWriter
  
  public void closeWriter()
  
  Close the writer without deleting the spill file
- buildContainersHashTableAndHelper
  
  public void buildContainersHashTableAndHelper() throws SchemaChangeException
  
  Creates the hash table and join helper for this partition. This method should only be called after all the build side records have been consumed.
  
  Throws:
  
  SchemaChangeException
- getStats
  
  public void getStats(HashTableStats newStats)
- cleanup
  
  public void cleanup(boolean deleteFile)
  
  Free all in-memory allocated structures.
  
  Parameters:
  
  deleteFile - whether to delete the spill file
- close
  
  public void close()
- makeDebugString
  
  public String makeDebugString()
  
  Creates a debugging string containing information about memory usage.
  
  Returns:
  
  A debugging string.
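
The difference between closeWriter() and cleanup(boolean deleteFile) described above can be shown with a small stand-in. This is a hedged sketch under stated assumptions, not Drill code: ToySpillHolder, addRow, spill, and inMemoryRowCount are hypothetical, and a local temp file plays the role of the spill file. Only the java.io and java.nio.file calls are real library APIs.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Toy spill holder (hypothetical); contrasts closeWriter() with cleanup(deleteFile). */
class ToySpillHolder {
    private final List<String> inMemoryRows = new ArrayList<>();
    private Path spillFile;
    private BufferedWriter writer;

    ToySpillHolder() {
        try {
            spillFile = Files.createTempFile("toy-spill", ".tmp");
            writer = Files.newBufferedWriter(spillFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void addRow(String row) {
        inMemoryRows.add(row);
    }

    /** Writes the buffered rows to the spill file and drops them from memory. */
    void spill() {
        if (writer == null) {
            throw new IllegalStateException("writer already closed");
        }
        try {
            for (String row : inMemoryRows) {
                writer.write(row);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemoryRows.clear();
    }

    /** Closes the writer but keeps the spill file so it can be read back later. */
    void closeWriter() {
        if (writer == null) {
            return; // already closed
        }
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        writer = null;
    }

    /** Frees in-memory state; deletes the spill file only when asked to. */
    void cleanup(boolean deleteFile) {
        inMemoryRows.clear();
        closeWriter();
        if (deleteFile) {
            try {
                Files.deleteIfExists(spillFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    Path getSpillFile() { return spillFile; }

    int inMemoryRowCount() { return inMemoryRows.size(); }

    public static void main(String[] args) {
        ToySpillHolder holder = new ToySpillHolder();
        holder.addRow("row-1");
        holder.spill();
        holder.closeWriter();
        System.out.println("file kept: " + Files.exists(holder.getSpillFile()));
        holder.cleanup(true);
        System.out.println("file kept: " + Files.exists(holder.getSpillFile()));
    }
}
```

Keeping the file after closeWriter() matters when the spilled data must be read back in a later pass. Passing true to cleanup is the choice for a partition whose spilled data is no longer needed.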
