Class HashPartition

java.lang.Object
org.apache.drill.exec.physical.impl.common.HashPartition
All Implemented Interfaces:
HashJoinMemoryCalculator.PartitionStat

public class HashPartition extends Object implements HashJoinMemoryCalculator.PartitionStat

Overview

Created to represent an active partition for the Hash-Join operator (active means: currently receiving data, or its data is being probed; as opposed to fully spilled partitions). After all the build/inner data is read for this partition - if all its data is in memory, then a hash table and a helper are created, and later this data would be probed. If all this partition's build/inner data was spilled, then it begins to work as an outer partition (see the flag "processingOuter") -- reusing some of the fields (e.g., currentBatch, currHVVector, writer, spillFile, partitionBatchesCount) for the outer.

  • Field Details

  • Constructor Details

  • Method Details

    • updateProbeRecordsPerBatch

      public void updateProbeRecordsPerBatch(int newRecordsPerBatch)
      Configure a different temporary batch size when spilling probe batches.
      Parameters:
      newRecordsPerBatch - The new temporary batch size to use.
    • allocateNewCurrentBatchAndHV

      public void allocateNewCurrentBatchAndHV()
      Allocate a new current Vector Container and current HV vector
    • appendInnerRow

      public void appendInnerRow(VectorContainer buildContainer, int ind, int hashCode, HashJoinMemoryCalculator.BuildSidePartitioning calc)
      Spills if needed
    • appendOuterRow

      public void appendOuterRow(int hashCode, int recordsProcessed)
      Outer always spills when batch is full
    • completeAnOuterBatch

      public void completeAnOuterBatch(boolean toInitialize)
    • completeAnInnerBatch

      public void completeAnInnerBatch(boolean toInitialize, boolean needsSpill)
    • appendBatch

      public void appendBatch(VectorAccessible batch)
      Append the incoming batch (actually only the vectors of that batch) into the tmp list
    • spillThisPartition

      public void spillThisPartition()
    • probeForKey

      public int probeForKey(int recordsProcessed, int hashCode) throws SchemaChangeException
      Throws:
      SchemaChangeException
    • getRecordNumForKey

      public int getRecordNumForKey(int currentIndex)
    • setRecordNumForKey

      public void setRecordNumForKey(int currentIndex, int num)
    • decreaseRecordNumForKey

      public void decreaseRecordNumForKey(int currentIndex)
    • getStartIndex

      public org.apache.commons.lang3.tuple.Pair<Integer,Boolean> getStartIndex(int probeIndex)
    • getNextIndex

      public int getNextIndex(int compositeIndex)
    • setRecordMatched

      public boolean setRecordMatched(int compositeIndex)
    • getNextUnmatchedIndex

      public com.carrotsearch.hppc.IntArrayList getNextUnmatchedIndex()
    • getBuildHashCode

      public int getBuildHashCode(int ind) throws SchemaChangeException
      Throws:
      SchemaChangeException
    • getProbeHashCode

      public int getProbeHashCode(int ind) throws SchemaChangeException
      Throws:
      SchemaChangeException
    • getContainers

      public ArrayList<VectorContainer> getContainers()
    • updateBatches

      public void updateBatches() throws SchemaChangeException
      Throws:
      SchemaChangeException
    • nextBatch

      public org.apache.commons.lang3.tuple.Pair<VectorContainer,Integer> nextBatch()
    • getInMemoryBatches

      public List<HashJoinMemoryCalculator.BatchStat> getInMemoryBatches()
      Specified by:
      getInMemoryBatches in interface HashJoinMemoryCalculator.PartitionStat
    • getNumInMemoryBatches

      public int getNumInMemoryBatches()
      Specified by:
      getNumInMemoryBatches in interface HashJoinMemoryCalculator.PartitionStat
    • isSpilled

      public boolean isSpilled()
      Specified by:
      isSpilled in interface HashJoinMemoryCalculator.PartitionStat
    • getNumInMemoryRecords

      public long getNumInMemoryRecords()
      Specified by:
      getNumInMemoryRecords in interface HashJoinMemoryCalculator.PartitionStat
    • getInMemorySize

      public long getInMemorySize()
      Specified by:
      getInMemorySize in interface HashJoinMemoryCalculator.PartitionStat
    • getSpillFile

      public String getSpillFile()
    • getPartitionBatchesCount

      public int getPartitionBatchesCount()
    • getPartitionNum

      public int getPartitionNum()
    • closeWriter

      public void closeWriter()
      Close the writer without deleting the spill file
    • buildContainersHashTableAndHelper

      public void buildContainersHashTableAndHelper() throws SchemaChangeException
      Creates the hash table and join helper for this partition. This method should only be called after all the build side records have been consumed.
      Throws:
      SchemaChangeException
    • getStats

      public void getStats(HashTableStats newStats)
    • cleanup

      public void cleanup(boolean deleteFile)
      Free all in-memory allocated structures.
      Parameters:
      deleteFile - - whether to delete the spill file or not
    • close

      public void close()
    • makeDebugString

      public String makeDebugString()
      Creates a debugging string containing information about memory usage.
      Returns:
      A debugging string.