Interface HashJoinMemoryCalculator

All Superinterfaces:
HashJoinStateCalculator<HashJoinMemoryCalculator.BuildSidePartitioning>
All Known Implementing Classes:
HashJoinMechanicalMemoryCalculator, HashJoinMemoryCalculatorImpl

public interface HashJoinMemoryCalculator extends HashJoinStateCalculator<HashJoinMemoryCalculator.BuildSidePartitioning>

This class is responsible for managing the memory calculations for the HashJoin operator. Since the HashJoin operator has different phases of execution, this class needs to perform different memory calculations at each phase. The phases of execution have been broken down into an explicit state machine diagram below. What ocurrs in each state is described in the documentation of the HashJoinState class below. Note: the transition from Probing and Partitioning back to Build Side Partitioning. This happens when we had to spill probe side partitions and we needed to recursively process spilled partitions. This recursion is described in more detail in the example below.

+--------------+ <-------+ | Build Side | | | Partitioning| | | | | +------+-------+ | | | | | v | +--------------+ | |Probing and | | |Partitioning | | | | | +--------------+ | | | +----------------+ | v Done

An overview of how these states interact can be summarized with the following example.

Consider the case where we have 4 partition configured initially.

  1. We first start consuming build side batches and putting their records into one of 4 build side partitions.
  2. Once we run out of memory we start spilling build side partition one by one
  3. We keep partitioning build side batches until all the build side batches are consumed.
  4. After we have consumed the build side we prepare to probe by building hashtables for the partitions we have in memory. If we don't have enough room for all the hashtables in memory we spill build side partitions until we do have enough room.
  5. We now start processing the probe side. For each probe record we determine its build partition. If the build partition is in memory we do the join for the record and emit it. If the build partition is not in memory we spill the probe record. We continue this process until all the probe side records are consumed.
  6. If we didn't spill any probe side partitions because all the build side partition were in memory, our join operation is done. If we did spill probe side partitions we have to recursively repeat this whole process for each spilled probe and build side partition pair.

  • Method Details

    • initialize

      void initialize(boolean doMemoryCalc)