Class PriorityQueueCopierWrapper.BatchMerger

java.lang.Object
org.apache.drill.exec.physical.impl.xsort.PriorityQueueCopierWrapper.BatchMerger
All Implemented Interfaces:
AutoCloseable, SortImpl.SortResults
Enclosing class:
PriorityQueueCopierWrapper

public static class PriorityQueueCopierWrapper.BatchMerger extends Object implements SortImpl.SortResults, AutoCloseable
We've gathered a set of batches, each of which has been sorted. The batches may have passed through a filter and thus may have "holes" where rows have been filtered out. We will spill records in blocks of targetRecordCount. To prepare, copy that many records into an outputContainer as a set of contiguous values in new vectors. The result is a single batch with vectors that combine a collection of input batches up to the given threshold.

Input. Here the top line is a selection vector of indexes. The second line is a set of batch groups (separated by underscores) with letters indicating individual records:

 [3 7 4 0 8 1 6] [5 3 6 8 2 0]
 [eh_ad_ibf]     [r_qm_kn_p]

Output, assuming blocks of 5 records. Each pair of brackets represents one batch; the line as a whole represents the set of batches copied to the spill file.

 [abdef] [hikmn] [pqr]
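
The selection-vector indirection and the block-sized output can be illustrated with a minimal sketch. It is not Drill's implementation: records are modelled as characters in a String, underscore positions are assumed to count toward the indexes (as they do for the second batch in the example above), and the class and method names (SelectionVectorSketch, sortedView, toBlocks) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SelectionVectorSketch {
    // Reads a batch through its selection vector: entry i of the vector
    // names the buffer position of the i-th record in sorted order.
    // Positions that the vector never references (the "holes") are skipped.
    static String sortedView(String buffer, int[] selectionVector) {
        StringBuilder out = new StringBuilder();
        for (int index : selectionVector) {
            out.append(buffer.charAt(index));
        }
        return out.toString();
    }

    // Splits an already ordered run of records into contiguous blocks
    // of at most blockSize records each.
    static List<String> toBlocks(String records, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<String> blocks = new ArrayList<>();
        for (int start = 0; start < records.length(); start += blockSize) {
            int end = Math.min(start + blockSize, records.length());
            blocks.add(records.substring(start, end));
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Second batch from the example: buffer "r_qm_kn_p", vector [5 3 6 8 2 0].
        String view = sortedView("r_qm_kn_p", new int[] {5, 3, 6, 8, 2, 0});
        System.out.println(view);              // prints kmnpqr
        System.out.println(toBlocks(view, 5)); // prints [kmnpq, r]
    }
}
```

The sketch copies values into new contiguous storage, which mirrors the idea of producing output vectors without holes; the real copier works on value vectors rather than strings.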

The copying operation also performs a merge: it copies values from the sources in sorted order. Consider a different example, in which we want to merge two input batches to produce a single output batch:

 Input:  [aceg] [bdfh]
 Output: [abcdefgh]

In the above, the input consists of two sorted batches. (In reality, the input batches have an associated selection vector, but that is omitted here and just the sorted values shown.) The output is a single batch with the merged records (indicated by letters) from the two input batches.
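
As a rough illustration of the merge step, the following sketch merges any number of sorted runs with java.util.PriorityQueue. It shows the general priority-queue merge technique, not the generated copier that Drill uses; the MergeSketch class and its merge method are hypothetical, and records are again modelled as characters.

```java
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {
    // Merges sorted runs by repeatedly taking the smallest head record.
    static String merge(List<String> sortedRuns) {
        // Each queue entry is {runIndex, positionInRun}, ordered by the
        // record that the entry currently points to.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            (a, b) -> Character.compare(
                sortedRuns.get(a[0]).charAt(a[1]),
                sortedRuns.get(b[0]).charAt(b[1])));
        for (int run = 0; run < sortedRuns.size(); run++) {
            if (!sortedRuns.get(run).isEmpty()) {
                heads.add(new int[] {run, 0});
            }
        }
        StringBuilder out = new StringBuilder();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            String run = sortedRuns.get(head[0]);
            out.append(run.charAt(head[1]));
            // Re-queue the same run at its next record, if it has one.
            if (head[1] + 1 < run.length()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of("aceg", "bdfh"))); // prints abcdefgh
    }
}
```

The queue never holds more than one entry per input run, so the cost of each output record depends on the number of runs rather than on the total number of records.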

Here we bind the copier to the batchGroupList of sorted, buffered batches to be merged. We bind the copier output to outputContainer: the copier will write its merged "batches" of records to that container.

Calls to the next() method sequentially return merged batches of the desired row count.
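
A caller would typically drain such a merger in a loop. The sketch below uses a toy stand-in rather than the real class: ToyMerger, its output() accessor, and the assumption that next() returns false once no records remain are all hypothetical and chosen only to show the calling pattern.

```java
import java.util.ArrayList;
import java.util.List;

class NextLoopSketch {
    // Hypothetical stand-in for a batch merger: each next() call fills the
    // output with up to targetRecordCount records and reports whether any
    // records were produced.
    static class ToyMerger implements AutoCloseable {
        private final String merged;          // records already in merged order
        private final int targetRecordCount;
        private int position = 0;
        private String output = "";           // plays the role of the output container

        ToyMerger(String merged, int targetRecordCount) {
            if (targetRecordCount <= 0) {
                throw new IllegalArgumentException("targetRecordCount must be positive");
            }
            this.merged = merged;
            this.targetRecordCount = targetRecordCount;
        }

        boolean next() {
            if (position >= merged.length()) {
                output = "";
                return false;
            }
            int end = Math.min(position + targetRecordCount, merged.length());
            output = merged.substring(position, end);
            position = end;
            return true;
        }

        String output() {
            return output;
        }

        @Override
        public void close() {
            output = "";  // release the last batch
        }
    }

    // Collects every batch the merger produces, closing it afterwards.
    static List<String> drain(String merged, int targetRecordCount) {
        List<String> batches = new ArrayList<>();
        try (ToyMerger merger = new ToyMerger(merged, targetRecordCount)) {
            while (merger.next()) {
                batches.add(merger.output());
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(drain("abcdefgh", 3)); // prints [abc, def, gh]
    }
}
```

Because the class implements AutoCloseable, a try-with-resources block is a natural way to make sure the merger is closed even if processing a batch fails.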