Class MergeSortWrapper
- All Implemented Interfaces:
SortImpl.SortResults
Since all batches are in memory, we don't want to use the usual merge algorithm as that makes a copy of the original batches (which were read from a spill file) to produce an output batch. Instead, we want to use the in-memory batches as-is. To do this, we use a selection vector 4 (SV4) as a global index into the collection of batches. The SV4 uses the upper two bytes as the batch index, and the lower two as an offset of a record within the batch.
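The two-byte split described above can be sketched as plain bit arithmetic. This is a minimal illustration only: the class name `Sv4EntrySketch` and its methods are hypothetical and are not part of Drill's `SelectionVector4` API.

```java
// Hypothetical helper, not Drill code: shows how one 32-bit SV4 entry
// can carry a batch index (upper 16 bits) and a record offset (lower 16 bits).
final class Sv4EntrySketch {

    /** Packs a batch index and a record offset into a single int entry. */
    static int pack(int batchIndex, int offset) {
        if (batchIndex < 0 || batchIndex > 0xFFFF || offset < 0 || offset > 0xFFFF) {
            throw new IllegalArgumentException("batch index and offset must each fit in 16 bits");
        }
        return (batchIndex << 16) | offset;
    }

    /** Extracts the batch index; the unsigned shift keeps large indexes positive. */
    static int batchIndex(int entry) {
        return entry >>> 16;
    }

    /** Extracts the record offset within the batch. */
    static int offset(int entry) {
        return entry & 0xFFFF;
    }

    public static void main(String[] args) {
        int entry = pack(3, 7);
        System.out.println(batchIndex(entry) + ":" + offset(entry)); // prints 3:7
    }
}
```

One consequence of this layout is that a single entry can address at most 65,536 batches of at most 65,536 records each.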
The merger ("M Sorter") populates the SV4 by scanning the set of in-memory batches, searching for the one with the lowest value of the sort key. The batch number and offset are placed into the SV4. The process continues until all records from all batches have an entry in the SV4.
The actual implementation uses an iterative merge to perform the above efficiently.
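The scan described above can be sketched as follows. This is the naive form of the idea, not the iterative merge the class actually generates. It assumes that each batch is already sorted on a single integer key, and it models a batch as a plain `int[]`; the class name `NaiveSv4MergeSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Drill code. Each int[] stands in for one
// in-memory batch whose records are already sorted by an integer key.
final class NaiveSv4MergeSketch {

    /** Builds a global index over the batches without copying any record. */
    static int[] merge(List<int[]> sortedBatches) {
        int total = 0;
        for (int[] batch : sortedBatches) {
            total += batch.length;
        }
        int[] sv4 = new int[total];
        int[] cursor = new int[sortedBatches.size()]; // next unread offset per batch

        for (int out = 0; out < total; out++) {
            int best = -1;
            // Scan every batch for the lowest key among the unread head records.
            for (int b = 0; b < sortedBatches.size(); b++) {
                int[] batch = sortedBatches.get(b);
                if (cursor[b] == batch.length) {
                    continue; // this batch is exhausted
                }
                if (best < 0 || batch[cursor[b]] < sortedBatches.get(best)[cursor[best]]) {
                    best = b;
                }
            }
            // Upper two bytes: batch index. Lower two bytes: offset in the batch.
            // Assumes fewer than 65,536 batches and at most 65,536 records per batch.
            sv4[out] = (best << 16) | cursor[best];
            cursor[best]++;
        }
        return sv4;
    }

    public static void main(String[] args) {
        int[] sv4 = merge(Arrays.asList(new int[] {1, 4, 9}, new int[] {2, 3, 10}));
        for (int entry : sv4) {
            System.out.println((entry >>> 16) + ":" + (entry & 0xFFFF));
        }
    }
}
```

Each output entry costs one pass over all batches, so this version is linear in the number of batches per record. That cost is the kind of inefficiency a smarter merge avoids.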
A sort can only do a single merge. So, we do not attempt to share the generated class; we just generate it internally and discard it at completion of the merge.
The merge sorter only makes sense when we have at least one row. The caller must handle the special case of no rows.
-
Nested Class Summary
-
Field Summary
Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
LEFT_MAPPING, MAIN_MAPPING, RIGHT_MAPPING
Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
context
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
void close()
int getBatchCount()
getContainer()
Container into which results are delivered.
int getRecordCount()
getSv2()
getSv4()
void merge(List<InputBatch> batchGroups, int outputBatchSize)
Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.
boolean next()
The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.
void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
generateComparisons
Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
getInstance
-
Constructor Details
-
MergeSortWrapper
-
-
Method Details
-
merge
Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.
- Parameters:
batchGroups - the complete set of in-memory batches
outputBatchSize - output batch size for in-memory merge
-
next
public boolean next()
The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.
- Specified by:
next in interface SortImpl.SortResults
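The "virtual set of record batches" can be pictured as a window that moves over the SV4 one output batch at a time. The sketch below is an assumption about the general shape of such an iterator, not Drill's implementation; `Sv4WindowSketch`, `start()`, and `length()` are hypothetical names, and the SV4 is modeled as a plain `int[]`.

```java
// Hypothetical sketch, not Drill code: exposes a large SV4 as a sequence
// of fixed-size windows so a consumer can read one output batch per step.
final class Sv4WindowSketch {
    private final int[] sv4;
    private final int batchSize;
    private int start = 0;   // index of the first entry in the current window
    private int length = 0;  // number of entries in the current window

    Sv4WindowSketch(int[] sv4, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sv4 = sv4;
        this.batchSize = batchSize;
    }

    /** Advances to the next window; returns false once every entry has been delivered. */
    boolean next() {
        start += length;
        if (start >= sv4.length) {
            length = 0;
            return false;
        }
        length = Math.min(batchSize, sv4.length - start);
        return true;
    }

    int start() {
        return start;
    }

    int length() {
        return length;
    }

    public static void main(String[] args) {
        Sv4WindowSketch window = new Sv4WindowSketch(new int[5], 2);
        while (window.next()) {
            System.out.println(window.start() + "+" + window.length());
        }
    }
}
```

A consumer loops with `while (window.next())` and reads only the entries in the current window, so no step has to touch the full set of accumulated batches.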
-
close
public void close()
- Specified by:
close in interface SortImpl.SortResults
-
getBatchCount
public int getBatchCount()
- Specified by:
getBatchCount in interface SortImpl.SortResults
-
getRecordCount
public int getRecordCount()
- Specified by:
getRecordCount in interface SortImpl.SortResults
-
getSv4
- Specified by:
getSv4 in interface SortImpl.SortResults
-
updateOutputContainer
public void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
- Specified by:
updateOutputContainer in interface SortImpl.SortResults
-
getSv2
- Specified by:
getSv2 in interface SortImpl.SortResults
-
getContainer
Description copied from interface: SortImpl.SortResults
Container into which results are delivered. May be the original operator container, or may be a different one. This is the container that should be sent downstream. This is a fixed value for all returned results.
- Specified by:
getContainer in interface SortImpl.SortResults
- Returns:
-