org.apache.drill.exec.physical.impl.xsort.MergeSortWrapper

All Implemented Interfaces: SortImpl.SortResults

public class MergeSortWrapper extends BaseSortWrapper implements SortImpl.SortResults

Wrapper around the "MSorter" (in-memory merge sorter). As batches arrive at the sort, they are individually sorted and buffered in memory. If, when the sort completes, no batches were spilled to disk, we can merge the in-memory batches using the efficient memory-based approach implemented here.

Since all batches are in memory, we don't want to use the usual merge algorithm: it copies rows from the input batches (normally read back from spill files) into each output batch. Instead, we want to use the in-memory batches as-is. To do this, we use a selection vector 4 (SV4) as a global index into the collection of batches. Each four-byte SV4 entry uses its upper two bytes as the batch index and its lower two bytes as the offset of a record within that batch.
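To make the SV4 layout concrete, here is a minimal, self-contained sketch of packing and unpacking one entry. The class and method names (`Sv4Encoding`, `encode`, `batchIndex`, `recordOffset`) are hypothetical and not part of Drill's API; the sketch only assumes the layout described above, with the batch index in the upper 16 bits and the record offset in the lower 16 bits.

```java
// Hypothetical helper illustrating the SV4 entry layout; not Drill's API.
class Sv4Encoding {
  // Pack a batch index (upper 16 bits) and a record offset (lower 16 bits) into one int.
  static int encode(int batchIndex, int recordOffset) {
    if (batchIndex < 0 || batchIndex > 0xFFFF || recordOffset < 0 || recordOffset > 0xFFFF) {
      throw new IllegalArgumentException("batch index and offset must each fit in 16 bits");
    }
    return (batchIndex << 16) | recordOffset;
  }

  // Unsigned shift so batch indexes above 0x7FFF (negative packed ints) decode correctly.
  static int batchIndex(int entry) {
    return entry >>> 16;
  }

  // Mask off the upper half to recover the record offset within the batch.
  static int recordOffset(int entry) {
    return entry & 0xFFFF;
  }

  public static void main(String[] args) {
    int entry = encode(3, 1000);
    System.out.println(batchIndex(entry) + " " + recordOffset(entry)); // prints "3 1000"
  }
}
```

The 16-bit split is also why a single SV4 can address at most 65,536 batches of at most 65,536 records each.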

The merger (the MSorter) populates the SV4 by scanning the set of in-memory batches and selecting the record with the lowest sort-key value. That record's batch number and offset are placed into the SV4. The process repeats until every record from every batch has an entry in the SV4.
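The selection step described above can be sketched as follows. This is a hypothetical, simplified model rather than the MSorter itself: each batch is an already-sorted `int[]` whose values stand in for the sort key, and the merge repeatedly scans all batch heads for the lowest key, emitting one packed SV4 entry per record. The name `NaiveSv4Merge` is illustrative only.

```java
// Hypothetical model: each int[] is one individually sorted in-memory batch,
// and its int values play the role of the sort key.
class NaiveSv4Merge {
  static int[] merge(int[][] batches) {
    int total = 0;
    for (int[] b : batches) total += b.length;
    int[] sv4 = new int[total];
    int[] cursor = new int[batches.length]; // next unread offset in each batch
    for (int out = 0; out < total; out++) {
      int best = -1;
      // Scan every batch head and pick the lowest key; strict '<' keeps ties in batch order.
      for (int b = 0; b < batches.length; b++) {
        if (cursor[b] < batches[b].length
            && (best < 0 || batches[b][cursor[b]] < batches[best][cursor[best]])) {
          best = b;
        }
      }
      sv4[out] = (best << 16) | cursor[best]; // batch index in upper 16 bits, offset in lower 16
      cursor[best]++;
    }
    return sv4;
  }

  public static void main(String[] args) {
    int[][] batches = { {1, 4, 9}, {2, 3, 10} };
    StringBuilder sb = new StringBuilder();
    for (int e : merge(batches)) sb.append(batches[e >>> 16][e & 0xFFFF]).append(' ');
    System.out.println(sb.toString().trim()); // prints "1 2 3 4 9 10"
  }
}
```

Each output record costs a scan over all batches, which is why the real implementation uses a more efficient iterative merge.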

The actual implementation uses an iterative merge to perform the above efficiently.
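One way to realize an iterative merge, assumed here rather than taken from the Drill source, is a bottom-up pass over the SV4 itself: start with one sorted run per batch, merge adjacent runs pairwise into a second buffer, swap buffers, and repeat until a single run remains. The sketch reuses the simplified model of sorted `int[]` batches; `IterativeSv4Merge`, `key`, and `mergeRuns` are hypothetical names. Only the SV4 entries move; the batch data is never copied.

```java
// Hypothetical bottom-up merge over SV4 entries; batches are sorted int[] stand-ins.
class IterativeSv4Merge {
  // Look up the sort key that an SV4 entry points at.
  static int key(int[][] batches, int entry) {
    return batches[entry >>> 16][entry & 0xFFFF];
  }

  static int[] merge(int[][] batches) {
    int total = 0;
    for (int[] b : batches) total += b.length;
    int[] src = new int[total];
    int[] dst = new int[total];
    // runStarts[i] is where run i begins; runStarts[runCount] == total.
    int[] runStarts = new int[batches.length + 1];
    int pos = 0;
    for (int b = 0; b < batches.length; b++) {
      runStarts[b] = pos;
      for (int r = 0; r < batches[b].length; r++) src[pos++] = (b << 16) | r;
    }
    int runCount = batches.length;
    runStarts[runCount] = total;

    // Each pass halves the number of runs by merging neighbours.
    while (runCount > 1) {
      int newCount = 0;
      for (int i = 0; i < runCount; i += 2) {
        int lo = runStarts[i];
        int mid = runStarts[Math.min(i + 1, runCount)];
        int hi = runStarts[Math.min(i + 2, runCount)]; // odd run out: mid == hi, just copied
        mergeRuns(batches, src, dst, lo, mid, hi);
        runStarts[newCount++] = lo;
      }
      runStarts[newCount] = total;
      runCount = newCount;
      int[] tmp = src; src = dst; dst = tmp; // the merged output becomes the next input
    }
    return src;
  }

  // Stable merge of src[lo, mid) and src[mid, hi) into dst[lo, hi).
  static void mergeRuns(int[][] batches, int[] src, int[] dst, int lo, int mid, int hi) {
    int l = lo, r = mid, out = lo;
    while (l < mid && r < hi) {
      dst[out++] = key(batches, src[r]) < key(batches, src[l]) ? src[r++] : src[l++];
    }
    while (l < mid) dst[out++] = src[l++];
    while (r < hi) dst[out++] = src[r++];
  }

  public static void main(String[] args) {
    int[][] batches = { {1, 4, 9}, {2, 3, 10}, {0, 5} };
    StringBuilder sb = new StringBuilder();
    for (int e : merge(batches)) sb.append(key(batches, e)).append(' ');
    System.out.println(sb.toString().trim()); // prints "0 1 2 3 4 5 9 10"
  }
}
```

With k batches and n records this does about log2(k) passes of O(n) work each, instead of the O(n * k) cost of rescanning every batch head per record.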

A sort performs only a single merge, so we do not attempt to share the generated class; we generate it internally and discard it when the merge completes.

The merge sorter only makes sense when we have at least one row. The caller must handle the special case of no rows.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static enum

MergeSortWrapper.State
Field Summary

Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
LEFT_MAPPING, MAIN_MAPPING, RIGHT_MAPPING

Fields inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
context
Constructor Summary

Constructors

Constructor

Description

MergeSortWrapper(OperatorContext opContext, VectorContainer destContainer)
Method Summary

Modifier and Type

Method

Description

void

close()

int

getBatchCount()

VectorContainer

getContainer()

Container into which results are delivered.

int

getRecordCount()

SelectionVector2

getSv2()

SelectionVector4

getSv4()

void

merge(List<InputBatch> batchGroups, int outputBatchSize)

Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.

boolean

next()

The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.

void

updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)

Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseSortWrapper
generateComparisons

Methods inherited from class org.apache.drill.exec.physical.impl.xsort.BaseWrapper
getInstance

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- MergeSortWrapper
  
  public MergeSortWrapper(OperatorContext opContext, VectorContainer destContainer)
Method Details
- merge
  
  public void merge(List<InputBatch> batchGroups, int outputBatchSize)
  
  Merge the set of in-memory batches to produce a single logical output in the given destination container, indexed by an SV4.
  
  Parameters:
  
  batchGroups - the complete set of in-memory batches
  
  outputBatchSize - output batch size for in-memory merge
- next
  
  public boolean next()
  
  The SV4 provides a built-in iterator that returns a virtual set of record batches so that the downstream operator need not consume the entire set of accumulated batches in a single step.
  
  Specified by:
  
  next in interface SortImpl.SortResults
- close
  
  public void close()
  
  Specified by:
  
  close in interface SortImpl.SortResults
- getBatchCount
  
  public int getBatchCount()
  
  Specified by:
  
  getBatchCount in interface SortImpl.SortResults
- getRecordCount
  
  public int getRecordCount()
  
  Specified by:
  
  getRecordCount in interface SortImpl.SortResults
- getSv4
  
  public SelectionVector4 getSv4()
  
  Specified by:
  
  getSv4 in interface SortImpl.SortResults
- updateOutputContainer
  
  public void updateOutputContainer(VectorContainer container, SelectionVector4 sv4, RecordBatch.IterOutcome outcome, BatchSchema schema)
  
  Specified by:
  
  updateOutputContainer in interface SortImpl.SortResults
- getSv2
  
  public SelectionVector2 getSv2()
  
  Specified by:
  
  getSv2 in interface SortImpl.SortResults
- getContainer
  
  public VectorContainer getContainer()
  
  Description copied from interface: SortImpl.SortResults
  
  Container into which results are delivered. May be the original operator container, or may be a different one. This is the container that should be sent downstream. This is a fixed value for all returned results.
  
  Specified by:
  
  getContainer in interface SortImpl.SortResults
  
  Returns:

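The windowing behaviour described for `next()` can be sketched with a small class that exposes a sliding window over a fully populated SV4. `Sv4Window`, its fields, and the per-window `batchSize` limit are hypothetical and are not Drill's `SelectionVector4` API. In this sketch, the first window is current right after construction, and each `next()` call advances to the following window until no records remain. Like the merge itself, it assumes at least one record.

```java
// Hypothetical sliding window over a merged SV4; not Drill's SelectionVector4.
class Sv4Window {
  private final int[] sv4;     // entries produced by the merge
  private final int batchSize; // assumed maximum records per downstream batch
  private int start;           // first SV4 position of the current window
  private int length;          // number of entries in the current window

  Sv4Window(int[] sv4, int batchSize) {
    this.sv4 = sv4;
    this.batchSize = batchSize;
    this.start = 0;
    this.length = Math.min(batchSize, sv4.length);
  }

  /** Advances to the next window; returns false once every entry has been handed out. */
  boolean next() {
    int nextStart = start + length;
    if (nextStart >= sv4.length) {
      start = sv4.length;
      length = 0;
      return false;
    }
    start = nextStart;
    length = Math.min(batchSize, sv4.length - start);
    return true;
  }

  int getCount() {
    return length;
  }

  /** Returns the i-th SV4 entry of the current window. */
  int get(int i) {
    return sv4[start + i];
  }

  public static void main(String[] args) {
    Sv4Window w = new Sv4Window(new int[] {0, 1, 2, 3, 4, 5, 6}, 3);
    do {
      // prints "records=3 first=0", then "records=3 first=3", then "records=1 first=6"
      System.out.println("records=" + w.getCount() + " first=" + w.get(0));
    } while (w.next());
  }
}
```

Because each window is only a `start`/`length` pair over the same array, advancing costs two integer updates; the downstream operator sees a sequence of smaller batches without any record data being copied.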