Class OutputBatchBuilder

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder

public class OutputBatchBuilder extends Object
Builds an output batch based on an output schema and one or more input schemas. The input schemas must represent disjoint subsets of the output schema.

Handles maps, which can overlap at the map level (two inputs can hold a map column named `m`, say), but the map members must be disjoint. Applies the same rule recursively to nested maps.

Maps must be built with members in the same order as the corresponding schema. Though maps are usually thought of as unordered name/value pairs, they are actually tuples, with both a name and a defined ordering.

This code uses a name lookup in maps because the semantics of maps do not guarantee a uniform numbering of members from 0 to n-1, where n is the number of map members. Map members are ordered, but the ordinal used by the map vector is not necessarily sequential.
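The name-lookup point can be sketched with a small hypothetical helper (illustrative only, not Drill's actual API): because member ordinals need not be sequential, members are resolved by name rather than by assuming positions 0 to n-1.

```java
import java.util.*;

/** Hypothetical sketch: map members are resolved by name, which works
 *  even when the underlying ordinals are sparse (e.g. 0, 2, 5). */
public class MapMemberLookup {
  // LinkedHashMap preserves member order, mirroring a map's tuple-like nature.
  private final Map<String, Integer> ordinalByName = new LinkedHashMap<>();

  public void addMember(String name, int ordinal) {
    ordinalByName.put(name, ordinal);
  }

  /** Look up a member's ordinal by name; no sequential numbering assumed. */
  public int ordinalOf(String name) {
    Integer ord = ordinalByName.get(name);
    if (ord == null) {
      throw new IllegalArgumentException("No such member: " + name);
    }
    return ord;
  }
}
```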

Once the output container is built, the same value vectors reside in the input and output containers. This works because Drill requires vector persistence: the same vectors must be presented downstream in every batch until a schema change occurs.

Projection

To visualize projection, assume that table columns are numbered and that implicit, null, or partition columns are lettered:

 [ 1 | 2 | 3 | 4 ]    Table columns in table order
 [ A | B | C ]        Static columns
 
Now, we wish to project them into select order. Let's say that the SELECT clause looked like this, with "t" indicating table columns:

 SELECT t2, t3, C, B, t1, A, t2 ...
 
Then the projection looks like this:

 [ 2 | 3 | C | B | 1 | A | 2 ]
 
Often, not all table columns are projected. In this case, the result set loader presents the full table schema to the reader, but actually writes only the projected columns. Suppose we have:

 SELECT t3, C, B, t1, A ...
 
Then the abbreviated table schema looks like this:

 [ 1 | 3 ]
 
Note that table columns retain their table ordering. The projection looks like this:

 [ 2 | C | B | 1 | A ]
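The resolution step above can be sketched in plain Java as mapping each SELECT column to a (source, offset) pair. This is a hypothetical illustration, not Drill's code: the class and method names are invented, "T" marks the table source, "S" the static source, and offsets are 0-based here rather than the 1-based numbering in the diagrams.

```java
import java.util.*;

/** Conceptual sketch of projection resolution (illustrative names only).
 *  Each output column is resolved to a (source, offset) pair encoded as
 *  "T:offset" for table columns or "S:offset" for static columns. */
public class ProjectionSketch {

  public static List<String> project(List<String> tableCols,
                                     List<String> staticCols,
                                     List<String> select) {
    List<String> projection = new ArrayList<>();
    for (String col : select) {
      int t = tableCols.indexOf(col);
      if (t >= 0) {
        projection.add("T:" + t);                        // table column, table-order offset
      } else {
        projection.add("S:" + staticCols.indexOf(col));  // implicit/null column
      }
    }
    return projection;
  }
}
```

For the full SELECT example above, resolving `t2, t3, C, B, t1, A, t2` against table columns `t1..t4` and static columns `A, B, C` yields the same shape as the first projection diagram, just 0-based.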
 

The projector is created once per schema, then can be reused for any number of batches.

Merging is done in one of two ways, depending on the input source:

  • For the table loader, the merger discards any data in the output, then exchanges the buffers from the input columns to the output, leaving the input columns empty. Note that unprojected columns must be cleared by the caller.
  • For implicit and null columns, the output vector is identical to the input vector.
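The first strategy can be illustrated with a minimal toy model, assuming a made-up `Column` type (Drill's real value vectors manage reference-counted buffers, not plain arrays): the output column takes over the input's buffer without copying any data, and the input column ends up empty.

```java
/** Toy model of the buffer-exchange merge (illustrative only; not
 *  Drill's actual vector API). */
public class BufferExchange {

  public static class Column {
    public int[] buffer = new int[0];
  }

  /** Hand the input column's buffer to the output without copying,
   *  leaving the input column empty. */
  public static void exchange(Column input, Column output) {
    output.buffer = input.buffer;   // output now owns the data
    input.buffer = new int[0];      // input is left empty
  }
}
```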
Constructor Details

Method Details

    • defineSourceBatchMapping

      protected void defineSourceBatchMapping(TupleMetadata schema, int source)
      Define the mapping for one of the sources. Mappings are stored in output order as a set of (source, offset) pairs.
    • getVector

      public ValueVector getVector(org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.VectorSource source)
    • load

      public void load(int rowCount)
    • outputContainer

      public VectorContainer outputContainer()
    • close

      public void close()
      Release per-reader resources. Does not release the actual value vectors as those reside in a cache.