Class OutputBatchBuilder

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder

public class OutputBatchBuilder extends Object
Builds an output batch based on an output schema and one or more input schemas. The input schemas must represent disjoint subsets of the output schema.

Handles maps, which can overlap at the map level (two inputs can hold a map column named `m`, say), but the map members must be disjoint. Applies the same rule recursively to nested maps.

Maps must be built with members in the same order as the corresponding schema. Though maps are usually thought of as unordered name/value pairs, they are actually tuples, with both a name and a defined ordering.

This code uses a name lookup in maps because the semantics of maps do not guarantee a uniform numbering of members from 0 to n-1, where n is the number of map members. Map members are ordered, but the ordinal used by the map vector is not necessarily sequential.
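The name-lookup point can be sketched with a small hypothetical helper (illustrative only, not Drill's actual API): because member ordinals need not be sequential, members are resolved by name rather than by assuming positions 0 to n-1.

```java
import java.util.*;

/** Hypothetical sketch: map members are resolved by name, which works
 *  even when the underlying ordinals are sparse (e.g. 0, 2, 5). */
public class MapMemberLookup {
  // LinkedHashMap preserves member order, mirroring a map's tuple-like nature.
  private final Map<String, Integer> ordinalByName = new LinkedHashMap<>();

  public void addMember(String name, int ordinal) {
    ordinalByName.put(name, ordinal);
  }

  /** Look up a member's ordinal by name; no sequential numbering assumed. */
  public int ordinalOf(String name) {
    Integer ord = ordinalByName.get(name);
    if (ord == null) {
      throw new IllegalArgumentException("No such member: " + name);
    }
    return ord;
  }
}
```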

Once the output container is built, the same value vectors reside in the input and output containers. This works because Drill requires vector persistence: the same vectors must be presented downstream in every batch until a schema change occurs.

Projection

To visualize projection, assume that table columns are numbered and that implicit, null, or partition columns are lettered:

 [ 1 | 2 | 3 | 4 ]    Table columns in table order
 [ A | B | C ]        Static columns
 
Now, we wish to project them into select order. Let's say that the SELECT clause looked like this, with "t" indicating table columns:

 SELECT t2, t3, C, B, t1, A, t2 ...
 
Then the projection looks like this:

 [ 2 | 3 | C | B | 1 | A | 2 ]
 
Often, not all table columns are projected. In this case, the result set loader presents the full table schema to the reader, but actually writes only the projected columns. Suppose we have:

 SELECT t3, C, B, t1, A ...
 
Then the abbreviated table schema looks like this:

 [ 1 | 3 ]
 
Note that table columns retain their table ordering. The projection looks like this:

 [ 2 | C | B | 1 | A ]
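The resolution step above can be sketched in plain Java as mapping each SELECT column to a (source, offset) pair. This is a hypothetical illustration, not Drill's code: the class and method names are invented, "T" marks the table source, "S" the static source, and offsets are 0-based here rather than the 1-based numbering in the diagrams.

```java
import java.util.*;

/** Conceptual sketch of projection resolution (illustrative names only).
 *  Each output column is resolved to a (source, offset) pair encoded as
 *  "T:offset" for table columns or "S:offset" for static columns. */
public class ProjectionSketch {

  public static List<String> project(List<String> tableCols,
                                     List<String> staticCols,
                                     List<String> select) {
    List<String> projection = new ArrayList<>();
    for (String col : select) {
      int t = tableCols.indexOf(col);
      if (t >= 0) {
        projection.add("T:" + t);                        // table column, table-order offset
      } else {
        projection.add("S:" + staticCols.indexOf(col));  // implicit/null column
      }
    }
    return projection;
  }
}
```

For the full SELECT example above, resolving `t2, t3, C, B, t1, A, t2` against table columns `t1..t4` and static columns `A, B, C` yields the same shape as the first projection diagram, just 0-based.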
 

The projector is created once per schema, then can be reused for any number of batches.

Merging is done in one of two ways, depending on the input source:

  • For the table loader, the merger discards any data in the output, then exchanges the buffers from the input columns to the output, leaving the input columns empty. Note that unprojected columns must be cleared by the caller.
  • For implicit and null columns, the output vector is identical to the input vector.
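The first strategy can be illustrated with a minimal toy model, assuming a made-up `Column` type (Drill's real value vectors manage reference-counted buffers, not plain arrays): the output column takes over the input's buffer without copying any data, and the input column ends up empty.

```java
/** Toy model of the buffer-exchange merge (illustrative only; not
 *  Drill's actual vector API). */
public class BufferExchange {

  public static class Column {
    public int[] buffer = new int[0];
  }

  /** Hand the input column's buffer to the output without copying,
   *  leaving the input column empty. */
  public static void exchange(Column input, Column output) {
    output.buffer = input.buffer;   // output now owns the data
    input.buffer = new int[0];      // input is left empty
  }
}
```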
Constructor Details

Method Details

    • defineSourceBatchMapping

      protected void defineSourceBatchMapping(TupleMetadata schema, int source)
      Define the mapping for one of the sources. Mappings are stored in output order as a set of (source, offset) pairs.
    • getVector

      public ValueVector getVector(org.apache.drill.exec.physical.impl.scan.v3.lifecycle.OutputBatchBuilder.VectorSource source)
    • load

      public void load(int rowCount)
    • outputContainer

      public VectorContainer outputContainer()
    • close

      public void close()
      Release per-reader resources. Does not release the actual value vectors as those reside in a cache.