org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple

All Implemented Interfaces: VectorSource

Direct Known Subclasses: ResolvedTuple.ResolvedDict, ResolvedTuple.ResolvedMap, ResolvedTuple.ResolvedRow

public abstract class ResolvedTuple extends Object implements VectorSource

Drill rows are made up of a tree of tuples, with the row being the root tuple. Each tuple contains columns, some of which may be maps. This class represents each row or map in the output projection.

Output columns within the tuple can be projected from the data source, might be null (requested columns that don't match a data source column), or might be constant (such as an implicit column). This class orchestrates assembling an output tuple from a collection of these three column types, though implicit columns appear only in the root tuple.

Null Handling

The project list might reference a "missing" map if it includes, say, SELECT a.b.c but `a` does not exist in the data source. In this case, `a` is implied to be a map, so the projection mechanism will create null maps for `a` and `a.b`, and a null column for `a.b.c`.
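
The recursive resolution of a path such as a.b.c can be sketched as follows. This is a hypothetical illustration, not Drill code: `NullPathSketch` and `resolve` are invented names, the nested schema is modeled as `java.util.Map`, and the sketch assumes that any interior name found in the source is itself a map (type conflicts are not handled).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill code: resolves one dotted project-list path
// against a nested schema in which maps are modeled as java.util.Map.
final class NullPathSketch {
  static List<String> resolve(List<String> path, Map<String, ?> schema) {
    List<String> plan = new ArrayList<>();
    Map<?, ?> current = schema; // becomes null once the path leaves the source schema
    String prefix = "";
    for (int i = 0; i < path.size(); i++) {
      String name = path.get(i);
      prefix = prefix.isEmpty() ? name : prefix + "." + name;
      boolean leaf = i == path.size() - 1;
      Object found = current == null ? null : current.get(name);
      if (found == null) {
        // Missing from the data source: interior names imply null maps,
        // and the last name becomes a null column.
        plan.add((leaf ? "null column " : "null map ") + prefix);
        current = null;
      } else {
        // Present in the data source (assumed to be a map if not the leaf).
        plan.add((leaf ? "table column " : "table map ") + prefix);
        current = found instanceof Map ? (Map<?, ?>) found : null;
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    // SELECT a.b.c against a source with no column `a`.
    System.out.println(resolve(List.of("a", "b", "c"), Map.of()));
    // prints [null map a, null map a.b, null column a.b.c]
  }
}
```

If the source does define `a` and `a.b` as maps, only `a.b.c` comes out as a null column. That is the "null columns within maps defined by the data source" case that the Caveats section refers to.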

To accomplish this recursive null processing, each tuple is associated with a null builder. (The null builder can be null if projection is implicit with a wildcard, since in that case no null columns can occur. Even then, however, with schema persistence a SELECT * query may need null columns if a second file does not contain a column that appeared in a first file.)

The null builder is bound to each tuple to allow vector persistence via the result vector cache. If we must create a null column `x` in two different readers, Drill's rules require that the same vector be used for both (otherwise a schema change is signaled). The vector cache works by name (and type). Different maps may contain columns with the same name, so the vector cache must be associated with each tuple. By extension, the null builder must be associated with each tuple as well.
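
The per-tuple cache can be sketched as follows. The sketch assumes the key is simply the (name, type) pair. `VectorCacheSketch`, `TupleCache`, `vectorFor`, and `FakeVector` are hypothetical stand-ins and do not reproduce Drill's `ResultVectorCache` API. The sketch shows two things: a second reader asking the same tuple for `x` gets the identical vector, while a column `x` in a different map gets its own vector.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill's ResultVectorCache: one cache per tuple,
// keyed by (name, type), so that successive readers reuse the same vector.
final class VectorCacheSketch {
  // Stand-in for a value vector; only object identity matters here.
  static final class FakeVector {
    final String name;
    final String type;
    FakeVector(String name, String type) { this.name = name; this.type = type; }
  }

  static final class TupleCache {
    private final Map<List<String>, FakeVector> vectors = new HashMap<>();

    // Returns the cached vector for (name, type), creating it on first request.
    FakeVector vectorFor(String name, String type) {
      return vectors.computeIfAbsent(List.of(name, type), k -> new FakeVector(name, type));
    }
  }

  public static void main(String[] args) {
    TupleCache mapA = new TupleCache(); // cache bound to map `a`
    TupleCache mapB = new TupleCache(); // cache bound to map `b`
    FakeVector first = mapA.vectorFor("x", "INT");                // reader 1 creates a.x
    System.out.println(first == mapA.vectorFor("x", "INT")); // true: reader 2 reuses a.x
    System.out.println(first == mapB.vectorFor("x", "INT")); // false: b.x is distinct
  }
}
```

A single global cache keyed only by name would hand `b.x` the vector that belongs to `a.x`. Scoping the cache to each tuple avoids that collision.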

Lifecycle

The lifecycle of a resolved tuple is:

1. The projection mechanism creates the output tuple, and its columns, by comparing the project list against the table schema. The result is a set of table, null, or constant columns.
2. Once per schema change, the resolved tuple creates the output tuple by linking to vectors in their original locations. As it turns out, we can simply share the vectors; we don't need to transfer the buffers.
3. To prepare the output, the tuple asks the null column builder (if present) to build the required null columns.
4. Once the output tuple is built, it can be used for any number of batches without further work. (The same vectors appear in the various inputs and the output, eliminating the need for any transfers.)
5. Once per batch, the client must set the row count. This is needed for the output container, and for any "null" maps that the projection may have created.
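
The following hypothetical sketch shows the build-once, set-row-count-per-batch pattern described in the steps above. `LifecycleSketch`, `OutputTuple`, and `Vec` are illustrative names and are not Drill classes. Vectors are modeled only by their value count, and the null columns are plain placeholders for what the null column builder would produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the resolved-tuple lifecycle; names are illustrative,
// not Drill's API.
final class LifecycleSketch {
  // Stand-in for a value vector: only its value count matters here.
  static final class Vec {
    int valueCount;
  }

  static final class OutputTuple {
    final List<Vec> vectors = new ArrayList<>();
    int tableColumnCount;
    int rowCount;
    int builds;

    // Once per schema change: link the table vectors (sharing them, not copying
    // their buffers) and add the null columns the null builder would create.
    void build(List<Vec> tableVectors, int nullColumnCount) {
      vectors.clear();
      vectors.addAll(tableVectors);
      tableColumnCount = tableVectors.size();
      for (int i = 0; i < nullColumnCount; i++) {
        vectors.add(new Vec());
      }
      builds++;
    }

    // Once per batch: the reader has already filled the shared table vectors,
    // so only the row count and the null columns need updating.
    void setRowCount(int rows) {
      rowCount = rows;
      for (int i = tableColumnCount; i < vectors.size(); i++) {
        vectors.get(i).valueCount = rows;
      }
    }
  }

  public static void main(String[] args) {
    List<Vec> table = List.of(new Vec(), new Vec());
    OutputTuple out = new OutputTuple();
    out.build(table, 1); // schema known: build once
    for (int rows : new int[] {100, 250, 30}) {
      for (Vec v : table) {
        v.valueCount = rows; // reader writes the batch into the shared vectors
      }
      out.setRowCount(rows); // no re-linking needed
    }
    System.out.println(out.builds + " " + out.vectors.get(0).valueCount + " "
        + out.vectors.get(2).valueCount); // prints 1 30 30
  }
}
```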

Projection Mapping

Each column is mapped into the output tuple (vector container or map) in the order in which the columns are defined here. (That order follows the project list for explicit projection, or the table schema for implicit projection.) The source, however, may be in any order (at least for the table schema). A projection mechanism identifies the VectorSource that supplies the vector for the column, along with the vector's index within that source. The resolved tuple is bound to an output tuple. The projection mechanism grabs the input vector from the vector source at the indicated position and links it into the output tuple, represented by this projected tuple, at the position of the resolved column in the child list.
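
Here is a minimal sketch of this mapping. `Source` has the same shape as `VectorSource` (a vector looked up by index), but it returns `String` in place of `ValueVector`. `ProjectionMappingSketch`, `Column`, and `project` are hypothetical names. The output position comes from the order of the resolved columns, and each vector is fetched from its own source at its own index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill code.
final class ProjectionMappingSketch {
  // Same shape as VectorSource (look up a vector by index), with String
  // standing in for ValueVector.
  interface Source {
    String vector(int index);
  }

  // A resolved column: the source that supplies its vector, and the index there.
  static final class Column {
    final Source source;
    final int sourceIndex;
    Column(Source source, int sourceIndex) {
      this.source = source;
      this.sourceIndex = sourceIndex;
    }
  }

  // Output position = position in the resolved column list, regardless of
  // where each vector sits in its source.
  static List<String> project(List<Column> columns) {
    List<String> output = new ArrayList<>();
    for (Column col : columns) {
      output.add(col.source.vector(col.sourceIndex));
    }
    return output;
  }

  public static void main(String[] args) {
    List<String> tableSchema = List.of("b", "a"); // order produced by the reader
    List<String> nullColumns = List.of("c");      // built by the null builder
    Source table = tableSchema::get;
    Source nulls = nullColumns::get;
    // SELECT a, b, c: output order follows the project list.
    List<Column> resolved = List.of(
        new Column(table, 1), new Column(table, 0), new Column(nulls, 0));
    System.out.println(project(resolved)); // prints [a, b, c]
  }
}
```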

Caveats

The projection mechanism handles nested "missing" columns as mentioned above. This works to create null columns within maps that are defined by the data source. However, the mechanism does not currently handle creating null columns within repeated maps or lists. Doing so is possible, but requires an additional level of cardinality computation to create the proper number of "inner" values.
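
The extra cardinality computation can be sketched as follows. This example assumes a conventional offsets layout in which `offsets[i]` is where row i's entries begin and `offsets[n]` is the total entry count. It is not taken from Drill's `innerCardinality` implementations, and `CardinalitySketch` and its methods are hypothetical. A null column inside a repeated map would need one value per map entry, not one per row.

```java
// Hypothetical sketch of the "inner cardinality" computation; it assumes each
// repeated-map level exposes an offsets array where offsets[i] is the start of
// row i's entries and offsets[n] is the total number of entries.
final class CardinalitySketch {
  // The row and single (non-repeated) maps have one inner value per outer value.
  static int singleInnerCardinality(int outerCardinality) {
    return outerCardinality;
  }

  // A repeated map has as many inner values as there are map entries in total.
  static int repeatedInnerCardinality(int[] offsets, int outerCardinality) {
    return offsets[outerCardinality];
  }

  // Nested repeated maps apply the computation once per level, outermost first.
  static int innermostCardinality(int rowCount, int[]... levelOffsets) {
    int count = rowCount;
    for (int[] offsets : levelOffsets) {
      count = repeatedInnerCardinality(offsets, count);
    }
    return count;
  }

  public static void main(String[] args) {
    // 3 rows holding 2, 0, and 3 map entries respectively.
    int[] outer = {0, 2, 2, 5};
    // Those 5 entries hold 1, 0, 1, 2, and 0 nested entries.
    int[] inner = {0, 1, 1, 2, 4, 4};
    System.out.println(repeatedInnerCardinality(outer, 3));    // prints 5
    System.out.println(innermostCardinality(3, outer, inner)); // prints 4
  }
}
```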

Nested Class Summary

Nested Classes

- static class ResolvedTuple.ResolvedDict
- static class ResolvedTuple.ResolvedDictArray
- static class ResolvedTuple.ResolvedMap: Represents a map implied by the project list, whether or not the map actually appears in the table schema.
- static class ResolvedTuple.ResolvedMapArray: Represents a map tuple (not the map column, rather the value of the map column.) When projecting, we create a new repeated map vector, but share the offsets vector from input to output.
- static class ResolvedTuple.ResolvedRow: Represents the top-level tuple which is projected to a vector container.
- static class ResolvedTuple.ResolvedSingleDict
- static class ResolvedTuple.ResolvedSingleMap

Field Summary

Fields

- protected VectorSource binding
- protected List<ResolvedTuple> children
- protected final List<ResolvedColumn> members
- protected final NullColumnBuilder nullBuilder

Constructor Summary

Constructors

- ResolvedTuple(NullColumnBuilder nullBuilder)

Method Summary

- void add(ResolvedColumn col)
- void addChild(ResolvedTuple child)
- abstract void addVector(ValueVector vector)
- abstract BufferAllocator allocator()
- protected void buildColumns(): Merge two or more partial batches to produce a final output batch.
- void buildNulls(ResultVectorCache vectorCache)
- protected void cascadeRowCount(int rowCount)
- void close()
- List<ResolvedColumn> columns()
- abstract int innerCardinality(int outerCardinality)
- boolean isSimpleProjection()
- void loadNulls(int rowCount)
- abstract String name()
- NullColumnBuilder nullBuilder()
- void removeChild(ResolvedTuple child)
- void reset(): During planning, discard a partial plan to allow reusing the same (root) tuple for multiple projection plans.
- abstract void setRowCount(int rowCount)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.drill.exec.physical.impl.scan.project.VectorSource
vector

Field Details
- members
  
  protected final List<ResolvedColumn> members
- nullBuilder
  
  protected final NullColumnBuilder nullBuilder
- children
  
  protected List<ResolvedTuple> children
- binding
  
  protected VectorSource binding
Constructor Details
- ResolvedTuple
  
  public ResolvedTuple(NullColumnBuilder nullBuilder)
Method Details
- nullBuilder
  
  public NullColumnBuilder nullBuilder()
- add
  
  public void add(ResolvedColumn col)
- addChild
  
  public void addChild(ResolvedTuple child)
- removeChild
  
  public void removeChild(ResolvedTuple child)
- isSimpleProjection
  
  public boolean isSimpleProjection()
- columns
  
  public List<ResolvedColumn> columns()
- buildNulls
  
  public void buildNulls(ResultVectorCache vectorCache)
- loadNulls
  
  public void loadNulls(int rowCount)
- innerCardinality
  
  public abstract int innerCardinality(int outerCardinality)
- buildColumns
  
  protected void buildColumns()
  Merge two or more partial batches to produce a final output batch. A partial batch is a vertical slice of a batch, such as the set of null columns or the set of data columns.
  For example, consider two partial batches:
  (a, d, e) (c, b)
  We may wish to merge them by projecting columns into an output batch of the form:
  (a, b, c, d)
  It is not necessary to project all columns from the inputs, but all columns in the output must have a projection.
  The merger is created once per schema, then can be reused for any number of batches. The only restriction is that the partial batches must have the same row count so that the final output batch record count makes sense.
  Merging is done by discarding any data in the output, then exchanging the buffers from the input columns to the output, leaving projected columns empty. Note that unprojected columns must be cleared by the caller. The caller will have figured out which columns to project and which not to project.
- addVector
  
  public abstract void addVector(ValueVector vector)
- setRowCount
  
  public abstract void setRowCount(int rowCount)
- cascadeRowCount
  
  protected void cascadeRowCount(int rowCount)
- allocator
  
  public abstract BufferAllocator allocator()
- name
  
  public abstract String name()
- reset
  
  public void reset()
  
  During planning, discard a partial plan to allow reusing the same (root) tuple for multiple projection plans.
- close
  
  public void close()
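
The merge that `buildColumns()` describes can be sketched in two phases. The first is a projection plan built once per schema. The second is a per-batch step that places input columns at their output positions. This is a hypothetical illustration, not Drill's implementation: `MergeSketch`, `Projection`, `plan`, and `merge` are invented names, columns are modeled as `int[]`, and sharing an array reference stands in for exchanging buffers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buildColumns() merge, not Drill's implementation.
// Column vectors are modeled as int[]; sharing an array reference stands in
// for exchanging buffers between vectors.
final class MergeSketch {
  // Output column comes from partial batch `batch`, column `column`.
  static final class Projection {
    final int batch;
    final int column;
    Projection(int batch, int column) {
      this.batch = batch;
      this.column = column;
    }
    @Override
    public String toString() {
      return batch + "." + column;
    }
  }

  // Built once per schema, then reused for any number of batches.
  static List<Projection> plan(List<List<String>> partialSchemas, List<String> output) {
    List<Projection> result = new ArrayList<>();
    for (String name : output) {
      Projection found = null;
      for (int b = 0; b < partialSchemas.size() && found == null; b++) {
        int c = partialSchemas.get(b).indexOf(name);
        if (c >= 0) {
          found = new Projection(b, c);
        }
      }
      if (found == null) {
        // Every output column must have a projection.
        throw new IllegalArgumentException("No source for output column " + name);
      }
      result.add(found);
    }
    return result;
  }

  // Per batch: place each projected input column at its output position.
  // All projected columns must have the same row count.
  static List<int[]> merge(List<List<int[]>> partialBatches, List<Projection> plan) {
    List<int[]> output = new ArrayList<>();
    int rowCount = -1;
    for (Projection p : plan) {
      int[] column = partialBatches.get(p.batch).get(p.column);
      if (rowCount >= 0 && column.length != rowCount) {
        throw new IllegalStateException("Partial batches differ in row count");
      }
      rowCount = column.length;
      output.add(column);
    }
    return output;
  }

  public static void main(String[] args) {
    List<List<String>> partials = List.of(List.of("a", "d", "e"), List.of("c", "b"));
    System.out.println(plan(partials, List.of("a", "b", "c", "d")));
    // prints [0.0, 1.1, 1.0, 0.1]; column e is not projected
  }
}
```

Separating the plan from the per-batch step mirrors the Javadoc's point that the merger is created once per schema and then reused for any number of batches.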
