java.lang.Object
org.apache.drill.exec.physical.impl.scan.project.ResolvedTuple
All Implemented Interfaces:
VectorSource
Direct Known Subclasses:
ResolvedTuple.ResolvedDict, ResolvedTuple.ResolvedMap, ResolvedTuple.ResolvedRow

public abstract class ResolvedTuple extends Object implements VectorSource
Drill rows are made up of a tree of tuples, with the row being the root tuple. Each tuple contains columns, some of which may be maps. This class represents each row or map in the output projection.

Output columns within the tuple can be projected from the data source, can be null (requested columns that don't match any data source column), or can be constants (such as implicit columns). This class orchestrates assembling an output tuple from a collection of these three column types. (Implicit columns, however, appear only in the root tuple.)
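The following is a hypothetical, simplified model of that assembly step, not Drill code. `ColumnKind`, `OutputColumn`, and `TupleSketch` are invented names, and a `Map` of values stands in for the data source's vectors. It only illustrates how each output position is filled depending on whether the column is a table, null, or constant column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical: the three ways an output column can be produced.
enum ColumnKind { TABLE, NULL, CONSTANT }

// Hypothetical output column: a name, its kind, and a value used only by constants.
final class OutputColumn {
  final String name;
  final ColumnKind kind;
  final Object constant;

  OutputColumn(String name, ColumnKind kind, Object constant) {
    this.name = name;
    this.kind = kind;
    this.constant = constant;
  }
}

// Hypothetical tuple that assembles one output row from its resolved columns.
final class TupleSketch {
  private final List<OutputColumn> columns = new ArrayList<>();

  void add(OutputColumn col) { columns.add(col); }

  List<OutputColumn> columns() { return Collections.unmodifiableList(columns); }

  // Table columns read from the source row by name, null columns yield null,
  // and constant columns (such as implicit columns) repeat their fixed value.
  List<Object> assemble(Map<String, Object> sourceRow) {
    List<Object> row = new ArrayList<>();
    for (OutputColumn col : columns) {
      switch (col.kind) {
        case TABLE:
          row.add(sourceRow.get(col.name));
          break;
        case NULL:
          row.add(null);
          break;
        case CONSTANT:
          row.add(col.constant);
          break;
      }
    }
    return row;
  }
}
```

In this model the output order is simply the order in which columns were added, independent of how the source row is organized.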

Null Handling

The project list might reference a "missing" map if it includes, say, `SELECT a.b.c` but `a` does not exist in the data source. In this case, column `a` is implied to be a map, so the projection mechanism will create null maps for `a` and `b`, and a null column for `c`.
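A hypothetical sketch of that recursive rule, covering only the case described here, where the root of the path (`a`) is absent from the data source: every interior path segment becomes a null map and the last segment becomes a null column. `NullNode` and `MissingPathResolver` are invented for illustration and do not correspond to Drill classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical node in a resolved projection tree.
final class NullNode {
  final String name;
  final boolean isMap;                       // true: null map; false: null leaf column
  final List<NullNode> children = new ArrayList<>();

  NullNode(String name, boolean isMap) {
    this.name = name;
    this.isMap = isMap;
  }

  // Renders maps as name{children} and leaf columns as bare names.
  @Override
  public String toString() {
    if (!isMap) return name;
    StringBuilder sb = new StringBuilder(name).append('{');
    for (int i = 0; i < children.size(); i++) {
      if (i > 0) sb.append(',');
      sb.append(children.get(i));
    }
    return sb.append('}').toString();
  }
}

final class MissingPathResolver {
  // For a path such as "a.b.c" whose root is absent from the source schema,
  // each interior segment is implied to be a map, so it becomes a null map,
  // and the final segment becomes a null column.
  static NullNode resolveMissing(String path, Set<String> sourceColumns) {
    String[] parts = path.split("\\.");
    if (sourceColumns.contains(parts[0])) {
      throw new IllegalArgumentException("root column exists: " + parts[0]);
    }
    NullNode root = new NullNode(parts[0], parts.length > 1);
    NullNode current = root;
    for (int i = 1; i < parts.length; i++) {
      NullNode child = new NullNode(parts[i], i < parts.length - 1);
      current.children.add(child);
      current = child;
    }
    return root;
  }
}
```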

To accomplish this recursive null processing, each tuple is associated with a null builder. (The null builder can be null if projection is implicit with a wildcard; in such a case no null columns can occur. But, even here, with schema persistence, a SELECT * query may need null columns if a second file does not contain a column that appeared in a first file.)

The null builder is bound to each tuple to allow vector persistence via the result vector cache. If we must create a null column `x` in two different readers, then the rules of Drill require that the same vector be used for both (or else a schema change is signaled.) The vector cache works by name (and type). Since maps may contain columns with the same names as other maps, the vector cache must be associated with each tuple. And, by extension, the null builder must also be associated with each tuple.
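To see why the cache must be associated with each tuple, consider this hypothetical sketch in which every tuple owns a cache keyed by column name and type. `FakeVector` and `TupleVectorCache` are invented stand-ins; Drill's real cache is `ResultVectorCache`, whose API is not reproduced here. Repeated requests for the same name and type within one tuple return the same object, while two maps that both contain `x` get distinct vectors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a value vector; only object identity matters here.
final class FakeVector {
  final String name;
  final String type;

  FakeVector(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

// Hypothetical per-tuple cache: the same (name, type) pair always yields the
// same vector, so a second reader reuses the first reader's vector instead of
// signaling a schema change.
final class TupleVectorCache {
  private final Map<List<String>, FakeVector> vectors = new HashMap<>();

  FakeVector vectorFor(String name, String type) {
    // List.of has value-based equals/hashCode, so it works as a composite key.
    return vectors.computeIfAbsent(List.of(name, type), key -> new FakeVector(name, type));
  }
}
```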

Lifecycle

The lifecycle of a resolved tuple is:
  • The projection mechanism creates the output tuple, and its columns, by comparing the project list against the table schema. The result is a set of table, null, or constant columns.
  • Once per schema change, the resolved tuple creates the output tuple by linking to vectors in their original locations. As it turns out, we can simply share the vectors; we don't need to transfer the buffers.
  • To prepare for this linking, the tuple asks the null column builder (if present) to build the required null columns.
  • Once the output tuple is built, it can be used for any number of batches without further work. (The same vectors appear in the various inputs and the output, eliminating the need for any transfers.)
  • Once per batch, the client must set the row count. This is needed for the output container, and for any "null" maps that the projection may have created.
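The lifecycle can be summarized with a small hypothetical state model. `LifecycleSketch` and its methods (`resolve`, `buildForSchema`, `startBatch`) are invented names that mirror the steps above; they are not Drill's API. The sketch illustrates the key point: building happens once per schema, while setting the row count happens once per batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lifecycle model of a resolved tuple (not Drill's API).
final class LifecycleSketch {
  private final List<String> columns = new ArrayList<>();
  private boolean built;       // output tuple linked for the current schema
  private int buildCount;      // number of schema-change builds so far
  private int rowCount = -1;   // row count of the current batch

  // Step 1: projection resolves the columns (table, null, or constant).
  void resolve(List<String> projected) {
    columns.clear();
    columns.addAll(projected);
    built = false;
  }

  // Steps 2-3: once per schema change, build null columns and link vectors.
  void buildForSchema() {
    if (columns.isEmpty()) throw new IllegalStateException("resolve first");
    built = true;
    buildCount++;
  }

  // Step 4 needs no code: the linked vectors are reused for every batch.
  // Step 5: once per batch, the client sets the row count.
  void startBatch(int rows) {
    if (!built) throw new IllegalStateException("build the output tuple first");
    rowCount = rows;
  }

  int buildCount() { return buildCount; }

  int rowCount() { return rowCount; }
}
```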

Projection Mapping

Each column is mapped into the output tuple (vector container or map) in the order that the columns are defined here. (That order follows the project list for explicit projection, or the table schema for implicit projection.) The source, however, may be in any order (at least for the table schema). A projection mechanism identifies the VectorSource that supplies the vector for the column, along with the vector's index within that source. The resolved tuple is bound to an output tuple. The projection mechanism grabs the input vector from the vector source at the indicated position and links it into the output tuple, represented by this projected tuple, at the position of the resolved column in the child list.
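A hypothetical sketch of that mapping: each resolved column records a source and an index, and linking walks the columns in output order, fetching each vector from its source. `SketchVectorSource`, `MappedColumn`, and `ProjectionMapper` are invented names, and strings stand in for vectors. Because the same objects are placed in the output, nothing is copied.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vector source: hands out a vector by index. A String stands in
// for a ValueVector so the example runs without Drill.
interface SketchVectorSource {
  String vector(int index);
}

// Hypothetical resolved column: the source that supplies it and its index there.
final class MappedColumn {
  final SketchVectorSource source;
  final int sourceIndex;

  MappedColumn(SketchVectorSource source, int sourceIndex) {
    this.source = source;
    this.sourceIndex = sourceIndex;
  }
}

final class ProjectionMapper {
  // Walks the columns in output order; output position i receives the vector
  // that column i names, whatever order its source keeps its vectors in.
  static List<String> link(List<MappedColumn> columns) {
    List<String> output = new ArrayList<>(columns.size());
    for (MappedColumn col : columns) {
      // The same object is placed in the output: shared, not copied.
      output.add(col.source.vector(col.sourceIndex));
    }
    return output;
  }
}
```

For example, a table source holding `(c, a, b)` and a null-column source holding `(x)` can feed an output ordered `(a, b, x, c)` purely through the recorded indexes.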

Caveats

The projection mechanism handles nested "missing" columns as mentioned above. This works to create null columns within maps that are defined by the data source. However, the mechanism does not currently handle creating null columns within repeated maps or lists. Doing so is possible, but requires adding a level of cardinality computation to create the proper number of "inner" values.
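A hypothetical illustration of that missing cardinality step. For a null column nested inside a repeated map, one null value is needed per map entry rather than per row, so the count must be summed over the entry counts of the enclosing rows. `RepeatedMapCardinality` is invented for this example; per the caveat above, Drill does not currently do this.

```java
// Hypothetical helper: computes how many "inner" values a column nested in a
// repeated map needs, given the number of map entries in each outer row.
final class RepeatedMapCardinality {
  static int innerCardinality(int[] entriesPerRow) {
    int total = 0;
    for (int entries : entriesPerRow) {
      total += entries;
    }
    return total;
  }

  // Builds a null column sized to the inner cardinality (all elements null).
  static Object[] nullInnerColumn(int[] entriesPerRow) {
    return new Object[innerCardinality(entriesPerRow)];
  }
}
```

With three rows holding 2, 0, and 3 map entries, the nested null column needs 5 values, not 3.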
  • Method Details

    • nullBuilder

      public NullColumnBuilder nullBuilder()
    • add

      public void add(ResolvedColumn col)
    • addChild

      public void addChild(ResolvedTuple child)
    • removeChild

      public void removeChild(ResolvedTuple child)
    • isSimpleProjection

      public boolean isSimpleProjection()
    • columns

      public List<ResolvedColumn> columns()
    • buildNulls

      public void buildNulls(ResultVectorCache vectorCache)
    • loadNulls

      public void loadNulls(int rowCount)
    • innerCardinality

      public abstract int innerCardinality(int outerCardinality)
    • buildColumns

      protected void buildColumns()
      Merge two or more partial batches to produce a final output batch. A partial batch is a vertical slice of a batch, such as the set of null columns or the set of data columns.

      For example, consider two partial batches:

       (a, d, e)
       (c, b)

      We may wish to merge them by projecting columns into an output batch of the form:

       (a, b, c, d)

      It is not necessary to project all columns from the inputs, but all columns in the output must have a projection.

      The merger is created once per schema, then can be reused for any number of batches. The only restriction is that the partial batches must have the same row count so that the final output batch record count makes sense.

      Merging is done by discarding any data in the output, then exchanging the buffers from the input columns to the output, leaving the projected input columns empty. Unprojected columns must be cleared by the caller, which will have determined which columns to project and which not to project.
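The merge described above can be sketched with hypothetical types: `BufferColumn` holds a list that stands in for a vector's buffer, `Projection` names the input batch and column that feed each output slot, and `PartialBatchMerger` is invented for illustration, not Drill's implementation. The sketch checks that every output column has a projection and that projected inputs share a row count, then hands each input buffer to the output, leaving that input empty and leaving unprojected columns (such as `e`) for the caller to clear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical column whose list stands in for a vector's data buffer.
final class BufferColumn {
  final String name;
  List<Integer> buffer;

  BufferColumn(String name, List<Integer> buffer) {
    this.name = name;
    this.buffer = buffer;
  }
}

// Hypothetical projection entry: which input batch and column feed an output slot.
final class Projection {
  final int batch;
  final int column;

  Projection(int batch, int column) {
    this.batch = batch;
    this.column = column;
  }
}

final class PartialBatchMerger {
  static void merge(List<List<BufferColumn>> inputs, List<BufferColumn> output,
                    List<Projection> plan) {
    // Every output column must have a projection.
    if (plan.size() != output.size()) {
      throw new IllegalArgumentException("each output column needs a projection");
    }
    // Partial batches must agree on row count for the output count to make sense.
    int rowCount = -1;
    for (Projection p : plan) {
      int size = inputs.get(p.batch).get(p.column).buffer.size();
      if (rowCount >= 0 && size != rowCount) {
        throw new IllegalArgumentException("partial batches differ in row count");
      }
      rowCount = size;
    }
    for (int i = 0; i < output.size(); i++) {
      Projection p = plan.get(i);
      BufferColumn source = inputs.get(p.batch).get(p.column);
      // Discard the old output data and take over the input's buffer...
      output.get(i).buffer = source.buffer;
      // ...leaving the projected input column empty.
      source.buffer = new ArrayList<>();
    }
  }
}
```

Because the plan is just a list of (batch, column) pairs, it can be built once per schema and reused for every batch, as the description requires.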

    • addVector

      public abstract void addVector(ValueVector vector)
    • setRowCount

      public abstract void setRowCount(int rowCount)
    • cascadeRowCount

      protected void cascadeRowCount(int rowCount)
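The `setRowCount` and `cascadeRowCount` entries carry no description on this page. Going by the lifecycle notes (the row count is needed by the output container and by any null maps), the following hypothetical sketch shows one plausible shape: setting the count on a tuple stores it and then pushes to each child the count given by `innerCardinality`. `TupleNode` and `SingleTupleNode` are invented names, and treating a non-repeated map's inner cardinality as equal to its outer cardinality is an assumption of the sketch, not taken from Drill's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tuple tree: a root row whose children are nested maps.
abstract class TupleNode {
  final List<TupleNode> children = new ArrayList<>();
  int rowCount;

  // How many values each child tuple holds, given this tuple's count.
  abstract int innerCardinality(int outerCardinality);

  // Stores this tuple's count, then pushes the derived count to the children.
  void setRowCount(int rows) {
    rowCount = rows;
    cascadeRowCount(innerCardinality(rows));
  }

  protected void cascadeRowCount(int rows) {
    for (TupleNode child : children) {
      child.setRowCount(rows);
    }
  }
}

// Assumed behavior for a row or non-repeated map: one inner value per outer value.
final class SingleTupleNode extends TupleNode {
  @Override
  int innerCardinality(int outerCardinality) {
    return outerCardinality;
  }
}
```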
    • allocator

      public abstract BufferAllocator allocator()
    • name

      public abstract String name()
    • reset

      public void reset()
      During planning, discard a partial plan to allow reusing the same (root) tuple for multiple projection plans.
    • close

      public void close()