Class ResolvedTuple
- All Implemented Interfaces:
VectorSource
- Direct Known Subclasses:
ResolvedTuple.ResolvedDict, ResolvedTuple.ResolvedMap, ResolvedTuple.ResolvedRow
Each output column within the tuple is one of three kinds: projected from the data source, null (a requested column that does not match any data source column), or constant (such as an implicit column). This class orchestrates assembling an output tuple from a collection of these three column types, although implicit columns appear only in the root tuple.
Null Handling
The project list might reference a "missing" map if the project list includes, say, SELECT a.b.c but `a` does not exist in the data source. In this case, the column `a` is implied to be a map, so the projection mechanism will create a null map for `a` and `b`, and will create a null column for `c`.
To accomplish this recursive null processing, each tuple is associated with a null builder. (The null builder can be null if projection is implicit with a wildcard; in such a case no null columns can occur. But, even here, with schema persistence, a SELECT * query may need null columns if a second file does not contain a column that appeared in a first file.)
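The recursive creation of null maps and null columns for a path such as a.b.c can be illustrated with a minimal sketch. It does not use Drill's classes: the name `NullPathSketch`, the `project` method, and the use of nested `Map` objects to stand in for tuples are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's API: a tuple is modeled as a map from
// column name to either a nested tuple (a Map) or a leaf marker (a String).
class NullPathSketch {
  static final String NULL_COLUMN = "<null column>";

  /**
   * Walks the dotted path through the tuple, creating an empty ("null") map
   * for each missing intermediate name and a null column for a missing leaf.
   * Returns the number of entries that had to be created.
   */
  @SuppressWarnings("unchecked")
  static int project(Map<String, Object> tuple, String dottedPath) {
    String[] parts = dottedPath.split("\\.");
    int created = 0;
    Map<String, Object> current = tuple;
    for (int i = 0; i < parts.length; i++) {
      boolean leaf = (i == parts.length - 1);
      Object existing = current.get(parts[i]);
      if (existing == null) {
        // Missing in the "data source": imply a map, or a null column at the leaf.
        existing = leaf ? NULL_COLUMN : new LinkedHashMap<String, Object>();
        current.put(parts[i], existing);
        created++;
      }
      if (!leaf) {
        if (!(existing instanceof Map)) {
          throw new IllegalStateException(parts[i] + " is not a map");
        }
        current = (Map<String, Object>) existing;
      }
    }
    return created;
  }

  public static void main(String[] args) {
    Map<String, Object> root = new LinkedHashMap<>();
    root.put("x", "data column");
    project(root, "a.b.c");
    System.out.println(root);
  }
}
```

In this sketch, projecting a.b.c against a tuple that lacks `a` creates two maps and one null column, while projecting the same path a second time creates nothing.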
The null builder is bound to each tuple to allow vector persistence via the result vector cache. If we must create a null column `x` in two different readers, then the rules of Drill require that the same vector be used for both (or else a schema change is signaled.) The vector cache works by name (and type). Since maps may contain columns with the same names as other maps, the vector cache must be associated with each tuple. And, by extension, the null builder must also be associated with each tuple.
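The reason for keeping one cache per tuple can be shown with a small sketch. This is not Drill's `ResultVectorCache`: the class `TupleVectorCacheSketch`, the `FakeVector` stand-in, and the string type names are hypothetical and exist only to show the name-and-type keying and the per-map scoping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's ResultVectorCache: one cache per tuple,
// keyed by column name and type, with a separate child cache per nested map.
class TupleVectorCacheSketch {
  /** Stand-in for a value vector; only object identity matters here. */
  static final class FakeVector {
    final String name;
    final String type;

    FakeVector(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  private final Map<String, FakeVector> vectors = new HashMap<>();
  private final Map<String, TupleVectorCacheSketch> childCaches = new HashMap<>();

  /** Returns the same vector every time the same name and type are requested. */
  FakeVector vectorFor(String name, String type) {
    return vectors.computeIfAbsent(name + "/" + type, key -> new FakeVector(name, type));
  }

  /** Each nested map gets its own cache, so equal names in different maps do not collide. */
  TupleVectorCacheSketch childCache(String mapName) {
    return childCaches.computeIfAbsent(mapName, key -> new TupleVectorCacheSketch());
  }

  public static void main(String[] args) {
    TupleVectorCacheSketch root = new TupleVectorCacheSketch();
    // Two "readers" asking for null column x get the same vector.
    System.out.println(root.vectorFor("x", "INT") == root.vectorFor("x", "INT"));
    // Column x inside map m1 is distinct from column x inside map m2.
    FakeVector inM1 = root.childCache("m1").vectorFor("x", "INT");
    FakeVector inM2 = root.childCache("m2").vectorFor("x", "INT");
    System.out.println(inM1 == inM2);
  }
}
```

A single flat cache keyed only by name and type would hand the same vector to both maps, which is why the sketch scopes a cache to each tuple.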
Lifecycle
The lifecycle of a resolved tuple is:
- The projection mechanism creates the output tuple, and its columns, by comparing the project list against the table schema. The result is a set of table, null, or constant columns.
- Once per schema change, the resolved tuple creates the output tuple by linking to vectors in their original locations. As it turns out, we can simply share the vectors; we don't need to transfer the buffers.
- To prepare for the transfer, the tuple asks the null column builder (if present) to build the required null columns.
- Once the output tuple is built, it can be used for any number of batches without further work. (The same vectors appear in the various inputs and the output, eliminating the need for any transfers.)
- Once per batch, the client must set the row count. This is needed for the output container, and for any "null" maps that the project may have created.
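The per-batch step in the list above, setting the row count on the output and on any null maps, might look like the following sketch. The class `RowCountCascadeSketch` is hypothetical, and it assumes non-repeated maps only, so every nested tuple receives the same count as its parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's ResolvedTuple: a tuple tree in which the
// per-batch row count is pushed from the root down to every nested tuple.
class RowCountCascadeSketch {
  final String name;
  final List<RowCountCascadeSketch> children = new ArrayList<>();
  int rowCount;

  RowCountCascadeSketch(String name) {
    this.name = name;
  }

  void addChild(RowCountCascadeSketch child) {
    children.add(child);
  }

  /** Called once per batch; assumes non-repeated maps, so the count is passed unchanged. */
  void setRowCount(int rowCount) {
    this.rowCount = rowCount;
    for (RowCountCascadeSketch child : children) {
      child.setRowCount(rowCount);
    }
  }

  public static void main(String[] args) {
    // Built once per schema: root row with a null map a that contains null map b.
    RowCountCascadeSketch root = new RowCountCascadeSketch("row");
    RowCountCascadeSketch a = new RowCountCascadeSketch("a");
    RowCountCascadeSketch b = new RowCountCascadeSketch("b");
    root.addChild(a);
    a.addChild(b);

    // Reused for any number of batches; only the row count changes.
    root.setRowCount(100);
    root.setRowCount(50);
    System.out.println(b.rowCount);
  }
}
```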
Projection Mapping
Each column is mapped into the output tuple (vector container or map) in the order that the columns are defined here. (That order follows the project list for explicit projection, or the table schema for implicit projection.) The source, however, may be in any order (at least for the table schema). A projection mechanism identifies the VectorSource that supplies the vector for the column, along with the vector's index within that source.
The resolved tuple is bound to an output tuple. The projection mechanism takes the input vector from the vector source at the indicated position and links it into the output tuple, represented by this projected tuple, at the position of the resolved column in the child list.
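As a rough illustration of this mapping, the sketch below links output positions to (source, index) pairs and shares the underlying vectors rather than copying them. The names `ProjectionMappingSketch`, `IndexedSource`, and `ColumnMapping` are hypothetical, and plain `int[]` arrays stand in for value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's API: int[] arrays stand in for vectors.
class ProjectionMappingSketch {
  /** Stand-in for a vector source: hands out vectors by index. */
  interface IndexedSource {
    int[] vector(int index);
  }

  /** One resolved column: which source supplies it, and at which index. */
  static final class ColumnMapping {
    final IndexedSource source;
    final int sourceIndex;

    ColumnMapping(IndexedSource source, int sourceIndex) {
      this.source = source;
      this.sourceIndex = sourceIndex;
    }
  }

  /** Builds the output in mapping order; vectors are shared, not copied. */
  static List<int[]> link(List<ColumnMapping> mappings) {
    List<int[]> output = new ArrayList<>();
    for (ColumnMapping mapping : mappings) {
      output.add(mapping.source.vector(mapping.sourceIndex));
    }
    return output;
  }

  public static void main(String[] args) {
    int[][] tableVectors = { {1, 2}, {10, 20}, {100, 200} };
    IndexedSource table = index -> tableVectors[index];

    // Output order (third source column, then first) differs from source order.
    List<ColumnMapping> plan = new ArrayList<>();
    plan.add(new ColumnMapping(table, 2));
    plan.add(new ColumnMapping(table, 0));
    List<int[]> output = link(plan);

    // A later batch written into the source vector is visible in the output.
    tableVectors[2][0] = 999;
    System.out.println(output.get(0)[0]);
  }
}
```

Because the output holds references to the same arrays, the link step runs once and later writes to the source appear in the output without any transfer.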
Caveats
The project mechanism handles nested "missing" columns as mentioned above. This works to create null columns within maps that are defined by the data source. However, the mechanism does not currently handle creating null columns within repeated maps or lists. Doing so is possible, but requires adding a level of cardinality computation to create the proper number of "inner" values.
-
Nested Class Summary
Modifier and Type  Class  Description
static class
static class
static class
    Represents a map implied by the project list, whether or not the map actually appears in the table schema.
static class
    Represents a map tuple (not the map column, rather the value of the map column.) When projecting, we create a new repeated map vector, but share the offsets vector from input to output.
static class
    Represents the top-level tuple which is projected to a vector container.
static class
static class
-
Field Summary
Modifier and Type  Field  Description
protected VectorSource binding
protected List<ResolvedTuple> children
protected final List<ResolvedColumn> members
protected final NullColumnBuilder nullBuilder
-
Constructor Summary
-
Method Summary
Modifier and Type  Method  Description
void add(ResolvedColumn col)
void addChild(ResolvedTuple child)
abstract void addVector(ValueVector vector)
abstract BufferAllocator allocator()
protected void buildColumns()
    Merge two or more partial batches to produce a final output batch.
void buildNulls(ResultVectorCache vectorCache)
protected void cascadeRowCount(int rowCount)
void close()
columns()
abstract int innerCardinality(int outerCardinality)
boolean isSimpleProjection()
void loadNulls(int rowCount)
abstract String name()
void removeChild(ResolvedTuple child)
void reset()
    During planning, discard a partial plan to allow reusing the same (root) tuple for multiple projection plans.
abstract void setRowCount(int rowCount)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.physical.impl.scan.project.VectorSource
vector
-
Field Details
-
members
-
nullBuilder
-
children
-
binding
-
-
Constructor Details
-
ResolvedTuple
-
-
Method Details
-
nullBuilder
-
add
-
addChild
-
removeChild
-
isSimpleProjection
public boolean isSimpleProjection()
-
columns
-
buildNulls
-
loadNulls
public void loadNulls(int rowCount)
-
innerCardinality
public abstract int innerCardinality(int outerCardinality)
-
buildColumns
protected void buildColumns()
Merge two or more partial batches to produce a final output batch. A partial batch is a vertical slice of a batch, such as the set of null columns or the set of data columns.
For example, consider two partial batches:
(a, d, e)
(c, b)
We may wish to merge them by projecting columns into an output batch of the form:
(a, b, c, d)
It is not necessary to project all columns from the inputs, but all columns in the output must have a projection.
The merger is created once per schema, then can be reused for any number of batches. The only restriction is that the partial batches must have the same row count so that the final output batch record count makes sense.
Merging is done by discarding any data in the output, then exchanging the buffers from the input columns to the output, which leaves the projected input columns empty. Note that unprojected columns must be cleared by the caller. The caller will have figured out which columns to project and which not to project.
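The merge of partial batches (a, d, e) and (c, b) into an output of the form (a, b, c, d) can be worked through with a short sketch. It is not the actual `buildColumns()` implementation: the class `PartialBatchMergeSketch` is hypothetical, and matching columns by name is an assumption made for illustration, since in the description above the caller has already decided which columns to project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Drill's implementation: computes, for each output
// column, which partial batch and which column index supplies it.
class PartialBatchMergeSketch {
  /** A vertical slice of a batch: some column names plus a row count. */
  static final class PartialBatch {
    final List<String> columns;
    final int rowCount;

    PartialBatch(int rowCount, String... columns) {
      this.rowCount = rowCount;
      this.columns = Arrays.asList(columns);
    }
  }

  /** Returns one "batchIndex:columnIndex" entry per output column. */
  static List<String> plan(List<String> outputColumns, List<PartialBatch> inputs) {
    for (PartialBatch input : inputs) {
      if (input.rowCount != inputs.get(0).rowCount) {
        throw new IllegalArgumentException("partial batches must have the same row count");
      }
    }
    List<String> positions = new ArrayList<>();
    for (String name : outputColumns) {
      String position = null;
      for (int b = 0; b < inputs.size() && position == null; b++) {
        int c = inputs.get(b).columns.indexOf(name);
        if (c >= 0) {
          position = b + ":" + c;
        }
      }
      if (position == null) {
        // Every output column must have a projection.
        throw new IllegalArgumentException("no projection for " + name);
      }
      positions.add(position);
    }
    return positions;
  }

  public static void main(String[] args) {
    List<PartialBatch> inputs = Arrays.asList(
        new PartialBatch(10, "a", "d", "e"),
        new PartialBatch(10, "c", "b"));
    // Column e is not projected; that is allowed.
    System.out.println(plan(Arrays.asList("a", "b", "c", "d"), inputs));
    // prints [0:0, 1:1, 1:0, 0:1]
  }
}
```

The plan depends only on the schema, so in this sketch it would be computed once and then reused for every batch that has the same columns.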
-
addVector
-
setRowCount
public abstract void setRowCount(int rowCount)
-
cascadeRowCount
protected void cascadeRowCount(int rowCount)
-
allocator
-
name
-
reset
public void reset()
During planning, discard a partial plan to allow reusing the same (root) tuple for multiple projection plans.
-
close
public void close()
-