java.lang.Object
org.apache.drill.exec.physical.resultSet.impl.ContainerState
org.apache.drill.exec.physical.resultSet.impl.TupleState
All Implemented Interfaces:
AbstractTupleWriter.TupleWriterListener
Direct Known Subclasses:
TupleState.MapState, TupleState.RowState

public abstract class TupleState extends ContainerState implements AbstractTupleWriter.TupleWriterListener
Represents the loader state for a tuple: a row or a map. This is "state" in the sense of variables that are carried along with each tuple. Handles write-time issues such as defining new columns, allocating memory, handling overflow, assembling the output version of the map, and so on. Each row and map in the result set has a tuple state instance associated with it.

Here, by "tuple" we mean a container of vectors, each of which holds a variety of values. So, the "tuple" here is structural, not a specific set of values, but rather the collection of vectors that hold tuple values. Drill vector containers and maps are both tuples, but they irritatingly have completely different APIs for working with their child vectors. These classes are a proxy that wraps the two APIs to provide a common view for use by the result set builder and its internals.
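To make the proxy idea concrete, here is a minimal, hypothetical sketch (the interface and adapter names are invented for illustration and are not the real Drill classes) of a single tuple-oriented view layered over two different child-vector APIs, much as TupleState does for vector containers and map vectors.

import java.util.List;
import java.util.Map;

// Hypothetical illustration only: one "tuple" view over two containers
// that expose their child vectors through different APIs.
interface TupleFacade {
    int columnCount();            // number of child vectors in the tuple
    Object column(int index);     // child vector at the given position
}

// Adapter over a row-like container that holds its children by position.
class RowFacade implements TupleFacade {
    private final List<Object> vectors;   // stands in for a vector container
    RowFacade(List<Object> vectors) { this.vectors = vectors; }
    @Override public int columnCount() { return vectors.size(); }
    @Override public Object column(int index) { return vectors.get(index); }
}

// Adapter over a map-like vector that holds its children by name.
class MapFacade implements TupleFacade {
    private final Map<String, Object> children;  // stands in for a map vector
    private final List<String> order;             // column order by name
    MapFacade(Map<String, Object> children, List<String> order) {
        this.children = children;
        this.order = order;
    }
    @Override public int columnCount() { return order.size(); }
    @Override public Object column(int index) { return children.get(order.get(index)); }
}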

Output Container

Builds the harvest vector container, which includes only the columns present as of the harvest schema version. That is, it excludes columns added while writing an overflow row.

Because a Drill row is actually a hierarchy, this class walks the internal hierarchy and builds a corresponding output hierarchy.

  • The root node is the row itself (vector container),
  • Internal nodes are maps (structures),
  • Leaf nodes are primitive vectors (which may be arrays).
The basic algorithm is to identify the version of the output schema, then add any new columns added up to that version. This object maintains the output container across batches, meaning that updates are incremental: we need only add columns that are new since the last update. And, those new columns will always appear directly after all existing columns in the row or in a map.

A special case occurs when columns are added in the overflow row. These columns do not appear in the output container for the main part of the batch; instead, they appear in the next output container that includes the overflow row.

Because the output container may hold only a subset of the internal columns, an interesting case occurs for maps. The map vectors in the output container are not the same as those used internally. Since a map vector holds one particular list of child columns, and the internal and output lists can differ, the internal and output map vectors must be distinct. The child vectors themselves (except for child maps) are shared between the two.
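The following is a rough sketch, with hypothetical types rather than the real loader classes, of the incremental update just described: the output is brought up to the harvest schema version by appending only the columns added since the previous update, while columns added during the overflow row are deferred to the next batch.

import java.util.List;

// Hypothetical sketch of the incremental output update. "ColumnSketch"
// stands in for the real per-column state and records the schema version
// at which the column was added.
class OutputUpdateSketch {

    interface ColumnSketch {
        int addVersion();   // schema version at which this column appeared
    }

    // Append to the output only those internal columns that are new since
    // the last update and were added at or before the harvest version.
    static void updateOutput(List<ColumnSketch> internalColumns,
                             List<ColumnSketch> outputColumns,
                             int harvestSchemaVersion) {
        // New columns always follow the existing ones, so only the tail
        // of the internal list needs to be examined.
        for (int i = outputColumns.size(); i < internalColumns.size(); i++) {
            ColumnSketch col = internalColumns.get(i);
            if (col.addVersion() > harvestSchemaVersion) {
                break;   // added during the overflow row; defer to the next batch
            }
            outputColumns.add(col);
        }
    }
}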

  • Field Details

    • columns

      protected final List<ColumnState> columns
      The set of columns added via the writers, including both projected and unprojected columns. (The writer is free to add columns that the query does not project; the result set loader creates a dummy column and dummy writer, then does not project the column to the output.) The sketch after the field details below illustrates how this list relates to the schema fields.
    • schema

      protected final TupleMetadata schema
      Internal writer schema that matches the column list.
    • outputSchema

      protected TupleMetadata outputSchema
      Metadata description of the output container (for the row) or map (for a map or repeated map).

      Rows and maps have an output schema which may differ from the internal schema. The output schema excludes unprojected columns. It also excludes columns added in an overflow row.

      The output schema is built slightly differently for maps inside a union vs. normal top-level (or nested) maps. Maps inside a union do not defer columns because of the muddy semantics (and infrequent use) of unions.
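      To tie the columns, schema, and outputSchema fields together, here is a hedged sketch using hypothetical stand-in types (not the real TupleMetadata or ColumnState classes): the internal schema tracks every column added through the writers, while the output schema keeps only projected columns added at or before the harvest version.

      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical stand-ins for the real column state and schema classes.
      class SchemaSketch {

          static class ColumnSketch {
              final String name;
              final boolean projected;   // unprojected columns get a dummy writer
              final int addVersion;      // schema version at which the column was added
              ColumnSketch(String name, boolean projected, int addVersion) {
                  this.name = name;
                  this.projected = projected;
                  this.addVersion = addVersion;
              }
          }

          // The internal schema mirrors the full column list; the output schema
          // omits unprojected columns and columns added in the overflow row.
          static List<String> outputSchema(List<ColumnSketch> columns, int harvestVersion) {
              List<String> output = new ArrayList<>();
              for (ColumnSketch col : columns) {
                  if (col.projected && col.addVersion <= harvestVersion) {
                      output.add(col.name);
                  }
              }
              return output;
          }
      }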

  • Constructor Details

    • TupleState

      protected TupleState(org.apache.drill.exec.physical.resultSet.impl.LoaderInternals events, ResultVectorCache vectorCache, ProjectionFilter projectionSet)
  • Method Details