java.lang.Object
org.apache.drill.exec.physical.resultSet.impl.ContainerState
org.apache.drill.exec.physical.resultSet.impl.TupleState
All Implemented Interfaces:
AbstractTupleWriter.TupleWriterListener
Direct Known Subclasses:
TupleState.MapState, TupleState.RowState

public abstract class TupleState extends ContainerState implements AbstractTupleWriter.TupleWriterListener
Represents the loader state for a tuple: a row or a map. This is "state" in the sense of variables that are carried along with each tuple. Handles write-time issues such as defining new columns, allocating memory, handling overflow, assembling the output version of the map, and so on. Each row and map in the result set has a tuple state instance associated with it.

Here, by "tuple" we mean a container of vectors, each of which holds a variety of values. So, the "tuple" here is structural, not a specific set of values, but rather the collection of vectors that hold tuple values. Drill vector containers and maps are both tuples, but they irritatingly have completely different APIs for working with their child vectors. These classes are a proxy that wraps the two APIs to provide a common view for use by the result set builder and its internals.
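To make the proxy idea concrete, here is a minimal, hypothetical sketch (the interface and adapter names are invented for illustration and are not the real Drill classes) of a single tuple-oriented view layered over two different child-vector APIs, much as TupleState does for vector containers and map vectors.

import java.util.List;
import java.util.Map;

// Hypothetical illustration only: one "tuple" view over two containers
// that expose their child vectors through different APIs.
interface TupleFacade {
    int columnCount();            // number of child vectors in the tuple
    Object column(int index);     // child vector at the given position
}

// Adapter over a row-like container that holds its children by position.
class RowFacade implements TupleFacade {
    private final List<Object> vectors;   // stands in for a vector container
    RowFacade(List<Object> vectors) { this.vectors = vectors; }
    @Override public int columnCount() { return vectors.size(); }
    @Override public Object column(int index) { return vectors.get(index); }
}

// Adapter over a map-like vector that holds its children by name.
class MapFacade implements TupleFacade {
    private final Map<String, Object> children;  // stands in for a map vector
    private final List<String> order;             // column order by name
    MapFacade(Map<String, Object> children, List<String> order) {
        this.children = children;
        this.order = order;
    }
    @Override public int columnCount() { return order.size(); }
    @Override public Object column(int index) { return children.get(order.get(index)); }
}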

Output Container

Builds the harvest vector container, which includes only the columns present as of the harvest schema version. That is, it excludes columns added while writing an overflow row.

Because a Drill row is actually a hierarchy, this class walks the internal hierarchy and builds a corresponding output hierarchy.

  • The root node is the row itself (vector container),
  • Internal nodes are maps (structures),
  • Leaf nodes are primitive vectors (which may be arrays).
The basic algorithm is to identify the version of the output schema, then add any new columns added up to that version. This object maintains the output container across batches, meaning that updates are incremental: we need only add columns that are new since the last update. And, those new columns will always appear directly after all existing columns in the row or in a map.

A special case occurs when columns are added in the overflow row. These columns do not appear in the output container for the main part of the batch; instead, they appear in the next output container that includes the overflow row.

Because the output container may hold only a subset of the internal columns, an interesting case occurs for maps. The map vectors in the output container are not the same as those used internally. Since a map vector holds one particular list of child columns, and the internal and output lists can differ, the internal and output map vectors must be distinct. The child vectors themselves (except for child maps) are shared between the two.
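The following is a rough sketch, with hypothetical types rather than the real loader classes, of the incremental update just described: the output is brought up to the harvest schema version by appending only the columns added since the previous update, while columns added during the overflow row are deferred to the next batch.

import java.util.List;

// Hypothetical sketch of the incremental output update. "ColumnSketch"
// stands in for the real per-column state and records the schema version
// at which the column was added.
class OutputUpdateSketch {

    interface ColumnSketch {
        int addVersion();   // schema version at which this column appeared
    }

    // Append to the output only those internal columns that are new since
    // the last update and were added at or before the harvest version.
    static void updateOutput(List<ColumnSketch> internalColumns,
                             List<ColumnSketch> outputColumns,
                             int harvestSchemaVersion) {
        // New columns always follow the existing ones, so only the tail
        // of the internal list needs to be examined.
        for (int i = outputColumns.size(); i < internalColumns.size(); i++) {
            ColumnSketch col = internalColumns.get(i);
            if (col.addVersion() > harvestSchemaVersion) {
                break;   // added during the overflow row; defer to the next batch
            }
            outputColumns.add(col);
        }
    }
}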

  • Field Details

    • columns

      protected final List<ColumnState> columns
      The set of columns added via the writers, including both projected and unprojected columns. (The writer is free to add columns that the query does not project; the result set loader creates a dummy column and dummy writer, then does not project the column to the output.) The sketch after the field details below illustrates how this list relates to the schema fields.
    • schema

      protected final TupleMetadata schema
      Internal writer schema that matches the column list.
    • outputSchema

      protected TupleMetadata outputSchema
      Metadata description of the output container (for the row) or map (for a map or repeated map).

      Rows and maps have an output schema which may differ from the internal schema. The output schema excludes unprojected columns. It also excludes columns added in an overflow row.

      The output schema is built slightly differently for maps inside a union vs. normal top-level (or nested) maps. Maps inside a union do not defer columns because of the muddy semantics (and infrequent use) of unions.
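      To tie the columns, schema, and outputSchema fields together, here is a hedged sketch using hypothetical stand-in types (not the real TupleMetadata or ColumnState classes): the internal schema tracks every column added through the writers, while the output schema keeps only projected columns added at or before the harvest version.

      import java.util.ArrayList;
      import java.util.List;

      // Hypothetical stand-ins for the real column state and schema classes.
      class SchemaSketch {

          static class ColumnSketch {
              final String name;
              final boolean projected;   // unprojected columns get a dummy writer
              final int addVersion;      // schema version at which the column was added
              ColumnSketch(String name, boolean projected, int addVersion) {
                  this.name = name;
                  this.projected = projected;
                  this.addVersion = addVersion;
              }
          }

          // The internal schema mirrors the full column list; the output schema
          // omits unprojected columns and columns added in the overflow row.
          static List<String> outputSchema(List<ColumnSketch> columns, int harvestVersion) {
              List<String> output = new ArrayList<>();
              for (ColumnSketch col : columns) {
                  if (col.projected && col.addVersion <= harvestVersion) {
                      output.add(col.name);
                  }
              }
              return output;
          }
      }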

  • Constructor Details

    • TupleState

      protected TupleState(org.apache.drill.exec.physical.resultSet.impl.LoaderInternals events, ResultVectorCache vectorCache, ProjectionFilter projectionSet)
  • Method Details