Interface TupleModel

All Known Subinterfaces:
TupleModel.RowSetModel
All Known Implementing Classes:
BaseTupleModel

public interface TupleModel
Common interface to access a tuple backed by a vector container or a map vector. Provides a visitor interface to apply tasks such as vector allocation, reader or writer creation, and so on. Allows either static or dynamic vector allocation.

The terminology used here:

Row set
A collection of rows stored as value vectors. Elsewhere in Drill we call this a "record batch", but that term has been overloaded to mean the runtime implementation of an operator.
Tuple
The relational-theory term for a row. Drill maps have a fixed schema, so they are tuples as well. Impala, Hive and other tools use the term "structure" (or "struct") for what Drill calls a map. A structure is simply a nested tuple, modeled here by the same tuple abstraction used for rows.
Column
A column is represented by a vector (which may have internal null-flag or offset vectors). Maps are a kind of column that has an associated tuple. Because this abstraction models structure, array columns are grouped with single-value columns: array-ness is just a matter of cardinality.
Visitor
The visitor abstraction (the classic Gang-of-Four pattern) allows adding functionality without complicating the structure classes. It also allows the same abstraction to be used both for the RowSet abstractions used in testing and for the scan operator "loader" classes.
Metadata
Metadata is simply data about data; here, data about tuples and columns. The column metadata mostly expands on what is available in MaterializedField, but also adds allocation hints.
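To make the Tuple, Column and Visitor entries concrete, here is a minimal, self-contained sketch. All names in it (ModelVisitor, ColumnSketch, TupleSketch, LeafCounter) are hypothetical and are not Drill's classes; real vectors and metadata are omitted. It shows a map column carrying a nested tuple, an array column that differs from a scalar only by a cardinality flag, and a visitor whose recursion into maps is handled by the model rather than by the visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface: one callback per kind of column.
// For maps, the model passes in the results already computed for the nested tuple.
interface ModelVisitor<R> {
  R visitPrimitive(ColumnSketch column);
  R visitMap(ColumnSketch column, List<R> childResults);
}

// Hypothetical column model: a map column is simply a column with a nested tuple.
final class ColumnSketch {
  final String name;
  final boolean repeated;      // array-ness is just cardinality
  final TupleSketch mapTuple;  // non-null only for map columns

  ColumnSketch(String name, boolean repeated, TupleSketch mapTuple) {
    this.name = name;
    this.repeated = repeated;
    this.mapTuple = mapTuple;
  }

  <R> R accept(ModelVisitor<R> visitor) {
    if (mapTuple == null) {
      return visitor.visitPrimitive(this);
    }
    // The recursion into the nested tuple lives here, in the model,
    // so individual visitors never have to walk the structure themselves.
    return visitor.visitMap(this, mapTuple.visit(visitor));
  }
}

// Hypothetical tuple model: the same class serves for rows and for maps.
final class TupleSketch {
  final List<ColumnSketch> columns = new ArrayList<>();

  TupleSketch add(ColumnSketch column) {
    columns.add(column);
    return this;
  }

  // Applies the visitor to each column in index order.
  <R> List<R> visit(ModelVisitor<R> visitor) {
    List<R> results = new ArrayList<>();
    for (ColumnSketch column : columns) {
      results.add(column.accept(visitor));
    }
    return results;
  }
}

// Example task: count the leaf (non-map) columns, including those inside maps.
final class LeafCounter implements ModelVisitor<Integer> {
  public Integer visitPrimitive(ColumnSketch column) {
    return 1;
  }

  public Integer visitMap(ColumnSketch column, List<Integer> childResults) {
    int total = 0;
    for (int n : childResults) {
      total += n;
    }
    return total;
  }
}
```

With this shape, a new task such as allocation or reader creation becomes another visitor implementation, while the structure classes stay unchanged.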

This abstraction is the physical dual of a VectorContainer. The vectors are "owned" by the associated container. The structure here simply applies additional metadata and visitor behavior to allow much easier processing than is possible with the raw container structure.

A key benefit of this abstraction is the extended TupleSchema associated with the structure. Unlike a VectorContainer, this abstraction keeps the schema in sync with the vectors as columns are added.
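The following sketch illustrates the ownership split and the schema-sync guarantee under simplifying assumptions: ContainerStub stands in for a VectorContainer, VectorStub for a value vector, and ColumnMetaStub for extended column metadata carrying an allocation hint. None of these are Drill APIs. Because columns are added only through the hypothetical SyncedModel, the container's vector list and the model's schema grow together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a value vector; real vectors hold data buffers.
final class VectorStub {
  final String name;

  VectorStub(String name) {
    this.name = name;
  }
}

// Hypothetical stand-in for a VectorContainer: it owns the vectors
// but keeps no extended per-column metadata.
final class ContainerStub {
  final List<VectorStub> vectors = new ArrayList<>();
}

// Hypothetical extended column metadata, including an allocation hint
// that the vector itself does not carry.
final class ColumnMetaStub {
  final String name;
  final String type;
  final int expectedWidth;

  ColumnMetaStub(String name, String type, int expectedWidth) {
    this.name = name;
    this.type = type;
    this.expectedWidth = expectedWidth;
  }
}

// The model does not own vectors: it adds each one to the container and
// records matching metadata in the same call, so the two cannot drift apart.
final class SyncedModel {
  final ContainerStub container;
  final List<ColumnMetaStub> schema = new ArrayList<>();

  SyncedModel(ContainerStub container) {
    this.container = container;
  }

  VectorStub addColumn(String name, String type, int expectedWidth) {
    VectorStub vector = new VectorStub(name);
    container.vectors.add(vector);
    schema.add(new ColumnMetaStub(name, type, expectedWidth));
    return vector;
  }
}
```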

Some future version may wish to merge the two concepts. That way, metadata discovered by one operator will be available to another. Complex recursive functions can be replaced by a visitor, with the recursion handled inside implementations of this interface.

Tuples provide access to columns by both index and name. Both the schema and model classes follow this convention. Compared with the VectorContainer and AbstractMapVector classes, the vector index is a first-class concept: the column model and schema are guaranteed to reside at the same index relative to the enclosing tuple. In addition, access by name is efficient because it uses a hash index.
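A minimal sketch of dual index and name access, assuming a hypothetical IndexedTupleSchema class (not Drill's TupleSchema): columns are kept in a list in insertion order, and a HashMap maps each name to its position, so a name lookup yields the same index used for positional access. For simplicity the sketch compares names case-sensitively; a real implementation may need to normalize case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical schema offering both positional and name-based access.
final class IndexedTupleSchema {
  private final List<String> names = new ArrayList<>();
  private final Map<String, Integer> nameIndex = new HashMap<>();

  // Appends a column and returns its index; in the design described above,
  // the column's model and vector would sit at this same index.
  int add(String name) {
    if (nameIndex.containsKey(name)) {
      throw new IllegalArgumentException("Duplicate column: " + name);
    }
    names.add(name);
    int index = names.size() - 1;
    nameIndex.put(name, index);
    return index;
  }

  // Positional access.
  String column(int index) {
    return names.get(index);
  }

  // Name access through the hash index; returns -1 if the name is absent.
  int index(String name) {
    return nameIndex.getOrDefault(name, -1);
  }

  int size() {
    return names.size();
  }
}
```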

Visitor classes are defined by the "simple" (single batch) and "hyper" (multi-batch) implementations to allow vector implementations to work with the specifics of each type of batch.