Interface TupleModel
- All Known Subinterfaces:
TupleModel.RowSetModel
- All Known Implementing Classes:
BaseTupleModel
The terminology used here:
- Row set
- A collection of rows stored as value vectors. Elsewhere in Drill we call this a "record batch", but that term has been overloaded to mean the runtime implementation of an operator.
- Tuple
- The relational-theory term for a row. Drill maps have a fixed schema. Impala, Hive and other tools use the term "structure" (or "struct") for what Drill calls a map. A structure is simply a nested tuple, modeled here by the same tuple abstraction used for rows.
- Column
- A column is represented by a vector (which may have internal null-flag or offset vectors). Maps are a kind of column that has an associated tuple. Because this abstraction models structure, array columns are grouped with single values: the array-ness is just cardinality.
- Visitor
- The visitor abstraction (classic Gang-of-Four pattern) allows adding functionality without complicating the structure classes. It also allows the same abstraction to be used for both the testing RowSet abstractions and the scan operator "loader" classes.
- Metadata
- Metadata is simply data about data; here, data about tuples and columns. The column metadata mostly expands on what is available in MaterializedField, but also adds allocation hints.
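To make the terminology concrete, here is a minimal Java sketch of the structural idea only: a tuple is an ordered list of columns, a map column carries its own nested tuple, and an array is the same column with a different cardinality. The names `NestedTuple`, `NestedColumn`, and `Cardinality` are hypothetical and do not correspond to Drill classes; real columns would also hold vectors and richer metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cardinality: "array-ness" is a property of the column,
// not a separate kind of column.
enum Cardinality { SINGLE, ARRAY }

// Hypothetical column: a map (struct) column simply carries a nested tuple.
final class NestedColumn {
  final String name;
  final String type;
  final Cardinality cardinality;
  final NestedTuple mapTuple; // non-null only for map columns

  private NestedColumn(String name, String type, Cardinality cardinality, NestedTuple mapTuple) {
    this.name = name;
    this.type = type;
    this.cardinality = cardinality;
    this.mapTuple = mapTuple;
  }

  static NestedColumn scalar(String name, String type, Cardinality cardinality) {
    return new NestedColumn(name, type, cardinality, null);
  }

  static NestedColumn map(String name, Cardinality cardinality, NestedTuple members) {
    return new NestedColumn(name, "MAP", cardinality, members);
  }

  boolean isMap() { return mapTuple != null; }

  boolean isArray() { return cardinality == Cardinality.ARRAY; }
}

// Hypothetical tuple: the same class models the top-level row and map contents.
final class NestedTuple {
  private final List<NestedColumn> columns = new ArrayList<>();

  NestedTuple add(NestedColumn column) {
    columns.add(column);
    return this; // allow chained construction
  }

  int size() { return columns.size(); }

  NestedColumn column(int index) { return columns.get(index); }
}
```

An array of structs is then just `NestedColumn.map("items", Cardinality.ARRAY, itemTuple)`: the same map abstraction with array cardinality, rather than a separate column kind.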
This abstraction is the physical dual of a VectorContainer. The vectors are "owned" by the associated container; the structure here simply applies additional metadata and visitor behavior to allow much easier processing than is possible with the raw container structure.
A key value of this abstraction is the extended TupleSchema associated with the structure. Unlike a VectorContainer, this abstraction keeps the schema in sync with vectors as columns are added.
Some future version may wish to merge the two concepts. That way, metadata discovered by one operator will be available to another. Complex recursive functions can be replaced by a visitor, with the recursion handled inside implementations of this interface.
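The point about moving recursion into the model can be illustrated with a minimal, hypothetical Java sketch (not Drill's visitor API): the tuple's `accept` method walks into nested map tuples itself, so a visitor such as the leaf-path collector below contains no recursion of its own. `VisitableTuple`, `VisitableColumn`, `ColumnVisitor`, and `LeafPathCollector` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor: callbacks receive the column's dotted path.
interface ColumnVisitor {
  void visitScalar(String path, VisitableColumn col);

  // Called before a map's members are visited; ignored by default.
  default void visitMap(String path, VisitableColumn col) {}
}

final class VisitableColumn {
  final String name;
  final VisitableTuple mapTuple; // null for non-map columns

  VisitableColumn(String name, VisitableTuple mapTuple) {
    this.name = name;
    this.mapTuple = mapTuple;
  }
}

final class VisitableTuple {
  private final List<VisitableColumn> columns = new ArrayList<>();

  VisitableTuple add(String name) {
    columns.add(new VisitableColumn(name, null));
    return this;
  }

  VisitableTuple addMap(String name, VisitableTuple members) {
    columns.add(new VisitableColumn(name, members));
    return this;
  }

  // The recursion lives here, once, instead of in every caller.
  void accept(ColumnVisitor visitor) {
    accept("", visitor);
  }

  private void accept(String prefix, ColumnVisitor visitor) {
    for (VisitableColumn col : columns) {
      String path = prefix.isEmpty() ? col.name : prefix + "." + col.name;
      if (col.mapTuple == null) {
        visitor.visitScalar(path, col);
      } else {
        visitor.visitMap(path, col);
        col.mapTuple.accept(path, visitor); // descend into the nested tuple
      }
    }
  }
}

// Example visitor: collects the dotted path of every leaf (non-map) column.
final class LeafPathCollector implements ColumnVisitor {
  final List<String> paths = new ArrayList<>();

  @Override
  public void visitScalar(String path, VisitableColumn col) {
    paths.add(path);
  }
}
```

For a row `id, address{city, geo{lat, lon}}`, the collector gathers `id`, `address.city`, `address.geo.lat`, and `address.geo.lon` without knowing anything about nesting depth.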
Tuples provide access to columns by both index and name. Both the schema and model classes follow this convention. Compared with the VectorContainer and AbstractMapVector classes, the vector index is a first-class concept: the column model and schema are guaranteed to reside at the same index relative to the enclosing tuple. In addition, name access is efficient using a hash index.
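A minimal sketch of this index-plus-name layout, assuming a hypothetical generic `IndexedTuple` class rather than Drill's actual implementation: a single `add` call appends the schema entry and the vector at the same position and records the name in a `HashMap`, so positional and by-name lookups always agree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tuple: S is the per-column schema type, V the vector type.
final class IndexedTuple<S, V> {
  private final List<S> schema = new ArrayList<>();
  private final List<V> vectors = new ArrayList<>();
  private final Map<String, Integer> nameIndex = new HashMap<>();

  // Appends schema and vector at the same index; returns that index.
  int add(String name, S columnSchema, V vector) {
    if (nameIndex.containsKey(name)) {
      // Reject before mutating, so the two lists never drift apart.
      throw new IllegalArgumentException("Duplicate column: " + name);
    }
    int index = schema.size();
    schema.add(columnSchema);
    vectors.add(vector);
    nameIndex.put(name, index);
    return index;
  }

  int size() { return schema.size(); }

  // Positional access: schema and vector share the same index.
  S schema(int index) { return schema.get(index); }

  V vector(int index) { return vectors.get(index); }

  // Name access via the hash index; -1 when the name is unknown.
  int indexOf(String name) { return nameIndex.getOrDefault(name, -1); }

  V vector(String name) {
    int index = indexOf(name);
    return index < 0 ? null : vectors.get(index);
  }
}
```

Because `add` rejects a duplicate name before touching either list, a failed add leaves schema and vectors aligned.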
Visitor classes are defined by the "simple" (single batch) and "hyper" (multi-batch) implementations to allow vector implementations to work with the specifics of each type of batch.
Nested Class Summary
- static interface: Common interface to access a column vector, its metadata, and its tuple definition (for maps). Provides a visitor interface for common vector tasks.
- static interface: Tuple-model interface for the top-level row (tuple) structure.
Method Summary
Method Details
- schema: TupleMetadata schema()
- size: int size()
- column
- column