Package org.apache.drill.exec.physical.resultSet.model
The model's metadata is given by TupleMetadata and ColumnMetadata, with allocation hints that go beyond the MaterializedField used by value vectors.
In an ideal world, this structure would not be necessary; the vectors could, by themselves, provide the needed structure. However, vectors are used in many places, in many ways, and are hard to evolve. Further, Drill may eventually choose to move to Arrow, which would not have the structure provided here.
A set of visitor classes provides the logic to traverse the vector structure, avoiding the need for multiple implementations of vector traversal. (Traversal is needed because maps contain vectors, some of which can be maps, resulting in a tree structure.) Further, the API provided by containers (a top-level tuple) differs from that of a map vector (a nested tuple). This structure provides a uniform API for both cases.
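The traversal idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: Tuple, Column, Row, Primitive, MapColumn and LeafPathVisitor are hypothetical stand-ins invented here, not the classes in this package. The sketch shows one visitor walking both the top-level tuple and nested maps through the same interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Drill classes.
final class TupleVisitorSketch {

  /** Uniform view of anything that holds columns: the top-level row or a map. */
  interface Tuple {
    List<Column> columns();
  }

  /** A column accepts a visitor, so traversal logic lives in one place. */
  interface Column {
    String name();
    <R, A> R accept(ColumnVisitor<R, A> visitor, A arg);
  }

  interface ColumnVisitor<R, A> {
    R visitPrimitive(Primitive column, A arg);
    R visitMap(MapColumn column, A arg);
  }

  /** Top-level tuple, playing the role of a vector container. */
  static final class Row implements Tuple {
    private final List<Column> columns;
    Row(List<Column> columns) { this.columns = columns; }
    @Override public List<Column> columns() { return columns; }
  }

  /** Leaf column, playing the role of a single value vector. */
  static final class Primitive implements Column {
    private final String name;
    Primitive(String name) { this.name = name; }
    @Override public String name() { return name; }
    @Override public <R, A> R accept(ColumnVisitor<R, A> visitor, A arg) {
      return visitor.visitPrimitive(this, arg);
    }
  }

  /** Nested tuple, playing the role of a map vector. */
  static final class MapColumn implements Column, Tuple {
    private final String name;
    private final List<Column> columns;
    MapColumn(String name, List<Column> columns) {
      this.name = name;
      this.columns = columns;
    }
    @Override public String name() { return name; }
    @Override public List<Column> columns() { return columns; }
    @Override public <R, A> R accept(ColumnVisitor<R, A> visitor, A arg) {
      return visitor.visitMap(this, arg);
    }
  }

  /** Collects the dotted path of every leaf column, depth first. */
  static final class LeafPathVisitor implements ColumnVisitor<Void, String> {
    final List<String> paths = new ArrayList<>();

    // The same loop serves the row and every nested map.
    void visitTuple(Tuple tuple, String prefix) {
      for (Column child : tuple.columns()) {
        child.accept(this, prefix);
      }
    }

    @Override public Void visitPrimitive(Primitive column, String prefix) {
      paths.add(prefix + column.name());
      return null;
    }

    @Override public Void visitMap(MapColumn column, String prefix) {
      visitTuple(column, prefix + column.name() + ".");
      return null;
    }
  }

  static List<String> leafPaths(Tuple root) {
    LeafPathVisitor visitor = new LeafPathVisitor();
    visitor.visitTuple(root, "");
    return visitor.paths;
  }
}
```

A different task, such as building writers or allocating vectors, would be another ColumnVisitor implementation; the tree walk itself would not need to be rewritten.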
This structure supports three primary tasks:
- Create writers for a set of vectors. Allow incremental write-time addition of columns, keeping the vectors, columns and metadata all in sync.
- Create readers for a set of vectors. Vectors are immutable once written, so the reader mechanism does not provide any dynamic schema change support.
- Allocate vectors based on metadata provided. Allocation metadata includes estimated widths for variable-width columns and estimated cardinality for array columns.
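As a rough illustration of the third task, the sketch below turns allocation hints into sizes. Everything in it is an assumption made for the example: ColumnHint, valueCount and dataBytes are hypothetical names, and the arithmetic (rows times cardinality times width) is a simple estimate, not Drill's actual allocation logic.

```java
// Hypothetical names for illustration; not Drill's allocation API.
final class AllocationSketch {

  /** Allocation hints for one column, as metadata might carry them. */
  static final class ColumnHint {
    final String name;
    final int expectedWidth;        // estimated bytes per value
    final int expectedElementCount; // estimated array cardinality; 1 for non-array columns

    ColumnHint(String name, int expectedWidth, int expectedElementCount) {
      this.name = name;
      this.expectedWidth = expectedWidth;
      this.expectedElementCount = expectedElementCount;
    }
  }

  /** Number of values to reserve: one per row, times the array cardinality. */
  static int valueCount(ColumnHint hint, int rowCount) {
    return Math.multiplyExact(rowCount, Math.max(1, hint.expectedElementCount));
  }

  /** Bytes to reserve for the column's data, excluding offsets and null bits. */
  static long dataBytes(ColumnHint hint, int rowCount) {
    return (long) valueCount(hint, rowCount) * hint.expectedWidth;
  }
}
```

For example, an array column hinted at 5 elements of about 20 bytes each would, under this estimate, reserve 5,000 values and 100,000 data bytes for a 1,000-row batch.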
Drill supports two kinds of batches, reflected by two implementations of the structure:
- Single batch: Represents a single batch in which each column is backed by a single value vector. Single batches support both reading and writing. Writing can be done only for "new" batches; reading can be done only after writing is complete. Modeled by the single package.
- Hyper batch: Represents a stacked set of batches in which each column is backed by a list of value vectors, one per batch. A hyper batch is indexed by an "sv4" (a four-byte selection vector) and allows only reading. Modeled by the hyper package.
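The way an sv4 can address a row in a stacked set of batches can be sketched as follows. The bit layout is an assumption not stated above: the upper 16 bits of each four-byte entry are taken as the batch index and the lower 16 bits as the row offset within that batch. Sv4Sketch and its methods are hypothetical names, and plain lists stand in for value vectors.

```java
import java.util.List;

// Hypothetical helper for illustration; the 16/16 bit split is an assumption.
final class Sv4Sketch {

  /** Packs a batch index and a row offset into one four-byte entry. */
  static int encode(int batch, int offset) {
    if (batch < 0 || batch > 0xFFFF || offset < 0 || offset > 0xFFFF) {
      throw new IllegalArgumentException("batch and offset must fit in 16 bits");
    }
    return (batch << 16) | offset;
  }

  /** Upper 16 bits; the unsigned shift keeps large batch indexes positive. */
  static int batchIndex(int entry) {
    return entry >>> 16;
  }

  /** Lower 16 bits. */
  static int recordOffset(int entry) {
    return entry & 0xFFFF;
  }

  /** Reads one logical row of a column that is backed by one list per batch. */
  static <T> T lookup(List<List<T>> batches, int[] sv4, int row) {
    int entry = sv4[row];
    return batches.get(batchIndex(entry)).get(recordOffset(entry));
  }
}
```

Because the selection vector only redirects reads into existing batches, a reader built this way has no natural place to write, which is consistent with hyper batches being read-only.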
Class and description:
- Base implementation for a tuple model which is common to the "single" and "hyper" cases.
- ContainerVisitor<R, A>
- Interface for retrieving and/or creating metadata given a vector.
- Row set index base class used when indexing rows within a row set for a row set reader.
- Common interface to access a tuple backed by a vector container or a map vector.
- Common interface to access a column vector, its metadata, and its tuple definition (for maps). Provides a visitor interface for common vector tasks.
- Tuple-model interface for the top-level row (tuple) structure.