Package org.apache.drill.exec.physical.resultSet.model

The "row set model" provides a "dual" of the vector structure used to create, allocate, and work with a collection of vectors. The model provides an enhanced "metadata" schema, given by TupleMetadata and ColumnMetadata, with allocation hints that go beyond the MaterializedField used by value vectors.

In an ideal world, this structure would not be necessary; the vectors could, by themselves, provide the needed structure. However, vectors are used in many places, in many ways, and are hard to evolve. Further, Drill may eventually choose to move to Arrow, which would not have the structure provided here.

A set of visitor classes provides the logic to traverse the vector structure, avoiding the need for multiple implementations of vector traversal. (Traversal is needed because maps contain vectors, some of which can themselves be maps, resulting in a tree structure.) Further, the API provided by containers (a top-level tuple) differs from that of a map vector (a nested tuple). This structure provides a uniform API for both cases.
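The traversal idea can be illustrated with a classic visitor. This is a minimal sketch under stated assumptions: ColumnModel, PrimitiveColumn, MapColumn, ModelVisitor, and LeafPathCollector are hypothetical simplifications, not Drill's actual visitor classes. It shows one visitor walking the top-level tuple and each nested map tuple with the same loop, collecting a dotted path for every leaf column.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point declared first so the single-file source launcher finds main().
class VisitorDemo {
  public static void main(String[] args) {
    List<ColumnModel> row = List.of(
        new PrimitiveColumn("id"),
        new MapColumn("addr",
            new PrimitiveColumn("city"),
            new MapColumn("geo", new PrimitiveColumn("lat"), new PrimitiveColumn("lon"))));
    LeafPathCollector collector = new LeafPathCollector();
    collector.visitTuple(row);
    System.out.println(collector.paths); // [id, addr.city, addr.geo.lat, addr.geo.lon]
  }
}

// Hypothetical visitor interface: one callback per kind of column.
interface ModelVisitor<R> {
  R visitPrimitive(PrimitiveColumn col, String path);
  R visitMap(MapColumn col, String path);
}

// Hypothetical column nodes; a map column holds a nested tuple of columns.
abstract class ColumnModel {
  final String name;
  ColumnModel(String name) { this.name = name; }
  abstract <R> R accept(ModelVisitor<R> visitor, String parentPath);
}

class PrimitiveColumn extends ColumnModel {
  PrimitiveColumn(String name) { super(name); }

  @Override
  <R> R accept(ModelVisitor<R> visitor, String parentPath) {
    return visitor.visitPrimitive(this, parentPath + name);
  }
}

class MapColumn extends ColumnModel {
  final List<ColumnModel> members = new ArrayList<>();

  MapColumn(String name, ColumnModel... cols) {
    super(name);
    for (ColumnModel c : cols) members.add(c);
  }

  @Override
  <R> R accept(ModelVisitor<R> visitor, String parentPath) {
    return visitor.visitMap(this, parentPath + name);
  }
}

// Collects the dotted path of every leaf column, recursing into maps.
class LeafPathCollector implements ModelVisitor<Void> {
  final List<String> paths = new ArrayList<>();

  // Entry point for the top-level tuple (the "container" case).
  void visitTuple(List<ColumnModel> columns) { visitTuple(columns, ""); }

  // The same loop serves the top-level tuple and every nested map tuple.
  private void visitTuple(List<ColumnModel> columns, String prefix) {
    for (ColumnModel c : columns) c.accept(this, prefix);
  }

  @Override
  public Void visitPrimitive(PrimitiveColumn col, String path) {
    paths.add(path);
    return null;
  }

  @Override
  public Void visitMap(MapColumn col, String path) {
    visitTuple(col.members, path + ".");
    return null;
  }
}
```

Because the recursion lives in the visitor rather than in each caller, a new traversal (for example, allocation or writer creation) would only need a new ModelVisitor implementation.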

This structure supports three primary tasks:

  1. Create writers for a set of vectors. Allow incremental write-time addition of columns, keeping the vectors, columns and metadata all in sync.
  2. Create readers for a set of vectors. Vectors are immutable once written, so the reader mechanism does not provide any dynamic schema change support.
  3. Allocate vectors based on the provided metadata. Allocation metadata includes estimated widths for variable-width columns and estimated cardinality for array columns.
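As a rough illustration of the allocation task, the sketch below turns per-column hints into a byte estimate for a batch with a given row count. The ColumnHints class, its fields, and the sizing rule are hypothetical. The sketch assumes 4-byte offsets for variable-width and array columns and ignores details such as null-tracking vectors and buffer rounding that a real allocator would likely handle.

```java
import java.util.List;

// Entry point declared first so the single-file source launcher finds main().
class AllocationDemo {
  // Assumption: offsets are 4-byte integers.
  static final int OFFSET_WIDTH = 4;

  // Rough byte estimate for one column's vectors at the given row count.
  static long estimateBytes(ColumnHints col, int rowCount) {
    // Arrays hold roughly rowCount * cardinality values; scalars hold rowCount.
    long values = col.isArray ? (long) rowCount * col.cardinality : rowCount;
    long bytes;
    if (col.fixedWidth > 0) {
      bytes = values * col.fixedWidth;
    } else {
      // Variable-width data plus an offset vector with (values + 1) entries.
      bytes = values * col.estimatedWidth + (values + 1) * OFFSET_WIDTH;
    }
    if (col.isArray) {
      // Arrays add a per-row offset vector locating each row's elements.
      bytes += ((long) rowCount + 1) * OFFSET_WIDTH;
    }
    return bytes;
  }

  public static void main(String[] args) {
    List<ColumnHints> schema = List.of(
        new ColumnHints("id", 4, 0, false, 1),      // fixed-width 4-byte value
        new ColumnHints("name", 0, 20, false, 1),   // variable width, ~20 bytes each
        new ColumnHints("scores", 4, 0, true, 3));  // array of ~3 ints per row
    long total = 0;
    for (ColumnHints col : schema) {
      long b = estimateBytes(col, 1000);
      System.out.println(col.name + ": " + b + " bytes");
      total += b;
    }
    // Expected: id 4000, name 24004, scores 16004, total 44008
    System.out.println("total: " + total + " bytes");
  }
}

// Hypothetical stand-in for allocation hints carried by column metadata.
class ColumnHints {
  final String name;
  final int fixedWidth;     // bytes per value for fixed-width types, 0 for variable width
  final int estimatedWidth; // estimated bytes per value for variable-width types
  final boolean isArray;    // true for repeated (array) columns
  final int cardinality;    // estimated elements per row for arrays, ignored otherwise

  ColumnHints(String name, int fixedWidth, int estimatedWidth, boolean isArray, int cardinality) {
    this.name = name;
    this.fixedWidth = fixedWidth;
    this.estimatedWidth = estimatedWidth;
    this.isArray = isArray;
    this.cardinality = cardinality;
  }
}
```

The point of keeping such hints in the metadata model, rather than in the vectors, is that an allocator can size every vector up front, before any values are written.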

Drill supports two kinds of batches, reflected by two implementations of the structure:

Single batch
Represents a single batch in which each column is backed by a single value vector. Single batches support both reading and writing. Writing can be done only for "new" batches; reading can be done only after writing is complete. Modeled by the single package.
Hyper batch
Represents a stacked set of batches in which each column is backed by a list of vectors, one per batch. A hyper batch is indexed by an "sv4" (a four-byte selection vector). A hyper batch allows only reading. Modeled by the hyper package.
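The sv4 indirection used by hyper batches can be sketched as follows. The sketch assumes that each sv4 entry packs the batch index into its upper 16 bits and the record index within that batch into its lower 16 bits; the Sv4Demo class and the use of plain int arrays as per-batch "vectors" are hypothetical stand-ins for a hyper column.

```java
import java.util.List;
import java.util.StringJoiner;

class Sv4Demo {
  // Assumed layout: upper 16 bits = batch index, lower 16 bits = record index.
  static int encode(int batchIndex, int recordIndex) {
    return (batchIndex << 16) | (recordIndex & 0xFFFF);
  }

  static int batchIndex(int entry) { return entry >>> 16; }

  static int recordIndex(int entry) { return entry & 0xFFFF; }

  // Hypothetical hyper column: one backing array ("vector") per batch.
  static int read(List<int[]> hyperColumn, int entry) {
    return hyperColumn.get(batchIndex(entry))[recordIndex(entry)];
  }

  public static void main(String[] args) {
    // Two batches, each holding its own values.
    List<int[]> hyperColumn = List.of(new int[] {10, 30, 50}, new int[] {20, 40});
    // An sv4 that presents the rows of both batches in sorted order.
    int[] sv4 = {encode(0, 0), encode(1, 0), encode(0, 1), encode(1, 1), encode(0, 2)};
    StringJoiner out = new StringJoiner(" ");
    for (int entry : sv4) {
      out.add(String.valueOf(read(hyperColumn, entry)));
    }
    System.out.println(out); // 10 20 30 40 50
  }
}
```

Because every read goes through the selection vector, rows drawn from several batches can be presented in a new order without copying any vector data.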