org.apache.drill.exec.physical.resultSet.model (Drill : 1.21.2-SNAPSHOT API)

package org.apache.drill.exec.physical.resultSet.model

The "row set model" provides a "dual" of the vector structure, used to create, allocate and work with a collection of vectors. The model provides an enhanced "metadata" schema, given by TupleMetadata and ColumnMetadata, that carries allocation hints and goes beyond the MaterializedField used by value vectors.

In an ideal world, this structure would not be necessary; the vectors could, by themselves, provide the needed structure. However, vectors are used in many places, in many ways, and are hard to evolve. Further, Drill may eventually choose to move to Arrow, which would not have the structure provided here.

A set of visitor classes provides the logic to traverse the vector structure, avoiding the need for multiple implementations of vector traversal. (Traversal is needed because maps contain vectors, some of which can be maps, resulting in a tree structure. Further, the API provided by a container (a top-level tuple) differs from that of a map vector (a nested tuple). This structure provides a uniform API for both cases.)
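The traversal idea can be illustrated with a minimal, self-contained sketch. Every type below (ColumnNode, ColumnVisitor, PrimitiveNode, MapNode, PathCollector) is a hypothetical stand-in invented for illustration, not one of Drill's classes. The sketch assumes the top-level row can be modeled as an unnamed map, so a single visitor handles both the top-level and the nested case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Drill classes.
interface ColumnVisitor<R, A> {
  R visitPrimitive(PrimitiveNode column, A arg);
  R visitMap(MapNode map, A arg);
}

interface ColumnNode {
  String name();
  <R, A> R accept(ColumnVisitor<R, A> visitor, A arg);
}

// Leaf column: backed by a single vector in a real system.
final class PrimitiveNode implements ColumnNode {
  private final String name;
  PrimitiveNode(String name) { this.name = name; }
  @Override public String name() { return name; }
  @Override public <R, A> R accept(ColumnVisitor<R, A> visitor, A arg) {
    return visitor.visitPrimitive(this, arg);
  }
}

// Tuple column: used both for the top-level row (empty name) and nested maps.
final class MapNode implements ColumnNode {
  private final String name;
  private final List<ColumnNode> children = new ArrayList<>();
  MapNode(String name) { this.name = name; }
  MapNode add(ColumnNode child) { children.add(child); return this; }
  List<ColumnNode> children() { return children; }
  @Override public String name() { return name; }
  @Override public <R, A> R accept(ColumnVisitor<R, A> visitor, A arg) {
    return visitor.visitMap(this, arg);
  }
}

// One visitor walks the whole tree and collects dotted column paths.
final class PathCollector implements ColumnVisitor<Void, String> {
  final List<String> paths = new ArrayList<>();

  @Override public Void visitPrimitive(PrimitiveNode column, String prefix) {
    paths.add(prefix + column.name());
    return null;
  }

  @Override public Void visitMap(MapNode map, String prefix) {
    // The unnamed top-level row contributes nothing to the path.
    String childPrefix = map.name().isEmpty() ? prefix : prefix + map.name() + ".";
    for (ColumnNode child : map.children()) {
      child.accept(this, childPrefix);
    }
    return null;
  }
}
```

Calling accept on the root node with a PathCollector and an empty prefix visits every column once; other tasks (building readers, allocating vectors) would be further visitors over the same tree rather than separate traversal code.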

This structure supports three primary tasks:

Create writers for a set of vectors. Allow incremental write-time addition of columns, keeping the vectors, columns and metadata all in sync.
Create readers for a set of vectors. Vectors are immutable once written, so the reader mechanism does not provide any dynamic schema change support.
Allocate vectors based on metadata provided. Allocation metadata includes estimated widths for variable-width columns and estimated cardinality for array columns.
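As a rough illustration of how allocation hints might translate into buffer sizes, the sketch below multiplies the row count by an estimated array cardinality and an estimated value width. AllocationHint and AllocationEstimator are hypothetical names used only here, and the formula is an assumption about how such hints could be combined, not Drill's allocation logic.

```java
// Hypothetical sketch; class and field names are illustrative, not Drill's API.
final class AllocationHint {
  final int expectedWidth;        // estimated bytes per value
  final int expectedElementCount; // estimated array elements per row (1 for scalars)

  AllocationHint(int expectedWidth, int expectedElementCount) {
    if (expectedWidth <= 0 || expectedElementCount <= 0) {
      throw new IllegalArgumentException("hints must be positive");
    }
    this.expectedWidth = expectedWidth;
    this.expectedElementCount = expectedElementCount;
  }
}

final class AllocationEstimator {
  private AllocationEstimator() { }

  // Number of values the data buffer is expected to hold for a batch.
  static long valueCount(AllocationHint hint, int rowCount) {
    return (long) rowCount * hint.expectedElementCount;
  }

  // Estimated size of the data buffer in bytes for a batch.
  static long dataBytes(AllocationHint hint, int rowCount) {
    return valueCount(hint, rowCount) * hint.expectedWidth;
  }
}
```

Under these assumptions, a scalar variable-width column with an estimated width of 50 bytes needs about 204,800 bytes of data buffer for 4,096 rows, while an array column of 8-byte values averaging 10 elements per row needs about 327,680 bytes. Without such hints, an allocator would have to guess and then resize as data arrives.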

Drill supports two kinds of batches, reflected by two implementations of the structure:

Single batch: Represents a single batch in which each column is backed by a single value vector. Single batches support both reading and writing. Writing can be done only for "new" batches; reading can be done only after writing is complete. Modeled by the single package.
Hyper batch: Represents a stacked set of batches in which each column is backed by a list of vectors, one per batch. A hyper batch is indexed by an "sv4" (four-byte selection vector). A hyper batch allows only reading. Modeled by the hyper package.
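The following sketch shows how a four-byte selection entry can address a row in a stack of batches. It assumes the common layout in which the upper 16 bits hold the batch index and the lower 16 bits hold the row offset within that batch; that layout, and the Sv4Entry and HyperIntColumn classes, are assumptions made for illustration rather than a description of Drill's implementation.

```java
// Hypothetical sketch of sv4-style indexing; not Drill's implementation.
final class Sv4Entry {
  private Sv4Entry() { }

  // Pack a batch index and a row offset into one 32-bit entry.
  static int encode(int batchIndex, int rowOffset) {
    if (batchIndex < 0 || batchIndex > 0xFFFF || rowOffset < 0 || rowOffset > 0xFFFF) {
      throw new IllegalArgumentException("both fields must fit in 16 bits");
    }
    return (batchIndex << 16) | rowOffset;
  }

  // Upper 16 bits: which batch in the stack.
  static int batchIndex(int entry) { return entry >>> 16; }

  // Lower 16 bits: which row within that batch.
  static int rowOffset(int entry) { return entry & 0xFFFF; }
}

// A read-only column backed by one int array ("vector") per batch.
final class HyperIntColumn {
  private final int[][] batches;

  HyperIntColumn(int[][] batches) { this.batches = batches; }

  // Resolve a selection entry to the value it points at.
  int get(int entry) {
    return batches[Sv4Entry.batchIndex(entry)][Sv4Entry.rowOffset(entry)];
  }
}
```

With two batches holding {10, 11, 12} and {20, 21}, the entries encode(1, 0), encode(0, 2) and encode(1, 1) resolve to 20, 12 and 21. The selection vector thus presents rows from several batches in an arbitrary order (for example, a sorted order) without copying any data, which is consistent with a hyper batch being read-only.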

Related Packages

Package

Description

org.apache.drill.exec.physical.resultSet

Provides a second-generation row set (AKA "record batch") writer used by client code to define the schema of a result set and to write data into the vectors backing a row set.

org.apache.drill.exec.physical.resultSet.model.hyper

Implementation of a row set model for hyper-batches.

org.apache.drill.exec.physical.resultSet.model.single

This set of classes models the structure of a batch consisting of single vectors (as contrasted with a hyper batch). Provides tools for metadata-based construction, allocation, reading and writing of the vectors.

org.apache.drill.exec.physical.resultSet.impl

Handles the details of the result set loader implementation.

org.apache.drill.exec.physical.resultSet.project

org.apache.drill.exec.physical.resultSet.util

Class

Description

BaseTupleModel

Base implementation for a tuple model which is common to the "single" and "hyper" cases.

BaseTupleModel.BaseColumnModel

ContainerVisitor<R,A>

MetadataProvider

Interface for retrieving and/or creating metadata given a vector.

MetadataProvider.ArraySchemaCreator

MetadataProvider.ArraySchemaRetrieval

MetadataProvider.MetadataCreator

MetadataProvider.MetadataRetrieval

MetadataProvider.VariantSchemaCreator

MetadataProvider.VariantSchemaRetrieval

MetadataProvider.VectorDescrip

ReaderBuilder

ReaderIndex

Row set index base class used when indexing rows within a row set for a row set reader.

TupleModel

Common interface to access a tuple backed by a vector container or a map vector.

TupleModel.ColumnModel

Common interface to access a column vector, its metadata, and its tuple definition (for maps). Provides a visitor interface for common vector tasks.

TupleModel.RowSetModel

Tuple-model interface for the top-level row (tuple) structure.
