Interface  Description 

ProjectionFilter 
Provides a variety of ways to filter columns: no filtering, filter
by (parsed) projection list, or filter by projection list and
provided schema.

PullResultSetReaderImpl.UpstreamSource  
PushResultSetReaderImpl.UpstreamSource  
VectorState 
Handles batch and overflow operation for a (possibly compound) vector.

Class  Description 

BuildFromSchema 
Build the set of writers from a defined schema.

ColumnBuilder 
Algorithms for building a column given a metadata description of the column and
the parent context that will hold the column.

ColumnState 
Represents the write-time state for a column including the writer and the (optional)
backing vector.

ColumnState.BaseContainerColumnState  
ColumnState.PrimitiveColumnState 
Primitive (non-map) column state.

ContainerState 
Abstract representation of a container of vectors: a row, a map, a
repeated map, a list or a union.

ListState 
Represents the contents of a list vector.

ListState.ListVectorState 
Wrapper around the list vector (and its optional contained union).

NullableVectorState  
NullResultVectorCacheImpl 
A vector cache implementation which does not actually cache.

NullVectorState 
Do-nothing vector state for a map column which has no actual vector
associated with it.

NullVectorState.UnmanagedVectorState 
Near-do-nothing state for a vector that requires no work to
allocate or rollover, but where we do want to at least track
the vector itself.

ProjectionFilter.BaseSchemaProjectionFilter 
Schema-based projection.

ProjectionFilter.CompoundProjectionFilter 
Compound filter for combining direct and provided schema projections.

ProjectionFilter.DirectProjectionFilter 
Projection filter based on the (parsed) projection list.

ProjectionFilter.ImplicitProjectionFilter 
Implied projection: either project all or project none.

ProjectionFilter.ProjResult  
ProjectionFilter.SchemaProjectionFilter 
Projection filter in which a schema exactly defines the set of allowed
columns, and their types.

ProjectionFilter.TypeProjectionFilter 
Projection based on a non-strict provided schema which enforces the type of known
columns, but has no opinion about additional columns.

PullResultSetReaderImpl 
Protocol

PushResultSetReaderImpl  
PushResultSetReaderImpl.BatchHolder  
RepeatedListState 
Represents the internal state of a RepeatedList vector.

RepeatedListState.RepeatedListColumnState 
Repeated list column state.

RepeatedListState.RepeatedListVectorState 
Track the repeated list vector.

RepeatedVectorState 
Vector state for a scalar array (repeated scalar) vector.

ResultSetCopierImpl  
ResultSetLoaderImpl 
Implementation of the result set loader.

ResultSetLoaderImpl.ResultSetOptions 
Read-only set of options for the result set loader.

ResultSetOptionBuilder 
Builder for the options for the row set loader.

ResultVectorCacheImpl 
Manages an inventory of value vectors used across row batch readers.

RowSetLoaderImpl 
Implementation of the row set loader.

SingleVectorState 
Base class for a single vector.

SingleVectorState.FixedWidthVectorState 
State for a scalar value vector.

SingleVectorState.IsSetVectorState  
SingleVectorState.OffsetVectorState 
Special case for an offset vector.

SingleVectorState.SimpleVectorState  
SingleVectorState.VariableWidthVectorState 
State for a scalar value vector.

TupleState 
Represents the loader state for a tuple: a row or a map.

TupleState.DictArrayState  
TupleState.DictArrayVectorState  
TupleState.DictColumnState  
TupleState.DictState  
TupleState.DictVectorState<T extends ValueVector>  
TupleState.MapArrayState  
TupleState.MapColumnState 
Represents a map column (either single or repeated).

TupleState.MapState 
Represents a tuple defined as a Drill map: single or repeated.

TupleState.MapVectorState 
State for a map vector.

TupleState.RowState 
Handles the details of the top-level tuple, the data row itself.

TupleState.SingleDictState  
TupleState.SingleDictVectorState  
TupleState.SingleMapState  
UnionState 
Represents the contents of a union vector (or a pseudo-union for lists).

UnionState.UnionColumnState 
Union or list (repeated union) column state.

UnionState.UnionVectorState 
Vector wrapper for a union vector.

Enum  Description 

ColumnState.State 
Columns move through various lifecycle states as identified by this
enum.

PullResultSetReaderImpl.State 
The primary purpose of this loader, and the most complex to understand and maintain, is overflow handling.
Row  a  b  c  d  e  f  g  h 

n-2  X  X  X  X  X  X     
n-1  X  X  X  X      
n  X  !  O  O  O 
The scenarios, identified by column names above, are:
At the time of overflow on row n:
As the overflow write proceeds:
At harvest time:
When starting the next batch:
Arrays are a different matter: each row can have many values associated with it. Consider an array of scalars. We have:
Row 0 Row 1 Row 2
0 1 2 3 4 5 6 7 8
[ [a b c] [d e f] | [g h i] ]
Here, the letters indicate values. The brackets show the overall vector
(outer brackets) and individual rows (inner brackets). The vertical line
shows where overflow occurred. The same rules as discussed earlier still
apply, but we must consider both the row indexes and the array indexes.
Row 0 Row 1 Row 0
0 1 2 3 4 5 0 1 2
[ [a b c] [d e f] ] [ [g h i] ]
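The move shown in the two diagrams can be sketched with plain Java collections. This is only an illustration, not Drill's implementation: `ArrayOverflowSketch`, `Batch`, and `splitAtRow` are hypothetical names, and an offsets list plus a values list stand in for the real offset and data vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not Drill's vector classes.
final class ArrayOverflowSketch {

  /** A repeated column: offsets.get(r) to offsets.get(r + 1) bound the values of row r. */
  static final class Batch {
    final List<Integer> offsets = new ArrayList<>();
    final List<String> values = new ArrayList<>();

    Batch() {
      offsets.add(0);
    }

    void addRow(String... rowValues) {
      for (String v : rowValues) {
        values.add(v);
      }
      offsets.add(values.size());
    }

    int rowCount() {
      return offsets.size() - 1;
    }
  }

  /**
   * Moves the rows from overflowRow onward into a new look-ahead batch.
   * Both the row indexes and the array indexes restart at zero there.
   */
  static Batch splitAtRow(Batch full, int overflowRow) {
    Batch lookAhead = new Batch();
    int start = full.offsets.get(overflowRow);
    for (int r = overflowRow; r < full.rowCount(); r++) {
      int from = full.offsets.get(r);
      int to = full.offsets.get(r + 1);
      lookAhead.values.addAll(full.values.subList(from, to));
      lookAhead.offsets.add(to - start); // rebase the end offset of this row
    }
    // Truncate the original batch so that it ends just before the overflow row.
    full.values.subList(start, full.values.size()).clear();
    full.offsets.subList(overflowRow + 1, full.offsets.size()).clear();
    return lookAhead;
  }

  public static void main(String[] args) {
    Batch batch = new Batch();
    batch.addRow("a", "b", "c");
    batch.addRow("d", "e", "f");
    batch.addRow("g", "h", "i");
    Batch next = splitAtRow(batch, 2);
    System.out.println(batch.offsets + " " + batch.values);
    System.out.println(next.offsets + " " + next.values);
  }
}
```

In the sketch, row 2 of the full batch becomes row 0 of the look-ahead batch, and its values occupy array positions 0 through 2, matching the second diagram.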
Further, we must consider lists: a column may consist of a list of
arrays. Or, a column may consist of an array of maps, one of which is
a list of arrays. So, the above reasoning must apply recursively down
the value tree.
As it turns out, there is a simple recursive algorithm, which is a simple extension of the reasoning for the top-level scalar case, that can handle arrays:
Consider the writers. Each writer corresponds to a single vector. Writers are grouped into logical tree nodes. Those in the root node write to (single, scalar) columns that are either top-level columns, or nested some level down in single-value (not array) tuples. Another tree level occurs in an array: the elements of the array use a different (faster-changing) index than the top (row-level) writers. Different arrays have different indexes: a row may have, say, four elements in array A, but 20 elements in array B.
Further, arrays can be singular (a repeated int, say) or for an entire tuple (a repeated map.) And, since Drill supports the full JSON model, in the most general case, there is a tree of array indexes that can be nested to an arbitrary level. (A row can have an array of maps which contains a column that is, itself, a list of repeated maps, a field of which is an array of ints.)
Writers handle this index tree via a tree of ColumnWriterIndex
objects, often specialized for various tasks.
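A minimal sketch of such an index tree follows. It is an assumption-laden illustration: `WriterIndexTreeSketch` and `IndexNode` are hypothetical names and do not reproduce the actual ColumnWriterIndex interface. It only shows that each array level carries its own position, which advances independently of the row-level position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; this is not Drill's ColumnWriterIndex.
final class WriterIndexTreeSketch {

  static final class IndexNode {
    final String name;
    final IndexNode parent;
    final List<IndexNode> children = new ArrayList<>();
    int position; // next write position within this level's vector

    IndexNode(String name, IndexNode parent) {
      this.name = name;
      this.parent = parent;
      if (parent != null) {
        parent.children.add(this);
      }
    }

    void advance(int count) {
      position += count;
    }

    int depth() {
      return parent == null ? 0 : parent.depth() + 1;
    }
  }

  /** Renders the tree, one node per line, indented by nesting depth. */
  static String describe(IndexNode node) {
    StringBuilder sb = new StringBuilder();
    append(node, sb);
    return sb.toString();
  }

  private static void append(IndexNode node, StringBuilder sb) {
    for (int i = 0; i < node.depth(); i++) {
      sb.append("  ");
    }
    sb.append(node.name).append(" @ ").append(node.position).append('\n');
    for (IndexNode child : node.children) {
      append(child, sb);
    }
  }

  public static void main(String[] args) {
    IndexNode row = new IndexNode("row", null);
    IndexNode arrayA = new IndexNode("A", row);
    IndexNode arrayB = new IndexNode("B", row);
    IndexNode mapArray = new IndexNode("maps", row);
    IndexNode innerInts = new IndexNode("maps.ints", mapArray);

    // One row is written: four elements in A, 20 in B, and two maps
    // that together hold five ints.
    arrayA.advance(4);
    arrayB.advance(20);
    mapArray.advance(2);
    innerInts.advance(5);
    row.advance(1);

    System.out.print(describe(row));
  }
}
```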
Now we can get to the key concept in this section: how we update those indexes after an overflow. The top-level index reverts to zero. (We start writing the 0th row in the new look-ahead batch.) But nested indexes (those for arrays) will start at some other position, depending on the number of elements already written in the overflow row. The number of such elements is determined by a top-down traversal of the tree (to determine the start offset of each array for the row). Resetting the writer indexes is a bottom-up process: based on the number of elements in that array, the writer index is reset to match.
This flow is the opposite of the "normal" case in which a new batch is started top-down, with each index being reset to zero.
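The two passes can be sketched as follows. This is a simplified model under stated assumptions, not the loader's code: `OverflowIndexResetSketch`, `IndexNode`, `findStarts`, and `rebase` are hypothetical names, each node's `offsets` list is assumed to record where every parent entry starts in that node's vector, and only the indexes are rebased (the vectors themselves are not moved here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these are not Drill's writer index classes.
final class OverflowIndexResetSketch {

  static final class IndexNode {
    final List<Integer> offsets; // start position of each parent entry; empty at the root
    final List<IndexNode> children = new ArrayList<>();
    int position;                // current write position at this level
    int overflowStart;           // position at which the overflow row begins

    IndexNode(int position, Integer... offsets) {
      this.position = position;
      this.offsets = Arrays.asList(offsets);
    }

    IndexNode add(IndexNode child) {
      children.add(child);
      return this;
    }
  }

  /** Top-down pass: find where the overflow row starts at each level. */
  static void findStarts(IndexNode node, int start) {
    node.overflowStart = start;
    for (IndexNode child : node.children) {
      findStarts(child, child.offsets.get(start));
    }
  }

  /** Bottom-up pass: rebase children first, then this node. */
  static void rebase(IndexNode node) {
    for (IndexNode child : node.children) {
      rebase(child);
    }
    node.position -= node.overflowStart;
  }

  static void resetAfterOverflow(IndexNode root) {
    // The root's position is the index of the overflow row itself.
    findStarts(root, root.position);
    rebase(root);
  }

  public static void main(String[] args) {
    // Rows 0 and 1 are complete; row 2 overflowed part-way through.
    IndexNode scalarArray = new IndexNode(8, 0, 3, 6);  // 2 values written in row 2
    IndexNode innerInts = new IndexNode(7, 0, 2, 4, 5); // 2 ints written in map entry 3
    IndexNode mapArray = new IndexNode(3, 0, 2, 3).add(innerInts);
    IndexNode root = new IndexNode(2).add(scalarArray).add(mapArray);

    resetAfterOverflow(root);
    System.out.println(root.position + " " + scalarArray.position + " "
        + mapArray.position + " " + innerInts.position);
  }
}
```

In this model the row index returns to zero, while each array index keeps the count of elements already written for the overflow row.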
… VectorContainer), … ListVector), … vector, and RepeatedMapVector. … TupleMode
abstraction. In particular, we use the
single tuple model which works with a single batch. This model provides a
simple, uniform interface to work with columns and tuples (rows, maps),
and a simple way to work with arrays. This interface reduces the above
array algorithm to a simple set of recursive method calls.

Copyright © 1970 The Apache Software Foundation. All rights reserved.