Package org.apache.drill.exec.vector.accessor.writer
Understanding the Vector Model
Vectors can be categorized along multiple dimensions:
- By data (minor) type
- By cardinality (mode)
- By fixed or variable width
- By repeat levels
A repeated map, a list, a repeated list, and any repeated (array) scalar are all array-like. The nullable and required modes are identical (single values), except that a nullable vector carries an additional is-set ("bit") vector.
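The nullable mode described above can be sketched with plain Java arrays: a data vector paired with an is-set ("bit") vector. This is an illustration of the idea only, not Drill's actual vector code:

```java
// Illustrative sketch only, not Drill's vector classes. A nullable
// ("optional") column pairs a data vector with an is-set ("bit") vector;
// a required column would use the data array alone.
public class NullableIntColumn {
  private final int[] data = new int[16];    // fixed capacity for the sketch
  private final byte[] isSet = new byte[16]; // 1 = value present, 0 = null
  private int writeIndex;

  public void setInt(int value) {
    data[writeIndex] = value;
    isSet[writeIndex] = 1;
    writeIndex++;
  }

  public void setNull() {
    isSet[writeIndex] = 0;  // the data slot is simply left unwritten
    writeIndex++;
  }

  public boolean isNull(int row) { return isSet[row] == 0; }
  public int getInt(int row) { return data[row]; }
}
```

A required writer would be the same class minus the `isSet` array, which is why the two modes can share most of a single implementation.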
The writers (and readers) borrow concepts from JSON and relational theory to simplify the problem:
- Both the top-level row, and a Drill map are "tuples" and are treated similarly in the model.
- All non-map, non-list (that is, scalar) data types are treated uniformly.
- All arrays (whether a list, a repeated list, a repeated map, or a repeated scalar) are treated uniformly.
Repeat Levels
JSON and Parquet can be understood as a series of one or more "repeat levels." First, let's identify the repeat levels above the batch level:
- The top-most level is the "result set": the entire collection of rows that come from a file (or other data source).
- Result sets are divided into batches: collections of up to 64K rows.
- Each batch is a collection of rows. A batch-level index points to the current row.
- Top down: the row position points into an offset vector. The offset vector value points to either the data value, or into another offset vector.
- Bottom-up: values are appended to the end of the vector, then "pinched off" to form an array (for repeated maps) or a row. In this view, indexes bubble upward: the inner-most last-write position is written as the array end position in the enclosing offset vector. This may repeat up several levels.
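The bottom-up "pinch off" pattern above can be sketched as follows, using Java lists in place of vectors; this is an illustration of the mechanism, not the actual writer code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not Drill code: values append to the end of a data
// vector; ending an array "pinches off" the values written so far by
// recording the current write position in the enclosing offset vector.
public class RepeatedIntSketch {
  private final List<Integer> data = new ArrayList<>();
  private final List<Integer> offsets = new ArrayList<>(List.of(0)); // entry 0 is always 0

  public void append(int v) { data.add(v); }

  // Bottom-up: the inner-most last-write position bubbles upward into
  // the enclosing offset vector.
  public void endArray() { offsets.add(data.size()); }

  // Top-down: the row position points into the offset vector, whose
  // adjacent values bracket the row's entries in the data vector.
  public List<Integer> array(int row) {
    return data.subList(offsets.get(row), offsets.get(row + 1));
  }
}
```

Writing the arrays [1, 2], [], [3] produces offsets [0, 2, 2, 3]: an empty array costs one offset entry and no data.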
Writer Data Model
The above leads to a very simple, JSON-like data model:
- A tuple reader or writer models a row (usually via a subclass). Columns are accessible by name or position.
- Every column is modeled as an object.
- The object can have an object type: scalar, tuple or array.
- An array has a single element type (but many run-time elements).
- A scalar can be nullable or not, and provides a uniform get/set interface.
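The object-type model above can be sketched as follows; the class and method names are hypothetical, a minimal model of the idea rather than the actual accessor interfaces:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout: a minimal model of the scalar/tuple/array
// "object type" idea, not Drill's actual accessor interfaces.
public class ObjectModelSketch {
  public enum ObjectType { SCALAR, TUPLE, ARRAY }

  public interface ObjectWriter { ObjectType type(); }

  public static class ScalarSketch implements ObjectWriter {
    private Object value;
    public ObjectType type() { return ObjectType.SCALAR; }
    public void set(Object v) { value = v; }  // uniform get/set surface
    public Object get() { return value; }
  }

  public static class TupleSketch implements ObjectWriter {
    private final Map<String, ObjectWriter> cols = new LinkedHashMap<>();
    public ObjectType type() { return ObjectType.TUPLE; }
    // Columns are reachable by name; a row and a map look the same.
    public ObjectWriter column(String name) {
      return cols.computeIfAbsent(name, n -> new ScalarSketch());
    }
  }
}
```

The point of the object layer is visible even in this toy: a client walks a uniform tree of objects and only switches on the object type at the leaves.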
This data model is similar to, but has important differences from, the prior generated readers and writers. This version minimizes the number of writer classes and leverages Java primitives to keep the number of get/set methods to a reasonable size. This version also automates vector allocation, vector overflow and so on.
The object layer is new: it is the simplest way to model the three "object types." An app using this code would use just the leaf scalar readers and writers.
Similarly, the ColumnWriter interface provides a uniform way to access services common to all writer types, while allowing each JSON-like writer to provide type-specific ways to access data.
Writer Performance
To maximize performance, the writers use a single implementation for all "data modes" (nullable, required, repeated). Some items of note:
- The writers bypass DrillBuf and the UDLE to perform the needed writes directly to direct memory.
- The writers cache the buffer address and implement a number of methods to synchronize that address when the buffer changes (on a new batch or during a vector resize).
- A write requires a single bounds check. In most cases the write is within bounds, so the single check is all that is needed.
- If the write is out of bounds, then the writer determines the new vector size and performs the needed reallocation. To avoid multiple doublings, the writer computes the needed new size and allocates that size directly.
- Vector reallocation is improved to eliminate zeroing the new half of the buffer; the data is instead left "garbage-filled."
- If the vector would grow beyond 16 MB, then overflow is triggered, via a listener, which causes the buffer to be replaced. The write then continues.
- Offset vector updates are integrated into the writers using an `OffsetVectorWriter`. This writer caches the last write position so that each array write needs a single offset update, rather than the read and write as in previous code.
- The writers keep track of the "last write position" and perform "fill-empties" work if the new write position is more than one position behind the last write. All types now correctly support "fill-empties" (before, only nullable types did so reliably.)
- Null handling is done by an additional writer layer that wraps the underlying data writer. This avoids the need for a special nullable writer: the same nullable layer works for all data types.
- Array handling is done similarly: an array writer manages the offset vector and works the same for repeated scalars, repeated maps and (eventually) lists and repeated lists.
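The size computation described above (one direct power-of-two step rather than repeated doublings, with overflow past the 16 MB limit) can be sketched as follows; the constant and method names are illustrative, not Drill's exact implementation:

```java
// Illustrative names and logic, not Drill's exact implementation.
// Computes the needed allocation in one step (a single power-of-two
// round-up) instead of doubling repeatedly, and reports when the result
// would exceed the 16 MB vector limit that triggers overflow handling.
public class AllocSketch {
  public static final int MAX_BUFFER_SIZE = 16 * 1024 * 1024;

  // Round the required byte count up to the next power of two.
  public static int roundUp(int neededBytes) {
    return neededBytes <= 1 ? 1 : Integer.highestOneBit(neededBytes - 1) << 1;
  }

  public static boolean wouldOverflow(int neededBytes) {
    return roundUp(neededBytes) > MAX_BUFFER_SIZE;
  }
}
```

For example, a vector that needs 5,000,000 bytes is reallocated once at 8 MB rather than doubled through several intermediate sizes.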
Lists
As described in the API package, Lists and Unions in Drill are highly complex and not well supported. This creates huge problems in the writer layer because we must support something which is broken and under-used, but which most people assume works (it is part of Drill's JSON-like, schema-free model). Our goal here is to support Union and List well enough that nothing new is broken, though this layer cannot fix the issues elsewhere in Drill.

The most complex part is List support for the transition from a single type to a union of types. The API should be simple: the client should not have to be aware of the transition.
To make this work, the writers provide two options:
- Use metadata to state that a List will have exactly one type and to specify that type. The List will present as an array of that type in which each array can be null.
- Otherwise, the list is a repeated union (an array of variants), even if the list happens to have zero or one types. In this case, the list presents as an array of variants.
The PromotableListWriter handles the complex details of providing the above simple API in the array-of-variant case.
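As a rough illustration of the promotion that the writer layer must hide, the hypothetical class below tracks when a second type forces a single-type list to become a list of variants; it is a sketch of the idea only, not of the actual PromotableListWriter:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's PromotableListWriter: a list that
// starts as single-typed and silently promotes to a list of variants
// when a second type appears. The client just calls add().
public class PromotableListSketch {
  private Class<?> singleType;        // null until the first non-null write
  private boolean isVariant;          // true once a second type appears
  private final List<Object> values = new ArrayList<>();

  public void add(Object v) {
    if (v != null) {
      if (singleType == null) {
        singleType = v.getClass();
      } else if (!isVariant && singleType != v.getClass()) {
        isVariant = true;             // promote; the client never sees the switch
      }
    }
    values.add(v);                    // each list element may be null
  }

  public boolean isVariant() { return isVariant; }
  public int size() { return values.size(); }
}
```

In the real writers the promotion additionally rewrites the backing vectors, which is what makes the transition complex even though the client-facing API stays this simple.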
Possible Improvements
The code here works and has extensive unit tests. But many improvements are possible:
- Drill has four "container" vector types (Map, Union, List, Repeated List), each with its own quirky semantics. The code could be far simpler if the container semantics were unified.
- Similarly, the corresponding writers implement some very awkward logic to handle union and list containers. But these vectors are not fully supported in Drill, which means the code implements (and tests) many odd cases that no one may ever use. Better to limit the data types that Drill supports, implement those well, and deprecate the obscure cases.
- The same schema-parsing logic appears over and over in different guises. Some code parses vectors, some parses batch schemas, and other code parses the column metadata (which provides information not available in the materialized field). It would be helpful to find a common way to do this work, perhaps in the form of a visitor. An earlier version of this code tried using visitors, but since each representation has its own quirks, that approach didn't work out. A first step would be to come up with a standard schema description which can be used in all cases, then build a visitor on that.
- Many tests exist. But, the careful reader will note that Drill's vectors define a potentially very complex recursive structure (most variations of which are never used.) Additional testing should cover all cases, such as repeated lists that contain unions, or unions that contain repeated lists of tuples.
- Experience with the code may show additional redundancies that can be removed. Each fresh set of eyes may see things that prior folks missed.
Caveats
The column accessors are divided into two packages: vector and java-exec. It is easy to add functionality in the wrong place, breaking abstraction and encapsulation. Here are some general guidelines:
- The core reader and writer logic is implemented in this vector package. This package provides low-level tools to build accessors, but not the construction logic itself. (That is found in the java-exec layer.)
- The vector layer is designed to support both the simple "row set" and the more complex "result set loader" implementations.
- The "row set" layer wraps the accessors in tools that work on one batch (row set) at a time, without regard for schema versions, schema changes and the like. The row set layer is primarily for testing: building an input batch for some operator, and verifying an output batch. It also serves as a simple test framework for the accessors, without the complexity of other layers.
- The "result set loader" layer handles the complexity of dynamic schema evolution, schema versioning, vector overflow and projection. It provides an "industrial strength" version of the accessor mechanism intended for use in the scan operator (but which can be generalized for other operators.)
- The "listener" pattern is used to allow higher levels to "plug in" functionality to the accessor layer. This is especially true for schema evolution: listeners notify the higher layer to add a column, delegating the actual work to that higher layer.
+------------+
+-------------------------- | Result Set |
v | Loader |
+----------------+ +---------+ +------------+
| Metadata | <-- | Row Set | |
| Implementation | | Tools | |
+----------------+ +---------+ |
java-exec | | |
------------------- | ------------------- | ------ | ------------
vector v v v
+------------+ +-----------+
| Metadata | <--------- | Column |
| Interfaces | | Accessors |
+------------+ +-----------+
|
v
+---------+
| Value |
| Vectors |
+---------+
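The "listener" pattern from the guidelines above can be illustrated generically; the interface and class names below are hypothetical, not Drill's actual accessor interfaces:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the listener pattern: the low-level writer layer
// detects an unknown column and delegates its creation to a listener
// supplied by a higher layer (e.g., a result-set-loader equivalent).
public class ListenerSketch {

  public interface ColumnListener {
    // The higher layer decides how to materialize the new column; here
    // the "writer" is just a String label to keep the sketch small.
    String addColumn(String name);
  }

  public static class TupleSketchWriter {
    private final Map<String, String> columns = new LinkedHashMap<>();
    private final ColumnListener listener;

    public TupleSketchWriter(ColumnListener listener) {
      this.listener = listener;
    }

    public String column(String name) {
      // Unknown column: notify the listener instead of failing, so the
      // accessor layer stays ignorant of schema-evolution policy.
      return columns.computeIfAbsent(name, listener::addColumn);
    }
  }
}
```

The inversion keeps the dependency arrows in the diagram above pointing downward: the vector layer never needs to know about the java-exec layer that plugs in the listener.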
Class Descriptions
- Writer for an array-valued column.
- Object representation of an array writer.
- Base class for writers for fixed-width vectors.
- Base class for writers that use the Java int type as their native type.
- Abstract base class for the object layer in writers.
- Base class for concrete scalar column writers, including actual vector writers and wrappers for nullable types.
- Column writer implementation that acts as the basis for the generated, vector-specific implementations.
- Wraps a scalar writer and its event handler to provide a uniform JSON-like interface for all writer types.
- Implementation of a writer for a tuple (a row or a map). Provides access to each column using either a name or a numeric index.
- Generic object wrapper for the tuple writer.
- Listener (callback) to handle requests to add a new column to a tuple (row or map).
- Column writer implementation that acts as the basis for the generated, vector-specific implementations.
- Base class for variable-width (VarChar, VarBinary, etc.) writers.
- Specialized writer for bit columns.
- Gathers generated writer classes into a set of class tables to allow rapid run-time creation of writers.
- Writer for a Dict entry.
- Internal implementation for a list of (possible) variants when the list has no type associated with it at all.
- List writer, which is basically an array writer, with the addition that each list element can be null.
- Writer for a Drill Map type.
- Writer for an array of maps.
- Writer for a single (non-array) map.
- Writer for an array of either a map or another array.
- Implementation that represents the writer as an array writer with a special dict entry writer as its element writer.
- Interface for specialized operations on an offset vector.
- Specialized column writer for the (hidden) offset vector used with variable-length or repeated vectors.
- Implements a writer for a repeated list.
- Writer for a column that holds an array of scalars.
- Shim for a list that holds a single type, but may eventually become a list of variants.
- Shim that implements the variant writer when the backing vector is a union or a list backed by a union. (Unions are overly complex; lists can operate in three modes: no type, one type, or many types, that is, a list of unions.)
- Writer to a union vector.
- Internal interface used to control the behavior of writers.
- Listener (callback) for vector overflow events.
- Tracks the write state of a tuple or variant to allow applying the correct operations to newly-added columns to synchronize them with the rest of the writers.