Package org.apache.drill.exec.physical.resultSet

Provides a second-generation row set (AKA "record batch") writer used by client code to
  • Define the schema of a result set.
  • Write data into the vectors backing a row set.

Terminology

The code here follows the "row/column" naming convention rather than the "record/field" convention.
Result set
A set of zero or more row sets that hold rows of data.
Row set
A collection of rows with a common schema. Also called a "row batch" or "record batch." (But, in Drill, the term "record batch" also usually means an operator on that set of records. Here, a row set is just the rows, separate from operations on that data.)
Row
A single row of data, in the usual database sense. Here, a row is a kind of tuple (see below) allowing both name and index access to columns.
Tuple
In relational theory, a row is a tuple: a collection of values defined by a schema. Tuple values are indexed by position or name.
Column
A single value within a row or row set. (Generally, the context makes clear whether the term refers to a single value or to all values for a column across a row set.) Columns are backed by value vectors.
Map
In Drill, a map is what other systems call a "structure". It is, in fact, a nested tuple. In a Java or Python map, each map instance has a distinct set of name/value pairs. But, in Drill, all map instances have the same schema; hence the so-called "map" is really a tuple. This implementation exploits that fact and treats the row, and nested maps, almost identically: both provide columns indexed by name or position.
Row Set Mutator
An awkward name, but retains the "mutator" name from the previous generation. The mechanism to build a result set as a series of row sets.
Tuple Loader
Mechanism to build a single tuple (row or map) by providing name or index access to columns. A better name would be "tuple writer", but that name is already used elsewhere.
Column Loader
Mechanism to write values to a single column.

Building the Schema

The row set mutator works for two cases: a known schema or a discovered schema. A known schema occurs in cases, such as JDBC, where the underlying data source can describe the schema before any rows are read. In this case, client code can build the schema and pass that schema to the mutator directly. Alternatively, the client code can build the schema column-by-column before the first row is read.
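For example, a reader with a known schema might declare its columns up front. Below is a minimal sketch, assuming the SchemaBuilder and TupleMetadata classes from Drill's record-metadata package; how the finished schema reaches the mutator depends on the loader options of the Drill version in use:

    import org.apache.drill.common.types.TypeProtos.MinorType;
    import org.apache.drill.exec.record.metadata.SchemaBuilder;
    import org.apache.drill.exec.record.metadata.TupleMetadata;

    // Known-schema case: declare every column before reading any rows.
    TupleMetadata schema = new SchemaBuilder()
        .add("id", MinorType.INT)               // required column
        .addNullable("name", MinorType.VARCHAR) // optional (nullable) column
        .addArray("tags", MinorType.VARCHAR)    // repeated (array) column
        .buildSchema();
    // Pass the schema to the mutator via its options, or instead declare
    // each column on the writer, one by one, before loading the first row.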

Readers that discover schema can build the schema incrementally: add a column, load data for that column for one row, discover the next column, and so on. Almost any kind of column can be added at any time within the first batch:

  • Required columns are "back-filled" with zeros in the active batch, if that value makes sense for the column. (Date and Interval columns will throw an exception if added after the first row, as there is no good "zero" value for those types. Varchar columns are back-filled with blanks.)
  • Optional (nullable) columns can be added at any time; they are back-filled with nulls in the active batch. In general, if a column is added after the first row, it should be nullable, not required, unless the data source has a "missing = blank or zero" policy.
  • Repeated (array) columns can be added at any time; they are back-filled with empty entries in the active batch.
Client code must be aware of the semantics of adding columns at various times.
  • Columns added before or during the first row are the trivial case; this works for all data types and modes.
  • Required (non-nullable) structured columns (Date, Period) cannot be added after the first row (as there is no good zero-fill value).
  • Columns added within the first batch appear to the rest of Drill as if they were added before the first row: the downstream operators see the same schema from batch to batch.
  • Columns added after the first batch will trigger a schema-change event downstream.
  • The above is true during an "overflow row" (see below). Once overflow occurs, columns added later in that overflow row will actually appear in the next batch, and will trigger a schema change when that batch is returned. That is, overflow "time-shifts" the row from one batch to the next, and so also time-shifts any column additions made within that row.
Use the TupleBuilder class to build the schema. The schema class is part of the RowSetLoader object available from the ResultSetLoader.writer() method.
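For the discovered-schema case, a reader adds each column at the moment it is first seen. A sketch, where writer is the RowSetLoader obtained from ResultSetLoader.writer(), and the addColumn() and scalar() accessors are assumed from Drill's TupleWriter interface:

    import org.apache.drill.common.types.TypeProtos.MinorType;
    import org.apache.drill.common.types.Types;
    import org.apache.drill.exec.record.MaterializedField;

    // Mid-batch discovery: add a nullable column when first encountered.
    // Rows already written to the active batch are back-filled with nulls.
    int commentCol = writer.addColumn(MaterializedField.create(
        "comment", Types.optional(MinorType.VARCHAR)));
    writer.scalar(commentCol).setString("first value of the new column");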

Using the Schema

The tuple loader presents columns using a physical schema. That is, map columns appear as columns that provide a nested map schema. The design presumes that column access is primarily structural: first get a map, then process all columns for the map.

If the input is a flat structure, then the physical schema degenerates to a simple flat schema.

In both cases, access to columns is by index or by name. If new columns are added while loading, they are assigned indexes after those of the existing columns.
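For example, to process a map column structurally (a sketch; the tuple() and scalar() shorthands for column(...) are assumed, and "m" is a hypothetical map column):

    // Structural access: fetch the nested tuple for map column "m", then
    // address its columns by name or index, exactly as for the row itself.
    TupleWriter mapWriter = writer.tuple("m");
    mapWriter.scalar("x").setInt(10);       // m.x, by name
    mapWriter.scalar(1).setString("abc");   // m's second column, by index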

Writing Data to the Batch

Each batch is delimited by a call to ResultSetLoader.startBatch() and a call to VectorState.harvestWithLookAhead() to obtain the completed batch. Note that readers do not call these methods; the scan operator does this work.

Each row is delimited by a call to startRow() and a call to saveRow(). startRow() performs initialization necessary for some vectors, such as repeated vectors; saveRow() moves the row pointer ahead.

A reader can easily reject a row by calling startRow() and beginning to load the row, but omitting the call to saveRow(). In this case, the next call to startRow() repositions the row pointer to the same row, and new data will overwrite the previous data, effectively erasing the unwanted row. This also works for the last row; omitting the call to saveRow() causes the batch to hold only the rows actually saved.
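The rejection idiom looks like this (a sketch using the method names above; rowIsWanted is a hypothetical predicate):

    writer.startRow();
    // ... load the row's columns ...
    if (rowIsWanted) {
      writer.saveRow();  // keep the row: the row pointer advances
    }
    // If saveRow() is skipped, the next startRow() repositions the row
    // pointer to the same slot, and new data overwrites the rejected row.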

Readers then write to each column. Columns are accessible by index (TupleWriter.column(int)) or by name (TupleWriter.column(String)). Indexed access is much faster. Column indexes are defined by the order in which columns are added: the first column is column 0, the second is column 1, and so on.

Each call to the above methods returns the same column writer, allowing the reader to cache column writers for additional performance.

All column writers are of the same class; there is no need to cast to a type corresponding to the vector. Instead, the writers provide a variety of setType methods, where the type is one of various Java primitive or structured types. Most writers provide just one such method, but others (such as the writer for VarChar) provide two. The implementation will throw an exception if the vector does not support a particular type.
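For example (a sketch; setInt(), setString(), and setBytes() stand in for the setType methods, and the scalar() shorthand is assumed):

    import org.apache.drill.exec.vector.accessor.ScalarWriter;

    // Cache the writers once: each lookup returns the same object.
    ScalarWriter idWriter = writer.scalar("id");  // by name
    ScalarWriter nameWriter = writer.scalar(1);   // by index (faster)

    writer.startRow();
    idWriter.setInt(42);
    nameWriter.setString("fred");       // a VarChar accepts a String...
    // nameWriter.setBytes(bytes, len); // ...or raw bytes
    // idWriter.setString("oops");      // unsupported type: throws
    writer.saveRow();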

Note that this package uses the term "loader" for row and column writers, since the term "writer" is already used by the legacy record set mutator and column writers.

Handling Batch Limits

The mutator enforces two sets of batch limits:
  1. The number of rows per batch. The limit defaults to 64K (the Drill maximum), but can be set lower by the client.
  2. The size of the largest vector, which is capped at 16 MB. (A future version may allow adjustable caps, or may cap the memory of the entire batch.)
Both limits are presented to the client via the RowSetLoader.isFull() method. After each call to saveRow(), the client should call isFull() to determine whether it can add another row. Note that failing to do this check will cause the next call to ResultSetLoader.startBatch() to throw an exception.
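The standard write loop therefore checks isFull() on every iteration (a sketch using this document's method names; reader stands for a hypothetical data source):

    rsLoader.startBatch();
    while (!writer.isFull() && reader.hasNext()) {
      writer.startRow();
      // ... write this row's columns ...
      writer.saveRow();
    }
    // isFull() is now true (the row limit was reached or a vector
    // overflowed), or the input is exhausted; the scan operator then
    // harvests the completed batch.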

The limits have subtle differences, however. Row limits are simple: at the end of the last row, the mutator notices that no more rows are possible, and so does not allow starting a new row.

Vector overflow is more complex. A row may consist of columns (a, b, c). The client may write column a, but then column b might trigger a vector overflow. (For example, b is a Varchar, and the value for b is larger than the space left in the vector.) The client cannot stop and rewrite a. Instead, the client simply continues writing the row. The mutator, internally, moves this "overflow" row to a new batch. The overflow row becomes the first row of the next batch rather than the last row of the current batch.

For this reason, the client can treat the row-limit and vector-overflow cases identically, as described above.

There are some subtle differences between the two cases that clients may occasionally need to handle:

  • When a vector overflow occurs, the returned batch will have one fewer row than the client might expect if it is simply counting the rows written.
  • A new column added to the batch after overflow occurs will appear in the next batch, triggering a schema change between the current and next batches.