Class ResultVectorCacheImpl

java.lang.Object
org.apache.drill.exec.physical.resultSet.impl.ResultVectorCacheImpl
All Implemented Interfaces:
ResultVectorCache

public class ResultVectorCacheImpl extends Object implements ResultVectorCache
Manages an inventory of value vectors used across row batch readers. Drill's batch semantics are complex. Each operator logically returns a batch of records on each call to the next() operation of Drill's Volcano iterator protocol. However, the batches "returned" are not separate objects. Instead, Drill enforces the following semantics:
  • If a next() call returns OK then the set of vectors in the "returned" batch must be identical to those in the prior batch. Not just the same type; they must be the same ValueVector objects. (The buffers within the vectors will be different.)
  • If the set of vectors changes in any way (add a vector, remove a vector, change the type of a vector), then the next() call must return OK_NEW_SCHEMA.
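The two rules above can be sketched in a few lines. This is a simplified model, not Drill code: ToyVector, ToyOutcome and BatchProtocolSketch are hypothetical names invented for the illustration, and comparing vectors position by position is an assumption made to keep the example short.

```java
import java.util.List;

// Hypothetical stand-in for a value vector; this is not a Drill class.
final class ToyVector {
    final String name;
    final String type;

    ToyVector(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical outcome codes, limited to the two discussed in the text.
enum ToyOutcome { OK, OK_NEW_SCHEMA }

final class BatchProtocolSketch {

    // Returns OK only when both batches hold the very same vector objects.
    // Assumption: vectors are compared position by position.
    static ToyOutcome outcomeFor(List<ToyVector> prior, List<ToyVector> current) {
        if (prior.size() != current.size()) {
            return ToyOutcome.OK_NEW_SCHEMA; // a vector was added or removed
        }
        for (int i = 0; i < prior.size(); i++) {
            // Reference comparison on purpose: equal name and type is not enough.
            if (prior.get(i) != current.get(i)) {
                return ToyOutcome.OK_NEW_SCHEMA;
            }
        }
        return ToyOutcome.OK;
    }

    public static void main(String[] args) {
        ToyVector a = new ToyVector("a", "INT");
        ToyVector sameShape = new ToyVector("a", "INT");
        // Same object in both batches.
        System.out.println(outcomeFor(List.of(a), List.of(a)));
        // Same name and type, but a different object.
        System.out.println(outcomeFor(List.of(a), List.of(sameShape)));
    }
}
```

The second call reports a schema change even though the column looks identical, which is why a reader cannot simply allocate fresh vectors for every batch.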
These rules create interesting constraints for the scan operator. Conceptually, each batch is distinct, yet successive batches must share the same vectors. The ResultSetLoader class handles this by managing the set of vectors used by a single reader.

Readers are independent: each may read a distinct schema (as in JSON). Yet the Drill protocol requires minimizing spurious OK_NEW_SCHEMA events. As a result, two readers run by the same scan operator must share the same set of vectors, even though they may have different schemas and thus different ResultSetLoaders.

The purpose of this inventory is to persist vectors across readers, even when, say, reader B does not use a vector that reader A created.

The semantics supported by this class include:

  • Ability to "pre-declare" columns based on columns that appear in an explicit select list. This ensures that the columns are known (but not their types).
  • Ability to reuse a vector across readers if the column retains the same name and type (minor type and mode).
  • Ability to flush unused vectors for readers with changing schemas if a schema change occurs.
  • Support for schema "hysteresis"; that is, a "sticky" schema that minimizes spurious changes. Once a vector is declared, it can be included in all subsequent batches (provided the column is nullable or an array).
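The reuse rule in the list above can be illustrated with a toy inventory. This is a minimal sketch under stated assumptions, not the implementation of this class: SketchVector, VectorInventorySketch and getOrCreate are hypothetical names, minor type and mode are modeled as plain strings, and pre-declaration, flushing and buffer allocation are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a value vector; this is not a Drill class.
final class SketchVector {
    final String name;
    final String minorType;
    final String mode;

    SketchVector(String name, String minorType, String mode) {
        this.name = name;
        this.minorType = minorType;
        this.mode = mode;
    }
}

// Toy inventory keyed by column name (assumption: names match exactly).
final class VectorInventorySketch {
    private final Map<String, SketchVector> vectors = new HashMap<>();

    // Returns the cached vector when name, minor type and mode all match;
    // otherwise creates a new vector and replaces the cached entry.
    SketchVector getOrCreate(String name, String minorType, String mode) {
        SketchVector cached = vectors.get(name);
        if (cached != null
                && cached.minorType.equals(minorType)
                && cached.mode.equals(mode)) {
            return cached;
        }
        SketchVector fresh = new SketchVector(name, minorType, mode);
        vectors.put(name, fresh);
        return fresh;
    }

    // Number of vectors currently held, whether or not a reader uses them.
    int size() {
        return vectors.size();
    }

    public static void main(String[] args) {
        VectorInventorySketch inventory = new VectorInventorySketch();

        // Reader A creates two columns.
        SketchVector aFromReaderA = inventory.getOrCreate("a", "INT", "OPTIONAL");
        inventory.getOrCreate("b", "VARCHAR", "OPTIONAL");

        // Reader B only uses column "a" and gets the same object back.
        SketchVector aFromReaderB = inventory.getOrCreate("a", "INT", "OPTIONAL");
        System.out.println(aFromReaderA == aFromReaderB); // true

        // Column "b" stays in the inventory although reader B never asked for it.
        System.out.println(inventory.size()); // 2
    }
}
```

Because the inventory outlives any single reader, a later reader that declares the same column with the same type receives the original vector object, which is what lets the scan operator avoid a spurious OK_NEW_SCHEMA.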