Interface RowSet

All Known Subinterfaces:
RowSet.ExtendableRowSet, RowSet.HyperRowSet, RowSet.SingleRowSet
All Known Implementing Classes:
AbstractRowSet, AbstractSingleRowSet, DirectRowSet, HyperRowSetImpl, IndirectRowSet

public interface RowSet
A row set is a collection of rows stored as value vectors. Elsewhere in Drill we call this a "record batch", but that term has been overloaded to mean the runtime implementation of an operator.

A row set encapsulates a set of vectors and provides access to Drill's various "views" of vectors: VectorContainer, VectorAccessible, etc. The row set wraps a TupleModel which holds the vectors and column metadata. This form is optimized for easy use in testing; use other implementations for production code.

A row set is defined by a TupleMetadata. For testing purposes, a row set has a fixed schema; we don't allow changing the set of vectors dynamically.

The row set also provides a simple way to write and read records using the RowSetWriter and RowSetReader interfaces. As per Drill conventions, a row set can be written (once), read many times, and finally cleared.
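The write-once, read-many, then clear convention can be illustrated with a small stand-alone model. This is only a sketch: ToyRowSet and its addRow, done, and reader methods are hypothetical names invented for illustration, not Drill classes, and plain int arrays stand in for value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the life cycle described above.
// It does not use or imitate the signatures of Drill's RowSet classes.
final class ToyRowSet {
    private final List<int[]> rows = new ArrayList<>();
    private boolean writable = true;
    private boolean cleared = false;

    // Append one row; legal only during the single write phase.
    void addRow(int... values) {
        if (!writable) {
            throw new IllegalStateException("row set is no longer writable");
        }
        rows.add(values.clone());
    }

    // End the write phase. After this call the row set is read-only.
    void done() {
        writable = false;
    }

    boolean isWritable() {
        return writable;
    }

    int rowCount() {
        return rows.size();
    }

    // Return an independent copy of the rows, so the data can be read
    // any number of times without one reader disturbing another.
    List<int[]> reader() {
        if (cleared) {
            throw new IllegalStateException("row set has been cleared");
        }
        List<int[]> copy = new ArrayList<>();
        for (int[] row : rows) {
            copy.add(row.clone());
        }
        return copy;
    }

    // Release the data. The row set cannot be written or read afterwards.
    void clear() {
        rows.clear();
        writable = false;
        cleared = true;
    }
}
```

In this model a caller adds rows, calls done() once, calls reader() as often as needed, and finally calls clear(). Writing after done() or reading after clear() fails fast with an IllegalStateException.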

Drill provides a large number of vector (data) types. Each requires a type-specific way to set data. The row set writer uses a ColumnWriter to set each value in a way unique to the specific data type. Similarly, the row set reader provides a ScalarReader interface. In both cases, columns can be accessed by index number (as defined in the schema) or by name.
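Access by index or by name can be pictured as two lookups over the same schema order. The sketch below is a simplified stand-in: ToyRow and its column methods are hypothetical, and boxed Java objects replace the type-specific column readers and writers that the row set actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy row: column positions follow the order of the schema names.
final class ToyRow {
    private final List<String> columnNames;
    private final Object[] values;

    ToyRow(List<String> columnNames, Object... values) {
        if (columnNames.size() != values.length) {
            throw new IllegalArgumentException("schema and value counts differ");
        }
        this.columnNames = new ArrayList<>(columnNames);
        this.values = values.clone();
    }

    // Access by index: the position of the column as defined in the schema.
    Object column(int index) {
        return values[index];
    }

    // Access by name: resolve the name to its schema position, then delegate.
    Object column(String name) {
        int index = columnNames.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return column(index);
    }
}
```

Lookup by name resolves to the same position as lookup by index, so both forms return the same value for a given column.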

A row set follows a schema. The schema starts as a BatchSchema, but is parsed and restructured into a variety of forms. In the original form, maps contain their value vectors. In the flattened form, all vectors for all maps (and the top-level tuple) are collected into a single structure. Since this structure is for testing, this somewhat-static structure works just fine; we don't need the added complexity that comes from building the schema and data dynamically.
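The difference between the original (nested) form and the flattened form can be sketched with a small recursive walk. Everything here is an assumption made for illustration: ToyColumn is a hypothetical class, dotted path names are one possible naming choice, and only leaf columns are collected. Drill's own flattened schema may be organized differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical schema node: a scalar column has no children, a map has some.
// (In this toy, a map with no members is indistinguishable from a scalar.)
final class ToyColumn {
    final String name;
    final List<ToyColumn> children;

    ToyColumn(String name, ToyColumn... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    // Collect every leaf column of the tuple, including those nested
    // inside maps, into one flat list of dotted path names.
    static List<String> flatten(List<ToyColumn> tuple) {
        List<String> out = new ArrayList<>();
        collect("", tuple, out);
        return out;
    }

    private static void collect(String prefix, List<ToyColumn> tuple, List<String> out) {
        for (ToyColumn col : tuple) {
            String path = prefix.isEmpty() ? col.name : prefix + "." + col.name;
            if (col.children.isEmpty()) {
                out.add(path);                      // scalar: one entry
            } else {
                collect(path, col.children, out);   // map: descend into its members
            }
        }
    }
}
```

For a tuple with columns a, a map m containing b and c, and a column d, flatten yields the single list a, m.b, m.c, d, in schema order.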

Putting this all together, the typical life-cycle flow is:

  • Method Details

    • isExtendable

      boolean isExtendable()
    • isWritable

      boolean isWritable()
    • vectorAccessible

      VectorAccessible vectorAccessible()
    • container

      VectorContainer container()
    • rowCount

      int rowCount()
    • reader

      RowSetReader reader()
    • clear

      void clear()
    • schema

      TupleMetadata schema()
    • allocator

      BufferAllocator allocator()
    • indirectionType

    • print

      void print()
      Debug-only tool to visualize a row set for inspection. Do not use this in production code.
    • size

      long size()
      Return the size in memory of this record set, including indirection vectors, null vectors, offset vectors and the entire (used and unused) data vectors.
      Returns:
      memory size in bytes
    • batchSchema

      BatchSchema batchSchema()