Interface RowSet

All Known Subinterfaces:
RowSet.ExtendableRowSet, RowSet.HyperRowSet, RowSet.SingleRowSet
All Known Implementing Classes:
AbstractRowSet, AbstractSingleRowSet, DirectRowSet, HyperRowSetImpl, IndirectRowSet

public interface RowSet
A row set is a collection of rows stored as value vectors. Elsewhere in Drill we call this a "record batch", but that term has been overloaded to mean the runtime implementation of an operator.

A row set encapsulates a set of vectors and provides access to Drill's various "views" of vectors: VectorContainer, VectorAccessible, etc. The row set wraps a TupleModel, which holds the vectors and column metadata. This form is optimized for easy use in testing; use other implementations for production code.

A row set is defined by a TupleMetadata. For testing purposes, a row set has a fixed schema; we don't allow changing the set of vectors dynamically.

The row set also provides a simple way to write and read records using the RowSetWriter and RowSetReader interfaces. As per Drill conventions, a row set can be written (once), read many times, and finally cleared.
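The write-once, read-many, then clear convention can be pictured with a small stand-alone model. This is only a sketch: ToyRowSet and its addRow, done and get methods are hypothetical names invented for illustration, not Drill classes, and the model stores plain int arrays rather than value vectors.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a row set (hypothetical, NOT a Drill class). */
final class ToyRowSet {
    private final List<int[]> rows = new ArrayList<>();
    private boolean writable = true;  // true only during the single write pass
    private boolean cleared = false;  // true once clear() has released the data

    /** Appends one row; legal only during the single write pass. */
    void addRow(int... values) {
        if (!writable) {
            throw new IllegalStateException("row set is no longer writable");
        }
        rows.add(values.clone());
    }

    /** Ends the write pass; from here on the data is read-only. */
    void done() {
        writable = false;
    }

    boolean isWritable() {
        return writable;
    }

    int rowCount() {
        return rows.size();
    }

    /** Reads a value; may be called any number of times until clear(). */
    int get(int row, int col) {
        if (cleared) {
            throw new IllegalStateException("row set was cleared");
        }
        return rows.get(row)[col];
    }

    /** Releases the data; the row set must not be read afterwards. */
    void clear() {
        rows.clear();
        writable = false;
        cleared = true;
    }
}
```

In this model a second write pass fails fast with an IllegalStateException, which mirrors the "written (once)" rule stated above; how the real implementations enforce it is not shown here.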

Drill provides a large number of vector (data) types. Each requires a type-specific way to set data. The row set writer uses a ColumnWriter to set each value in a way unique to the specific data type. Similarly, the row set reader provides a ScalarReader interface. In both cases, columns can be accessed by index number (as defined in the schema) or by name.
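Access by index or by name amounts to resolving a column name to its schema position and then reading that position. The following is a minimal sketch under that assumption; ToySchema and ToyRowReader are hypothetical classes written for this example and do not reproduce the RowSetReader or ScalarReader signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy schema mapping column names to positions (hypothetical, NOT a Drill class). */
final class ToySchema {
    private final List<String> names = new ArrayList<>();
    private final Map<String, Integer> indexByName = new HashMap<>();

    /** Adds a column; its index is its position in declaration order. */
    ToySchema add(String name) {
        if (indexByName.containsKey(name)) {
            throw new IllegalArgumentException("duplicate column: " + name);
        }
        indexByName.put(name, names.size());
        names.add(name);
        return this;
    }

    /** Resolves a column name to its index. */
    int index(String name) {
        Integer i = indexByName.get(name);
        if (i == null) {
            throw new IllegalArgumentException("unknown column: " + name);
        }
        return i;
    }

    int size() {
        return names.size();
    }
}

/** Toy reader over one row of values (hypothetical, NOT a Drill class). */
final class ToyRowReader {
    private final ToySchema schema;
    private final Object[] values;

    ToyRowReader(ToySchema schema, Object... values) {
        if (values.length != schema.size()) {
            throw new IllegalArgumentException("value count does not match schema");
        }
        this.schema = schema;
        this.values = values.clone();
    }

    /** Access by index, as defined by the schema order. */
    Object column(int index) {
        return values[index];
    }

    /** Access by name: resolve the name to an index, then read by index. */
    Object column(String name) {
        return values[schema.index(name)];
    }
}
```

Both lookups return the same value for the same column, so callers can pick whichever form reads better in a test.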

A row set follows a schema. The schema starts as a BatchSchema, but is parsed and restructured into a variety of forms. In the original form, maps contain their value vectors. In the flattened form, all vectors for all maps (and the top-level tuple) are collected into a single structure. Since this structure is for testing, this somewhat-static structure works just fine; we don't need the added complexity that comes from building the schema and data dynamically.
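The difference between the original (nested) form and the flattened form can be sketched as a depth-first walk over the schema. The example below is an illustration only: ToyColumn and SchemaFlattener are hypothetical classes, the dotted path names are an assumed naming scheme, and the sketch collects only leaf columns, which may differ from what the flattened form in Drill actually contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy column; a map is modeled as a column with children (hypothetical, NOT a Drill class). */
final class ToyColumn {
    final String name;
    final List<ToyColumn> children;  // empty for a plain (non-map) column

    ToyColumn(String name, ToyColumn... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    /** Empty maps are not modeled: a column without children counts as a leaf. */
    boolean isMap() {
        return !children.isEmpty();
    }
}

/** Collects the leaf columns of a nested schema into one flat list. */
final class SchemaFlattener {
    static List<String> flatten(List<ToyColumn> tuple) {
        List<String> out = new ArrayList<>();
        collect("", tuple, out);
        return out;
    }

    // Depth-first walk: descend into maps, record leaves by dotted path.
    private static void collect(String prefix, List<ToyColumn> tuple, List<String> out) {
        for (ToyColumn col : tuple) {
            String path = prefix.isEmpty() ? col.name : prefix + "." + col.name;
            if (col.isMap()) {
                collect(path, col.children, out);
            } else {
                out.add(path);
            }
        }
    }
}
```

For a schema with top-level columns a and d and a map m that holds b and a nested map n with c, the walk visits the leaves in declaration order and descends into each map where it appears.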

Putting this all together, the typical life-cycle flow is:

  • Method Details

    • isExtendable

      boolean isExtendable()
    • isWritable

      boolean isWritable()
    • vectorAccessible

      VectorAccessible vectorAccessible()
    • container

      VectorContainer container()
    • rowCount

      int rowCount()
    • reader

      RowSetReader reader()
    • clear

      void clear()
    • schema

      TupleMetadata schema()
    • allocator

      BufferAllocator allocator()
    • indirectionType

    • print

      void print()
      Debug-only tool to visualize a row set for inspection. Do not use this in production code.
    • size

      long size()
      Returns the in-memory size of this record set, including indirection vectors, null vectors, offset vectors, and the full (used and unused) extent of the data vectors.
      Returns:
      memory size in bytes
    • batchSchema

      BatchSchema batchSchema()