java.lang.Object
org.apache.drill.exec.physical.impl.scan.framework.ShimBatchReader
All Implemented Interfaces:
SchemaNegotiatorImpl.NegotiatorListener, RowBatchReader

public class ShimBatchReader extends Object implements RowBatchReader, SchemaNegotiatorImpl.NegotiatorListener
A row batch reader layer that works with a result set loader and schema manager to structure the data read by the actual row batch reader.

Provides the row set loader used to construct record batches.

The idea of this class is that schema construction is complex, and varies depending on the kind of reader. Rather than pack that logic into the scan operator and scan-level reader state, this class abstracts out the schema logic. This allows a variety of solutions as needed for different readers.

  • Method Details

    • name

      public String name()
      Description copied from interface: RowBatchReader
      Name used when reporting errors. Can simply be the class name.
      Specified by:
      name in interface RowBatchReader
      Returns:
      display name for errors
    • reader

      public ManagedReader<? extends SchemaNegotiator> reader()
    • open

      public boolean open()
      Description copied from interface: RowBatchReader
      Set up the record reader. Called just before the first call to next(). Allocate resources here, not in the constructor. Example: open files, allocate buffers, etc.
      Specified by:
      open in interface RowBatchReader
      Returns:
      true if the reader is open and ready to read (possibly no) rows; false for a "soft" failure in which no schema or data is available. In the soft-failure case the scanner should not fail; it should move on to another reader.
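      The soft-failure behavior of open() can be sketched as follows. This is a minimal illustration, not Drill's scan framework: SimpleReader, firstOpenable, and reader are hypothetical stand-ins that model only the name() and open() parts of the contract.

```java
import java.util.List;

public class OpenContractSketch {
  /** Hypothetical stand-in for the name()/open() part of the reader contract. */
  interface SimpleReader {
    String name();
    boolean open(); // true: ready to read; false: "soft" failure, skip this reader
  }

  /** Returns the name of the first reader that opens, or null if none does. */
  static String firstOpenable(List<SimpleReader> readers) {
    for (SimpleReader reader : readers) {
      if (reader.open()) {
        return reader.name();
      }
      // Soft failure: no schema or data, but the scan moves on to the next reader.
    }
    return null;
  }

  /** Builds a reader whose open() result is fixed (for illustration only). */
  static SimpleReader reader(String name, boolean opens) {
    return new SimpleReader() {
      @Override public String name() { return name; }
      @Override public boolean open() { return opens; }
    };
  }

  public static void main(String[] args) {
    List<SimpleReader> readers =
        List.of(reader("missing-file", false), reader("data-file", true));
    System.out.println(firstOpenable(readers)); // prints "data-file"
  }
}
```

      A hard failure would instead be reported by throwing from open(), which this sketch does not model.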
    • defineSchema

      public boolean defineSchema()
      Description copied from interface: RowBatchReader
      Called for the first reader within a scan. Allows the reader to provide an empty batch with only the schema filled in. Readers that are "early schema" (know the schema up front) should return true and create an empty batch. Readers that are "late schema" should return false. In that case, the scan operator will ask the reader to load an actual data batch, and infer the schema from that batch.

      This step is optional and is purely for performance.

      Specified by:
      defineSchema in interface RowBatchReader
      Returns:
      true if this reader can (and has) defined an empty batch to describe the schema, false otherwise
    • next

      public boolean next()
      Description copied from interface: RowBatchReader
      Read the next batch. Reading continues until either EOF, or until the mutator indicates that the batch is full. The batch is considered valid if it is non-empty. Returning true with an empty batch is valid, and is helpful on the very first batch (returning schema only.) An empty batch with a false return code indicates EOF and the batch will be discarded. A non-empty batch along with a false return result indicates a final, valid batch, but that EOF was reached and no more data is available.

      This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.

      Specified by:
      next in interface RowBatchReader
      Returns:
      true if more data may be available (and so next() should be called again), false to indicate that EOF was reached
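      The return-value protocol for next() can be sketched from the caller's side. This is an illustration under assumptions, not the actual scan operator: BatchSource, drain, and replay are hypothetical names, and a batch is reduced to its row count.

```java
import java.util.ArrayList;
import java.util.List;

public class NextProtocolSketch {
  /** Hypothetical stand-in: next() fills a batch; rowCount() reports its size. */
  interface BatchSource {
    boolean next();
    int rowCount();
  }

  /** Drains a source, returning the row count of every batch passed downstream. */
  static List<Integer> drain(BatchSource source) {
    List<Integer> forwarded = new ArrayList<>();
    while (true) {
      boolean more = source.next();
      int rows = source.rowCount();
      if (more) {
        // true: forward the batch even if it is empty (for example, schema only).
        forwarded.add(rows);
      } else {
        // false: EOF. Keep a non-empty final batch, discard an empty one.
        if (rows > 0) {
          forwarded.add(rows);
        }
        return forwarded;
      }
    }
  }

  /** Source that replays fixed (more, rows) pairs, one pair per next() call. */
  static BatchSource replay(boolean[] more, int[] rows) {
    return new BatchSource() {
      private int i = -1;
      @Override public boolean next() { i++; return more[i]; }
      @Override public int rowCount() { return rows[i]; }
    };
  }

  public static void main(String[] args) {
    // Schema-only batch, a full batch, then a final batch delivered together with EOF.
    System.out.println(drain(replay(new boolean[] {true, true, false}, new int[] {0, 100, 40})));
    // A full batch, then an empty batch with EOF (the empty batch is discarded).
    System.out.println(drain(replay(new boolean[] {true, false}, new int[] {100, 0})));
  }
}
```

      In the first case the final 40-row batch arrives with the EOF signal, so no extra batch is allocated only to discover that the data has run out.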
    • output

      public VectorContainer output()
      Description copied from interface: RowBatchReader
      Return the container with the reader's output. This method is called at two points:
      • Directly after the call to RowBatchReader.open(). If the data source can provide a schema at open time, then the reader should provide an empty batch with the schema set. The scanner will return this schema downstream to inform other operators of the schema.
      • Directly after a successful call to RowBatchReader.next() to retrieve the batch produced by that call. (No call is made if next() returns false.)
      Note that most operators require the same vectors be present in each container. So, in practice, a reader must return the same container, and same set of vectors, on each call.
      Specified by:
      output in interface RowBatchReader
      Returns:
      a vector container, with the record count and schema set, that announces the schema after open() (optional) or returns rows read after next() (required)
    • close

      public void close()
      Description copied from interface: RowBatchReader
      Release resources. Called just after a failure, when the scanner is cancelled, or after next() returns EOF. Release all resources and close files. Guaranteed to be called if open() returns normally; will not be called if open() throws an exception.
      Specified by:
      close in interface RowBatchReader
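      The close() guarantee can be sketched with a try/finally block placed after open(). This is a hedged illustration of the stated contract, not Drill's scan operator code: Reader, run, and trace are hypothetical names.

```java
public class LifecycleSketch {
  /** Hypothetical stand-in for the open/next/close part of the reader contract. */
  interface Reader {
    boolean open();
    boolean next();
    void close();
  }

  /** Drives one reader through its lifecycle. */
  static void run(Reader reader) {
    // open() sits outside the try block: if it throws, close() is never called.
    boolean opened = reader.open();
    try {
      if (opened) {
        while (reader.next()) {
          // A real caller would retrieve and forward the batch here.
        }
      }
    } finally {
      // open() returned normally, so close() runs on EOF, failure, or cancellation.
      reader.close();
    }
  }

  /** Runs a recording reader and returns the sequence of calls it saw. */
  static String trace(boolean openThrows, int batches) {
    StringBuilder log = new StringBuilder();
    Reader reader = new Reader() {
      private int remaining = batches;
      @Override public boolean open() {
        log.append("open ");
        if (openThrows) {
          throw new IllegalStateException("cannot open");
        }
        return true;
      }
      @Override public boolean next() {
        log.append("next ");
        return remaining-- > 0;
      }
      @Override public void close() { log.append("close"); }
    };
    try {
      run(reader);
    } catch (IllegalStateException e) {
      log.append("failed");
    }
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(trace(false, 2)); // prints "open next next next close"
    System.out.println(trace(true, 0));  // prints "open failed"
  }
}
```

      A practical consequence for reader implementations: any resource acquired inside open() before an exception is thrown must be released by open() itself, because close() will not run in that case.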
    • schemaVersion

      public int schemaVersion()
      Description copied from interface: RowBatchReader
      Return the version of the schema returned by RowBatchReader.output(). The schema is assumed to start at -1 (no schema). The reader is free to use any numbering system it likes as long as:
      • The numbers are non-negative, and increase (by any increment),
      • Numbers between successive calls are identical if the batch schemas are identical,
      • The number increases if a batch has a different schema than the previous batch.
      Numbers increment (or not) on calls to next(). Thus, two successive calls to this method should return the same number if no next() call lies between.

      If the reader can return a schema on open (so-called "early schema"), then this method must return a non-negative version number, even if the schema happens to be empty (such as reading an empty file.)

      However, if the reader cannot return a schema on open (so-called "late schema"), then this method must return -1 (and output() must return null) to indicate that no schema is available when called before the first call to next().

      No calls will be made to this method before open(), after close(), or after next() returns false. The implementation is thus not required to handle these cases.

      Specified by:
      schemaVersion in interface RowBatchReader
      Returns:
      the schema version, or -1 if no schema version is yet available
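      One numbering scheme that satisfies the rules above is a counter that starts at -1 and advances only when the batch schema changes. The sketch below is an assumption for illustration, not how ShimBatchReader tracks versions: SchemaVersionSketch and onBatch are hypothetical, and a schema is reduced to a list of column names.

```java
import java.util.List;

public class SchemaVersionSketch {
  private int version = -1;        // -1 means "no schema yet"
  private List<String> lastSchema; // column names of the previous batch, null at start

  /** Records the schema of a newly read batch and returns the resulting version. */
  int onBatch(List<String> schema) {
    if (!schema.equals(lastSchema)) {
      version++;                   // different schema: the version increases
      lastSchema = List.copyOf(schema);
    }
    return version;                // identical schema: the version is unchanged
  }

  /** Repeated calls return the same number until onBatch() is called again. */
  int schemaVersion() {
    return version;
  }

  public static void main(String[] args) {
    SchemaVersionSketch tracker = new SchemaVersionSketch();
    System.out.println(tracker.schemaVersion());            // prints -1
    System.out.println(tracker.onBatch(List.of("a")));      // prints 0
    System.out.println(tracker.onBatch(List.of("a")));      // prints 0
    System.out.println(tracker.onBatch(List.of("a", "b"))); // prints 1
  }
}
```

      An empty schema still moves the counter from -1 to 0 on the first batch, which matches the requirement that an early-schema reader report a non-negative version even for an empty file.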
    • build

      public ResultSetLoader build(SchemaNegotiatorImpl schemaNegotiator)
      Specified by:
      build in interface SchemaNegotiatorImpl.NegotiatorListener