org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ReaderLifecycle

All Implemented Interfaces: RowBatchReader

public class ReaderLifecycle extends Object implements RowBatchReader

Manages the schema and batch construction for a managed reader. Allows the reader itself to be as simple as possible. This class implements the basic RowBatchReader protocol based on three methods, and converts it to the two-method protocol of the managed reader. The open() call of the RowBatchReader is combined with the constructor of the ManagedReader, enforcing the rule that the managed reader is created just-in-time when it is to be used, which avoids accidentally holding resources for the life of the scan. Also allows most of the reader's fields to be final.
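
The three-to-two-method conversion described above can be sketched in plain Java. This is a minimal, self-contained illustration, not Drill's implementation. `ThreeStepReader`, `TwoStepReader`, `Negotiator`, `ReaderShim`, and `CountingReader` are hypothetical stand-ins for RowBatchReader, the managed reader, the SchemaNegotiator, ReaderLifecycle, and a concrete reader. The sketch shows that the shim's open() constructs the managed reader just in time, so no reader exists or holds resources before then.

```java
import java.util.function.Function;

// Hypothetical stand-in for the three-method RowBatchReader protocol.
interface ThreeStepReader {
  boolean open();
  boolean next();
  void close();
}

// Hypothetical stand-in for the two-method managed reader protocol: setup is
// done in the constructor, so only next() and close() remain.
interface TwoStepReader {
  boolean next();
  void close();
}

// Hypothetical, much simplified schema negotiator passed to the reader's constructor.
final class Negotiator {
  final String inputSchema;

  Negotiator(String inputSchema) {
    this.inputSchema = inputSchema;
  }
}

// Hypothetical shim playing the role ReaderLifecycle plays for a managed reader.
final class ReaderShim implements ThreeStepReader {
  private final Function<Negotiator, TwoStepReader> factory;
  private final String inputSchema;
  private TwoStepReader reader; // stays null until open(), so nothing is held early

  ReaderShim(Function<Negotiator, TwoStepReader> factory, String inputSchema) {
    this.factory = factory;
    this.inputSchema = inputSchema;
  }

  @Override
  public boolean open() {
    // open() is folded into the managed reader's constructor: create it just in time.
    reader = factory.apply(new Negotiator(inputSchema));
    return true;
  }

  @Override
  public boolean next() {
    return reader.next();
  }

  @Override
  public void close() {
    if (reader != null) {
      reader.close();
      reader = null; // drop the reference so its resources can be reclaimed
    }
  }

  boolean readerCreated() {
    return reader != null;
  }
}

// Example managed reader: all setup happens in the constructor, so its
// configuration fields can be final.
final class CountingReader implements TwoStepReader {
  private final String schema;
  private final int batchCount;
  private int produced;

  CountingReader(Negotiator negotiator, int batchCount) {
    this.schema = negotiator.inputSchema;
    this.batchCount = batchCount;
  }

  @Override
  public boolean next() {
    produced++;                   // pretend to load one batch
    return produced < batchCount; // false once the last batch has been loaded
  }

  @Override
  public void close() {
    // Nothing to release in this toy reader.
  }

  String schema() {
    return schema;
  }
}
```

With a factory such as `n -> new CountingReader(n, 3)`, `readerCreated()` stays false until `open()` is called, and `close()` drops the reader again.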

Coordinates the components that wrap a reader to create the final output batch:

- The actual reader, which loads (possibly a subset of) the columns requested from the input source.
- The implicit columns manager, which populates implicit file columns, partition columns, and Drill's internal implicit columns.
- The missing columns handler, which "makes up" values for projected columns not read by the reader.
- The batch assembler, which combines the three sources of vectors to create the output batch with the schema specified by the schema tracker.
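
The batch assembler's job can be pictured as a per-column merge, sketched below under simplifying assumptions. Rows are modeled as `Map<String, ?>` values instead of value vectors, and `BatchAssemblySketch` and `assembleRow` are hypothetical names. Each output column is taken from the reader if it loaded that column, otherwise from the implicit columns, and otherwise from the missing-column defaults (null when no default is given).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class BatchAssemblySketch {
  // Builds one output row whose column order follows the output schema.
  // Each value comes from the first source that has the column:
  // reader columns, then implicit columns, then missing-column defaults.
  static List<Object> assembleRow(
      List<String> outputSchema,
      Map<String, ?> readerValues,
      Map<String, ?> implicitValues,
      Map<String, ?> missingDefaults) {
    List<Object> row = new ArrayList<>(outputSchema.size());
    for (String column : outputSchema) {
      if (readerValues.containsKey(column)) {
        row.add(readerValues.get(column));    // loaded by the reader
      } else if (implicitValues.containsKey(column)) {
        row.add(implicitValues.get(column));  // e.g. file name or partition directory
      } else {
        row.add(missingDefaults.get(column)); // "made up": null when no default exists
      }
    }
    return Collections.unmodifiableList(row);
  }
}
```

Suppose the output schema is `a, filename, b` and the reader loaded only `a`. The row then comes out as the reader's value for `a`, the file name, and null.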

This class coordinates the reader-visible aspects of the scan:

- The SchemaNegotiator (or subclass) which provides schema-related input to the reader and which creates the reader's ResultSetLoader, among other tasks. The schema negotiator is specific to each kind of scan and is thus created via the ScanLifecycleBuilder.
- The reader, which is designed to be as simple as possible, with all generic overhead tasks handled by this "shim" between the scan operator and the actual reader implementation.

The reader is schema-driven. See ScanSchemaTracker for an overview.

- The reader is given a reader input schema, via the schema negotiator, which specifies the desired output schema. The schema can be fully dynamic (a wildcard), fully defined (a prior reader already chose column types), or a hybrid.
- The reader can load a subset of columns. Those that are left out become "missing columns" to be filled in by this class.
- The reader output schema, together with the implicit and missing columns, defines the scan's output schema.
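
Deciding which columns are "missing" reduces to an ordered set difference. The sketch below is a simplification. `MissingColumnsSketch` is a hypothetical name, it compares column names exactly, and it ignores column types. `definedColumns` stands for the columns the reader input schema names explicitly, which is an empty list for a pure wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class MissingColumnsSketch {
  // Returns, in input-schema order, the explicitly defined columns that the
  // reader did not produce and that are not implicit columns. These are the
  // columns a missing-columns handler would have to "make up".
  static List<String> missingColumns(
      List<String> definedColumns, Set<String> readerOutput, Set<String> implicitColumns) {
    List<String> missing = new ArrayList<>();
    for (String column : definedColumns) {
      if (!readerOutput.contains(column) && !implicitColumns.contains(column)) {
        missing.add(column);
      }
    }
    return missing; // empty for a pure wildcard, where the reader decides the columns
  }
}
```

Consider defined columns `a, b, c, dir0`, a reader that produced only `a`, and `dir0` as an implicit partition column. The missing columns are then `b` and `c`.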

The framework handles projection so the reader does not have to. Reading an unwanted column is cheap: the result set loader provides a "dummy" column writer that simply discards the value. This is just as fast as having the reader use if-statements or a lookup table to decide which columns to save.
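
The dummy-writer idea can be sketched as follows. `ColWriter`, `ProjectionSketch`, and the map-based value sink are hypothetical simplifications of the result set loader's column writers. The reader calls a writer for every column it reads, with no projection checks, and unprojected columns simply receive a shared no-op writer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, much simplified column writer.
interface ColWriter {
  void setString(String value);
}

final class ProjectionSketch {
  // Shared writer for unprojected columns: the value is simply discarded.
  static final ColWriter DUMMY = value -> { };

  // Creates a writer for each column the reader can produce. Projected columns
  // append values to `sink`; all others get the no-op DUMMY writer.
  static Map<String, ColWriter> buildWriters(
      List<String> readerColumns, Set<String> projected, Map<String, List<String>> sink) {
    Map<String, ColWriter> writers = new HashMap<>();
    for (String column : readerColumns) {
      if (projected.contains(column)) {
        List<String> values = sink.computeIfAbsent(column, k -> new ArrayList<>());
        writers.put(column, values::add);
      } else {
        writers.put(column, DUMMY);
      }
    }
    return writers;
  }

  // The reader side: it writes every column it reads, with no projection checks.
  static void writeRow(Map<String, ColWriter> writers, List<String> columns, List<String> values) {
    for (int i = 0; i < columns.size(); i++) {
      writers.get(columns.get(i)).setString(values.get(i));
    }
  }
}
```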

Field Summary

- protected final TupleMetadata readerInputSchema
- protected ResultSetLoader tableLoader

Constructor Summary

- ReaderLifecycle(ScanLifecycle scanLifecycle, long limit)

Method Summary

- ResultSetLoader buildLoader()
- void close(): Release resources.
- boolean defineSchema(): Called for the first reader within a scan.
- CustomErrorContext errorContext()
- MissingColumnHandlerBuilder missingColumnsBuilder(TupleMetadata readerSchema)
- String name(): Name used when reporting errors.
- boolean next(): Read the next batch.
- boolean open(): Set up the record reader.
- VectorContainer output(): Return the container with the reader's output.
- TupleMetadata readerInputSchema()
- TupleMetadata readerOutputSchema()
- ScanLifecycle scanLifecycle()
- ScanLifecycleBuilder scanOptions()
- ScanSchemaTracker schemaTracker()
- int schemaVersion(): Return the version of the schema returned by RowBatchReader.output().
- ResultSetLoader tableLoader()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- readerInputSchema
  
  protected final TupleMetadata readerInputSchema
- tableLoader
  
  protected ResultSetLoader tableLoader
Constructor Details
- ReaderLifecycle
  
  public ReaderLifecycle(ScanLifecycle scanLifecycle, long limit)
Method Details
- scanLifecycle
  
  public ScanLifecycle scanLifecycle()
- readerInputSchema
  
  public TupleMetadata readerInputSchema()
- errorContext
  
  public CustomErrorContext errorContext()
- schemaTracker
  
  public ScanSchemaTracker schemaTracker()
- scanOptions
  
  public ScanLifecycleBuilder scanOptions()
- name
  
  public String name()
  
  Description copied from interface: RowBatchReader
  
  Name used when reporting errors. Can simply be the class name.
  
  Specified by:
  
  name in interface RowBatchReader
  
  Returns:
  
  display name for errors
- tableLoader
  
  public ResultSetLoader tableLoader()
- open
  
  public boolean open()
  
  Description copied from interface: RowBatchReader
  
  Set up the record reader. Called just before the first call to next(). Allocate resources here, not in the constructor (for example, open files and allocate buffers).
  
  Specified by:
  
  open in interface RowBatchReader
  
  Returns:
  
  true if the reader is open and ready to read (possibly zero) rows; false for a "soft" failure in which no schema or data is available but the scan should not fail and should instead move on to the next reader
- buildLoader
  
  public ResultSetLoader buildLoader()
- defineSchema
  
  public boolean defineSchema()
  
  Description copied from interface: RowBatchReader
  
  Called for the first reader within a scan. Allows the reader to provide an empty batch with only the schema filled in. Readers that are "early schema" (know the schema up front) should return true and create an empty batch. Readers that are "late schema" should return false. In that case, the scan operator will ask the reader to load an actual data batch, and infer the schema from that batch.
  This step is optional and is purely for performance.
  
  Specified by:
  
  defineSchema in interface RowBatchReader
  
  Returns:
  
  true if this reader can (and has) defined an empty batch to describe the schema, false otherwise
- next
  
  public boolean next()
  
  Description copied from interface: RowBatchReader
  
  Read the next batch. Reading continues until either EOF is reached or the mutator indicates that the batch is full. The batch is considered valid if it is non-empty. Returning true with an empty batch is also valid, and is helpful on the very first batch (returning the schema only). An empty batch with a false return code indicates EOF, and the batch will be discarded. A non-empty batch with a false return code indicates a final, valid batch after which EOF was reached and no more data is available.
  This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.
  
  Specified by:
  
  next in interface RowBatchReader
  
  Returns:
  
  true if more data may be available (and so next() should be called again), false to indicate that EOF was reached
- missingColumnsBuilder
  
  public MissingColumnHandlerBuilder missingColumnsBuilder(TupleMetadata readerSchema)
- readerOutputSchema
  
  public TupleMetadata readerOutputSchema()
- output
  
  public VectorContainer output()
  
  Description copied from interface: RowBatchReader
  Return the container with the reader's output. This method is called at two points:
  
  Directly after the call to RowBatchReader.open(). If the data source can provide a schema at open time, then the reader should provide an empty batch with the schema set. The scanner will return this schema downstream to inform other operators of the schema.
  
  Directly after a successful call to RowBatchReader.next() to retrieve the batch produced by that call. (No call is made if next() returns false.)
  
  Note that most operators require the same vectors be present in each container. So, in practice, a reader must return the same container, and same set of vectors, on each call.
  Specified by:
  
  output in interface RowBatchReader
  
  Returns:
  
  a vector container, with the record count and schema set, that announces the schema after open() (optional) or returns rows read after next() (required)
- schemaVersion
  
  public int schemaVersion()
  
  Description copied from interface: RowBatchReader
  Return the version of the schema returned by RowBatchReader.output(). The schema version is assumed to start at -1 (no schema). The reader is free to use any numbering system it likes as long as:
  
  The numbers are non-negative and increase (by any increment),
  
  Numbers returned by successive calls are identical if the batch schemas are identical,
  
  The number increases if a batch has a different schema than the previous batch.
  
  Numbers increment (or not) on calls to next(). Thus, two successive calls to this method should return the same number if no next() call lies between them.
  If the reader can return a schema on open (so-called "early schema"), then this method must return a non-negative version number, even if the schema happens to be empty (such as when reading an empty file).
  However, if the reader cannot return a schema on open (so-called "late schema"), then this method must return -1 (and output() must return null) to indicate that no schema is available when called before the first call to next().
  No calls will be made to this method before open(), after close(), or after next() returns false. The implementation is thus not required to handle these cases.
  Specified by:
  
  schemaVersion in interface RowBatchReader
  
  Returns:
  
  the schema version, or -1 if no schema version is yet available
- close
  
  public void close()
  
  Description copied from interface: RowBatchReader
  
  Release resources. Called just after a failure, when the scanner is cancelled, or after next() returns EOF. Release all resources and close files. Guaranteed to be called if open() returns normally; will not be called if open() throws an exception.
  
  Specified by:
  
  close in interface RowBatchReader
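
The open(), next(), and close() contracts above can be exercised together in a small driver loop. This is a sketch of how a caller might honor them, not the scan operator's actual code. `ProtocolReader`, `ScanLoopSketch`, and `ScriptedProtocolReader` are hypothetical, and the loop tracks only row counts rather than real batches. It skips a reader whose open() returns false. It delivers a non-empty batch that arrives together with EOF and discards an empty one. It always calls close() once open() has returned normally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RowBatchReader calls a scan makes.
interface ProtocolReader {
  boolean open();  // false = "soft" failure: skip this reader
  boolean next();  // false = EOF (possibly with a final non-empty batch)
  int rowCount();  // rows in the batch loaded by the last next()
  void close();
}

final class ScanLoopSketch {
  // Drives each reader in turn and returns the row counts of delivered batches.
  static List<Integer> run(List<? extends ProtocolReader> readers) {
    List<Integer> delivered = new ArrayList<>();
    for (ProtocolReader reader : readers) {
      // If open() throws, the exception propagates and close() is not called.
      boolean opened = reader.open();
      try {
        if (!opened) {
          continue; // soft failure: move on to the next reader (finally still runs)
        }
        boolean more;
        do {
          more = reader.next();
          int rows = reader.rowCount();
          // Deliver every batch except an empty one that accompanies EOF.
          if (more || rows > 0) {
            delivered.add(rows);
          }
        } while (more);
      } finally {
        reader.close(); // open() returned normally, so close() must be called
      }
    }
    return delivered;
  }
}

// Scripted reader for trying out the loop: each argument is one batch's row
// count, and next() reports EOF together with the last scripted batch.
final class ScriptedProtocolReader implements ProtocolReader {
  private final boolean openResult;
  private final int[] batches;
  private int index = -1;
  boolean closed;

  ScriptedProtocolReader(boolean openResult, int... batches) {
    this.openResult = openResult;
    this.batches = batches;
  }

  public boolean open() { return openResult; }
  public boolean next() { index++; return index < batches.length - 1; }
  public int rowCount() { return batches[index]; }
  public void close() { closed = true; }
}
```

As an example, take three readers: one whose open() returns false, one scripted with batches of 0, 5, and 3 rows, and one with batches of 4 and 0 rows. The delivered row counts are then 0, 5, 3, 4, and all three readers end up closed.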

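A reader could satisfy the schemaVersion() rules with a small counter like the one below. `SchemaVersionTracker` is a hypothetical helper that compares only lists of column names; a real reader would compare full schemas, including types. The version starts at -1 and becomes 0 on the first schema, even an empty one. After that, it increases only when a batch's schema differs from the previous batch's.

```java
import java.util.List;

// Hypothetical helper for following the schemaVersion() rules.
final class SchemaVersionTracker {
  private int version = -1;        // -1: no schema available yet
  private List<String> lastSchema; // column names of the previous batch

  // Call once per batch: after an early-schema open() and after each next().
  void onBatch(List<String> schema) {
    if (version < 0 || !schema.equals(lastSchema)) {
      version++;                       // first schema becomes 0; each change bumps it
      lastSchema = List.copyOf(schema);
    }
  }

  // Unchanged between onBatch() calls, so repeated calls return the same number.
  int schemaVersion() {
    return version;
  }
}
```
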