org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ScanLifecycle

Direct Known Subclasses: FileScanLifecycle

public class ScanLifecycle extends Object

Basic scan framework for a set of "managed" readers. It uses the scan schema tracker to evolve the scan output schema. Readers are created and managed via a reader factory class unique to each type of scan. The reader factory also provides the scan-specific schema negotiator to be passed to the reader.

Lifecycle

The options provided in the ScanLifecycleBuilder are sufficient to drive the entire scan operator functionality. Schema resolution and projection are done generically and are the same for all data sources. Only the reader (created via the factory class) differs from one type of file to another.

The framework achieves the work described below by composing a set of detailed classes, each of which performs some specific task. This structure leaves the reader to simply infer schema and read data.

Reader Integration

The details of how a file is structured, how a schema is inferred, and how data is decoded are all encapsulated in the reader. The only real interactions between the reader and the framework are:

The reader factory creates a reader and the corresponding schema negotiator.
The reader "negotiates" a schema with the framework. The framework knows the projection list from the query plan, knows something about data types (whether a column should be a scalar, a map or an array), and knows about the schema already defined by prior readers. The reader knows what schema it can produce (if "early schema"). The schema negotiator class handles this task.
The reader reads data from the file and populates value vectors a batch at a time. The framework creates the result set loader to use for this work. The schema negotiator returns that loader to the reader, which uses it during read.
A reader may be "late schema", that is, true "schema on read". In this case, the reader simply tells the result set loader to create a new column writer on the fly. The framework will work out whether that new column is to be projected and will return either a real column writer (projected column) or a dummy column writer (unprojected column).
The reader then reads batches of data until all data is read. The result set loader signals when a batch is full; the reader should not worry about this detail itself.
The reader then releases its resources.
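
The interaction listed above can be sketched with a few stand-in types. This is a minimal illustration under stated assumptions, not Drill code: `Loader`, `Reader`, `ColumnWriter`, `runScan` and `row` are hypothetical names invented here, columns are plain Java lists rather than value vectors, and schema negotiation is reduced to a projection set. It shows a late-schema reader receiving a real writer for a projected column and a dummy writer for an unprojected one, while the loader (not the reader) decides when a batch is full.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; none of these are Drill classes.
class ReaderIntegrationSketch {

  /** Writes one value into a column; the reader cannot tell real from dummy. */
  interface ColumnWriter {
    void write(Object value);
  }

  /** Simplified loader that knows the projection list and the batch size limit. */
  static class Loader {
    private final Set<String> projected;
    private final int batchLimit;
    private final Map<String, List<Object>> vectors = new LinkedHashMap<>();
    private int rowCount;

    Loader(Set<String> projected, int batchLimit) {
      if (batchLimit < 1) {
        throw new IllegalArgumentException("batchLimit must be at least 1");
      }
      this.projected = projected;
      this.batchLimit = batchLimit;
    }

    /** Late schema: a column discovered on the fly gets a real or a dummy writer. */
    ColumnWriter addColumn(String name) {
      if (!projected.contains(name)) {
        return value -> { };                      // unprojected: discard values
      }
      List<Object> vector = vectors.computeIfAbsent(name, key -> new ArrayList<>());
      return value -> vector.add(value);          // projected: store values
    }

    void saveRow() { rowCount++; }
    boolean isFull() { return rowCount >= batchLimit; }
    int rowCount() { return rowCount; }

    /** Hands over a copy of the current batch and resets for the next one. */
    Map<String, List<Object>> harvest() {
      Map<String, List<Object>> batch = new LinkedHashMap<>();
      vectors.forEach((name, values) -> batch.put(name, new ArrayList<>(values)));
      vectors.values().forEach(List::clear);
      rowCount = 0;
      return batch;
    }
  }

  /** Simplified reader over in-memory rows; it never checks batch size itself. */
  static class Reader {
    private final List<Map<String, Object>> rows;
    private final Loader loader;
    private final Map<String, ColumnWriter> writers = new LinkedHashMap<>();
    private int position;

    Reader(List<Map<String, Object>> rows, Loader loader) {
      this.rows = rows;
      this.loader = loader;
    }

    /** Fills one batch and returns true while unread rows remain. */
    boolean next() {
      while (!loader.isFull() && position < rows.size()) {
        rows.get(position++).forEach((name, value) ->
            writers.computeIfAbsent(name, loader::addColumn).write(value));
        loader.saveRow();
      }
      return position < rows.size();
    }

    /** Releases the reader's resources (here, only its writers). */
    void close() { writers.clear(); }
  }

  /** Drives one reader to completion and collects the non-empty batches. */
  static List<Map<String, List<Object>>> runScan(
      List<Map<String, Object>> rows, Set<String> projected, int batchLimit) {
    Loader loader = new Loader(projected, batchLimit);
    Reader reader = new Reader(rows, loader);
    List<Map<String, List<Object>>> batches = new ArrayList<>();
    try {
      boolean more = true;
      while (more) {
        more = reader.next();
        if (loader.rowCount() > 0) {
          batches.add(loader.harvest());
        }
      }
    } finally {
      reader.close();
    }
    return batches;
  }

  /** Builds a sample row with the two columns "a" and "b". */
  static Map<String, Object> row(Object a, Object b) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("a", a);
    row.put("b", b);
    return row;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = List.of(row(1, "x"), row(2, "y"), row(3, "z"));
    // Only "a" is projected, and a batch holds at most two rows.
    System.out.println(runScan(rows, Set.of("a"), 2)); // prints [{a=[1, 2]}, {a=[3]}]
  }
}
```

The sketch assumes every row carries the same columns. A real reader would also have to handle sparse rows, which this toy loader does not attempt.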

See ScanSchemaTracker for details about how the scan schema evolves over the scan lifecycle.

Lifecycle

Coordinates the components that make up a scan implementation:

ScanSchemaTracker which resolves the scan schema over the lifetime of the scan.
Implicit columns manager which identifies and populates implicit file columns, partition columns, and Drill's internal metadata columns.
The actual readers which load (possibly a subset of) the columns requested from the input source.

Implicit columns are unique to each storage plugin. At present, they are defined only for the file system plugin. To handle such variation, each extension defines a subclass of the ScanLifecycleBuilder class to create the implicit columns manager (and schema negotiator) unique to a certain kind of scan.

Each reader is tracked by a ReaderLifecycle which handles:

Setting up the ResultSetLoader for the reader.
The concrete values for implicit columns for that reader (and its file, if file-based).
The missing columns handler which "makes up" values for projected columns not read by the reader.
The batch assembler, which combines the three sources of vectors to create the output batch with the schema specified by the schema tracker.

Publishing the Final Result Set

This class "publishes" a vector container that has the final, projected form of a scan. The projected schema includes:

Columns from the reader.

Static columns, such as implicit or partition columns.

Null columns for items that appear in the select list but are not found in either of the above two categories.

The order of columns is that set by the select list (or by the reader, for a `SELECT *` query).
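
As a rough illustration of how the three column sources combine, the sketch below assembles an output batch in select-list order. It is a simplification under stated assumptions, not the framework's batch assembler: `OutputAssemblySketch` and `assemble` are hypothetical names, columns are plain lists rather than value vectors, the sample column names and values are invented, and the `SELECT *` case (where the reader sets the order) is not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Drill batch assembler.
class OutputAssemblySketch {

  /**
   * Builds the output columns in select-list order, taking each column from
   * the reader if present, else from the static (implicit) values, else
   * filling it with nulls.
   */
  static Map<String, List<Object>> assemble(
      List<String> selectList,
      Map<String, List<Object>> readerColumns,
      Map<String, Object> implicitValues,
      int rowCount) {
    Map<String, List<Object>> output = new LinkedHashMap<>();
    for (String name : selectList) {
      List<Object> readerColumn = readerColumns.get(name);
      if (readerColumn != null) {
        // Column produced by the reader.
        output.put(name, readerColumn);
      } else if (implicitValues.containsKey(name)) {
        // Static column: the same value repeated for every row.
        Object value = implicitValues.get(name);
        output.put(name, new ArrayList<>(Collections.<Object>nCopies(rowCount, value)));
      } else {
        // Projected but found nowhere: a null column.
        output.put(name, new ArrayList<>(Collections.<Object>nCopies(rowCount, null)));
      }
    }
    return output;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> readerColumns = new LinkedHashMap<>();
    readerColumns.put("a", List.<Object>of(1, 2));
    readerColumns.put("b", List.<Object>of("x", "y"));
    Map<String, Object> implicitValues = Map.<String, Object>of("filename", "data.csv");

    Map<String, List<Object>> batch = assemble(
        List.of("b", "filename", "a", "missing"), readerColumns, implicitValues, 2);
    // prints {b=[x, y], filename=[data.csv, data.csv], a=[1, 2], missing=[null, null]}
    System.out.println(batch);
  }
}
```

The output follows the select list rather than the order in which the reader produced its columns, which is the ordering rule stated above for queries with an explicit projection.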

See Also:

for a description of the schema lifecycle which drives a scan

Constructor Summary

Constructor
ScanLifecycle(OperatorContext context, ScanLifecycleBuilder builder)

Method Summary

Modifier and Type               Method
BufferAllocator                 allocator()
int                             batchCount()
void                            close()
OperatorContext                 context()
CustomErrorContext              errorContext()
boolean                         hasOutputSchema()
protected SchemaNegotiatorImpl  newNegotiator(ReaderLifecycle readerLifecycle)
RowBatchReader                  nextReader()
ScanLifecycleBuilder            options()
TupleMetadata                   outputSchema()
ReaderFactory<?>                readerFactory()
long                            rowCount()
ScanSchemaTracker               schemaTracker()
void                            tallyBatch(int rowCount)
ResultVectorCacheImpl           vectorCache()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- ScanLifecycle
  
  public ScanLifecycle(OperatorContext context, ScanLifecycleBuilder builder)

Method Details
- context
  
  public OperatorContext context()
- options
  
  public ScanLifecycleBuilder options()
- schemaTracker
  
  public ScanSchemaTracker schemaTracker()
- vectorCache
  
  public ResultVectorCacheImpl vectorCache()
- readerFactory
  
  public ReaderFactory<?> readerFactory()
- hasOutputSchema
  
  public boolean hasOutputSchema()
- errorContext
  
  public CustomErrorContext errorContext()
- allocator
  
  public BufferAllocator allocator()
- batchCount
  
  public int batchCount()
- rowCount
  
  public long rowCount()
- tallyBatch
  
  public void tallyBatch(int rowCount)
- nextReader
  
  public RowBatchReader nextReader()
- newNegotiator
  
  protected SchemaNegotiatorImpl newNegotiator(ReaderLifecycle readerLifecycle)
- outputSchema
  
  public TupleMetadata outputSchema()
- close
  
  public void close()
