org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder

Direct Known Subclasses: FileScanLifecycleBuilder

public class ScanLifecycleBuilder extends Object

Gathers options for the ScanLifecycle then builds a scan lifecycle instance.

This framework is a bridge between operator logic and the scan internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan lifecycle at the right time. By abstracting out this plumbing, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.

Inputs

At its most basic, a scan framework requires just a few simple inputs:

- The options defined by the scan projection framework, such as the projection list.
- A reader factory to create a reader for each of the files or blocks to be scanned. (Readers are expected to be created one by one as files are read.)
- The operator context, which provides access to a memory allocator and other plumbing items.

In practice, there are other options to fine-tune behavior (provided schema, custom error context, various limits, etc.).
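The flow described above (gather options in a builder, then build a lifecycle that creates readers one at a time) can be sketched in plain Java. This is a self-contained model, not Drill code: SketchReaderFactory, SketchBuilder, SketchLifecycle, and FileListReaderFactory are hypothetical names, projections are plain strings rather than SchemaPath objects, and the default record limit is a placeholder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a reader factory: yields one reader per file or block.
interface SketchReaderFactory {
  boolean hasNextReader();
  String nextReader(); // a "reader" is just a name in this sketch
}

// Hypothetical mutable builder, modeled loosely on ScanLifecycleBuilder.
class SketchBuilder {
  List<String> projection = new ArrayList<>();
  SketchReaderFactory readerFactory;
  int batchRecordLimit = 4096; // placeholder default, not Drill's constant

  void projection(List<String> columns) { projection = new ArrayList<>(columns); }
  void readerFactory(SketchReaderFactory factory) { readerFactory = factory; }
  void batchRecordLimit(int limit) { batchRecordLimit = limit; }

  // Validate required inputs, then hand the gathered options to the lifecycle.
  SketchLifecycle build() {
    Objects.requireNonNull(readerFactory, "a reader factory is required");
    return new SketchLifecycle(this);
  }
}

// Hypothetical immutable lifecycle that owns a snapshot of the options.
final class SketchLifecycle {
  final List<String> projection;
  final int batchRecordLimit;
  private final SketchReaderFactory readerFactory;

  SketchLifecycle(SketchBuilder builder) {
    projection = Collections.unmodifiableList(new ArrayList<>(builder.projection));
    batchRecordLimit = builder.batchRecordLimit;
    readerFactory = builder.readerFactory;
  }

  // Create readers one by one, in the order the factory supplies them.
  List<String> openReaders() {
    List<String> opened = new ArrayList<>();
    while (readerFactory.hasNextReader()) {
      opened.add(readerFactory.nextReader());
    }
    return opened;
  }
}

// Example factory: one reader per file name.
class FileListReaderFactory implements SketchReaderFactory {
  private final Iterator<String> files;
  FileListReaderFactory(List<String> files) { this.files = files.iterator(); }
  public boolean hasNextReader() { return files.hasNext(); }
  public String nextReader() { return "reader:" + files.next(); }
}

class SketchBuilderDemo {
  public static void main(String[] args) {
    SketchBuilder builder = new SketchBuilder();
    builder.projection(Arrays.asList("a", "b"));
    builder.readerFactory(new FileListReaderFactory(Arrays.asList("f1.csv", "f2.csv")));
    SketchLifecycle lifecycle = builder.build();
    System.out.println(lifecycle.projection + " " + lifecycle.openReaders());
  }
}
```

In Drill itself, the corresponding calls listed on this page are projection(List<SchemaPath>), readerFactory(ReaderFactory<?>), and then build(OperatorContext) or buildScanOperator(FragmentContext, PhysicalOperator). The sketch only mirrors the shape of that flow.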

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
|---|---|---|
| static class | ScanLifecycleBuilder.DummyReaderFactory | |
| static interface | ScanLifecycleBuilder.SchemaValidator | |

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| protected boolean | allowRequiredNullColumns | |
| protected boolean | allowSchemaChange | Option to disable schema changes. |
| static final int | DEFAULT_BATCH_BYTE_COUNT | |
| static final int | DEFAULT_BATCH_ROW_COUNT | |
| protected TupleMetadata | definedSchema | |
| protected boolean | disableEmptyResults | Option to disable empty results. |
| protected boolean | enableSchemaBatch | Option that determines whether the scan operator starts with an empty, schema-only batch (the so-called "fast schema" that Drill once tried to provide) or with a non-empty data batch (which appears to have been the standard since the "Empty Batches" project). See the OperatorDriver Javadoc for more details. |
| protected CustomErrorContext | errorContext | Context for error messages. |
| static final int | MAX_BATCH_BYTE_SIZE | |
| static final int | MAX_BATCH_ROW_COUNT | |
| static final int | MIN_BATCH_BYTE_SIZE | |
| protected TypeProtos.MajorType | nullType | |
| protected TupleMetadata | providedSchema | |
| protected ScanLifecycleBuilder.SchemaValidator | schemaValidator | Optional schema validator to perform per-scan checks of the projection or resolved schema. |
| protected String | userName | |

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| ScanLifecycleBuilder() | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| boolean | allowRequiredNullColumns() | |
| void | allowRequiredNullColumns(boolean flag) | |
| boolean | allowSchemaChange() | |
| void | allowSchemaChange(boolean flag) | |
| void | batchByteLimit(int byteLimit) | |
| void | batchRecordLimit(int batchRecordLimit) | Specify a custom batch record count. |
| ScanLifecycle | build(OperatorContext context) | |
| ScanOperatorExec | buildScan() | |
| OperatorRecordBatch | buildScanOperator(FragmentContext fragContext, PhysicalOperator pop) | |
| TupleMetadata | definedSchema() | |
| void | definedSchema(TupleMetadata definedSchema) | |
| void | disableEmptyResults(boolean option) | |
| void | enableSchemaBatch(boolean option) | |
| CustomErrorContext | errorContext() | |
| void | errorContext(CustomErrorContext context) | |
| long | limit() | |
| void | limit(long limit) | |
| TypeProtos.MajorType | nullType() | |
| void | nullType(TypeProtos.MajorType nullType) | Specify the type to use for null columns in place of the standard nullable int. |
| OptionSet | options() | |
| void | options(OptionSet options) | |
| List<SchemaPath> | projection() | |
| void | projection(List<SchemaPath> projection) | |
| TupleMetadata | providedSchema() | |
| void | providedSchema(TupleMetadata providedSchema) | |
| ReaderFactory<?> | readerFactory() | |
| void | readerFactory(ReaderFactory<?> readerFactory) | |
| int | scanBatchByteLimit() | |
| int | scanBatchRecordLimit() | |
| ScanLifecycleBuilder.SchemaValidator | schemaValidator() | |
| void | schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator) | |
| String | userName() | |
| void | userName(String userName) | |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MIN_BATCH_BYTE_SIZE
  
  public static final int MIN_BATCH_BYTE_SIZE
  See Also:
  
  Constant Field Values
- MAX_BATCH_BYTE_SIZE
  
  public static final int MAX_BATCH_BYTE_SIZE
  See Also:
  
  Constant Field Values
- DEFAULT_BATCH_ROW_COUNT
  
  public static final int DEFAULT_BATCH_ROW_COUNT
  See Also:
  
  Constant Field Values
- DEFAULT_BATCH_BYTE_COUNT
  
  public static final int DEFAULT_BATCH_BYTE_COUNT
- MAX_BATCH_ROW_COUNT
  
  public static final int MAX_BATCH_ROW_COUNT
  See Also:
  
  Constant Field Values
- userName
  
  protected String userName
- nullType
  
  protected TypeProtos.MajorType nullType
- allowRequiredNullColumns
  
  protected boolean allowRequiredNullColumns
- definedSchema
  
  protected TupleMetadata definedSchema
- providedSchema
  
  protected TupleMetadata providedSchema
- enableSchemaBatch
  
  protected boolean enableSchemaBatch
  
  Option that determines whether the scan operator starts with an empty, schema-only batch (the so-called "fast schema" that Drill once tried to provide) or with a non-empty data batch (which appears to have been the standard since the "Empty Batches" project). See the OperatorDriver Javadoc for more details.
  Defaults to false, meaning that the scan does not provide the empty schema batch. DRILL-7305 explains that many operators fail when presented with an empty batch, so do not enable this feature until those issues are fixed. The exception is when you want to track down the DRILL-7305 bugs themselves.
- disableEmptyResults
  
  protected boolean disableEmptyResults
  
  Option to disable empty results. An empty result occurs if no reader has any data, but at least one reader can provide a schema. In this case, the scan can return a single, empty batch with an associated schema. This is the correct SQL result for an empty query. However, if this result triggers empty-batch bugs in other operators, we can instead disable this feature and return a null result set: no schema, no batch, just a "fast NONE", an immediate return of NONE from the Volcano iterator.
  Setting this option is not desirable: it means that the user gets no schema for queries that should be able to return one. So, set it only if we cannot find or fix the empty-batch bugs.
- allowSchemaChange
  
  protected boolean allowSchemaChange
  
  Option to disable schema changes. If false, then the first batch commits the scan to a single, unchanged schema. If true (the legacy default), then each batch or reader can change the schema, even though downstream operators generally cannot handle a schema change. The goal is to evolve all readers so that they do not generate schema changes.
- schemaValidator
  
  protected ScanLifecycleBuilder.SchemaValidator schemaValidator
  
  Optional schema validator to perform per-scan checks of the projection or resolved schema.
- errorContext
  
  protected CustomErrorContext errorContext
  
  Context for error messages.
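The enableSchemaBatch, disableEmptyResults, and allowSchemaChange fields described above each govern a small decision that can be modeled independently. The sketch below is hypothetical and uses no Drill classes: batches are represented by their row counts (0 marks a schema-only batch), schemas by lists of column names, and returning a fast NONE when no reader supplies a schema is an assumption, since the field description covers only the case where at least one reader does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the scan options; none of these classes exist in Drill.
enum EmptyScanOutcome { DATA, EMPTY_BATCH_WITH_SCHEMA, FAST_NONE }

final class ScanOptionModel {
  private ScanOptionModel() {}

  // enableSchemaBatch: optionally prepend a schema-only (zero-row) batch.
  static List<Integer> batchesToEmit(boolean enableSchemaBatch, List<Integer> dataBatchRows) {
    List<Integer> out = new ArrayList<>();
    if (enableSchemaBatch) {
      out.add(0); // the "fast schema" batch: schema, but no rows
    }
    out.addAll(dataBatchRows);
    return out;
  }

  // disableEmptyResults: what to return when no reader produced any rows.
  static EmptyScanOutcome emptyOutcome(long totalRows, boolean anyReaderHasSchema,
                                       boolean disableEmptyResults) {
    if (totalRows > 0) {
      return EmptyScanOutcome.DATA;
    }
    if (anyReaderHasSchema && !disableEmptyResults) {
      return EmptyScanOutcome.EMPTY_BATCH_WITH_SCHEMA; // correct SQL result for an empty query
    }
    return EmptyScanOutcome.FAST_NONE; // immediate NONE: no schema, no batch (assumed when no schema)
  }
}

// allowSchemaChange: when false, the first batch commits the scan to its schema.
final class SchemaCommitGuard {
  private final boolean allowSchemaChange;
  private List<String> committed; // null until the first batch arrives
  private int version;

  SchemaCommitGuard(boolean allowSchemaChange) {
    this.allowSchemaChange = allowSchemaChange;
  }

  // Accepts a batch schema and returns the resulting schema version.
  int accept(List<String> batchSchema) {
    if (committed == null) {
      committed = new ArrayList<>(batchSchema);
      version = 1;
    } else if (!committed.equals(batchSchema)) {
      if (!allowSchemaChange) {
        throw new IllegalStateException(
            "Schema change not allowed: " + committed + " -> " + batchSchema);
      }
      committed = new ArrayList<>(batchSchema);
      version++;
    }
    return version;
  }
}

class ScanOptionDemo {
  public static void main(String[] args) {
    System.out.println(ScanOptionModel.batchesToEmit(true, Arrays.asList(100, 50)));
    System.out.println(ScanOptionModel.emptyOutcome(0, true, false));
    SchemaCommitGuard strict = new SchemaCommitGuard(false);
    strict.accept(Arrays.asList("a", "b"));
    try {
      strict.accept(Arrays.asList("a", "c"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Keeping each rule in its own small method makes it easy to see that the three options are orthogonal: enableSchemaBatch affects only the start of the stream, disableEmptyResults only the all-empty case, and allowSchemaChange only later batches.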
Constructor Details
- ScanLifecycleBuilder
  
  public ScanLifecycleBuilder()
Method Details
- options
  
  public void options(OptionSet options)
- options
  
  public OptionSet options()
- readerFactory
  
  public void readerFactory(ReaderFactory<?> readerFactory)
- userName
  
  public void userName(String userName)
- userName
  
  public String userName()
- batchRecordLimit
  
  public void batchRecordLimit(int batchRecordLimit)
  
  Specify a custom batch record count. This is the maximum number of records per batch for this scan. Readers can adjust this, but any adjustment is capped at the value specified here.
  
  Parameters:
  
  batchRecordLimit - maximum records per batch
- batchByteLimit
  
  public void batchByteLimit(int byteLimit)
- nullType
  
  public void nullType(TypeProtos.MajorType nullType)
  
  Specify the type to use for null columns in place of the standard nullable int. This type is used for all missing columns. (Readers that need per-column control need a different mechanism.)
  
  Parameters:
  
  nullType - the type to use for null columns
- allowRequiredNullColumns
  
  public void allowRequiredNullColumns(boolean flag)
- allowRequiredNullColumns
  
  public boolean allowRequiredNullColumns()
- allowSchemaChange
  
  public void allowSchemaChange(boolean flag)
- allowSchemaChange
  
  public boolean allowSchemaChange()
- projection
  
  public void projection(List<SchemaPath> projection)
- enableSchemaBatch
  
  public void enableSchemaBatch(boolean option)
- disableEmptyResults
  
  public void disableEmptyResults(boolean option)
- definedSchema
  
  public void definedSchema(TupleMetadata definedSchema)
- definedSchema
  
  public TupleMetadata definedSchema()
- providedSchema
  
  public void providedSchema(TupleMetadata providedSchema)
- providedSchema
  
  public TupleMetadata providedSchema()
- errorContext
  
  public void errorContext(CustomErrorContext context)
- errorContext
  
  public CustomErrorContext errorContext()
- projection
  
  public List<SchemaPath> projection()
- scanBatchRecordLimit
  
  public int scanBatchRecordLimit()
- scanBatchByteLimit
  
  public int scanBatchByteLimit()
- nullType
  
  public TypeProtos.MajorType nullType()
- readerFactory
  
  public ReaderFactory<?> readerFactory()
- schemaValidator
  
  public void schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator)
- schemaValidator
  
  public ScanLifecycleBuilder.SchemaValidator schemaValidator()
- limit
  
  public void limit(long limit)
- limit
  
  public long limit()
- build
  
  public ScanLifecycle build(OperatorContext context)
- buildScan
  
  public ScanOperatorExec buildScan()
- buildScanOperator
  
  public OperatorRecordBatch buildScanOperator(FragmentContext fragContext, PhysicalOperator pop)
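Two of the method descriptions above, the capped batchRecordLimit and the nullType override for missing columns, state small rules that can be sketched directly, along with an optional per-scan projection check. Everything below is hypothetical: MAX_ROWS is a placeholder rather than Drill's MAX_BATCH_ROW_COUNT, the ProjectionCheck interface is not the real ScanLifecycleBuilder.SchemaValidator (whose signature this page does not show), and types are plain strings rather than TypeProtos.MajorType values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and constants are illustrative, not Drill's.
final class ScanMethodSketch {
  static final int MAX_ROWS = 64 * 1024;                    // placeholder upper bound
  static final String DEFAULT_NULL_TYPE = "INT (nullable)"; // the standard null-column type

  private ScanMethodSketch() {}

  // batchRecordLimit: keep the scan-level limit within [1, MAX_ROWS].
  static int scanRecordLimit(int requested) {
    return Math.max(1, Math.min(requested, MAX_ROWS));
  }

  // A reader may lower the per-batch limit but never raise it above the scan limit.
  // Treating a non-positive request as "no adjustment" is an assumption of this sketch.
  static int readerRecordLimit(int scanLimit, int readerRequested) {
    return readerRequested > 0 ? Math.min(readerRequested, scanLimit) : scanLimit;
  }

  // nullType: every projected column the reader lacks gets the same type.
  static Map<String, String> resolveColumns(List<String> projection,
                                            Map<String, String> readerColumns,
                                            String nullType) {
    String missingType = (nullType == null) ? DEFAULT_NULL_TYPE : nullType;
    Map<String, String> resolved = new LinkedHashMap<>();
    for (String column : projection) {
      resolved.put(column, readerColumns.getOrDefault(column, missingType));
    }
    return resolved;
  }
}

// Hypothetical validator shape for per-scan checks of the projection.
@FunctionalInterface
interface ProjectionCheck {
  void check(List<String> projection); // throws if the projection is unacceptable
}

final class ProjectionChecks {
  private ProjectionChecks() {}

  // Example policy: reject wildcard projection.
  static final ProjectionCheck NO_WILDCARD = projection -> {
    if (projection.contains("*")) {
      throw new IllegalArgumentException("Wildcard projection is not supported");
    }
  };

  // The validator is optional, so a null check is simply skipped.
  static void runIfPresent(ProjectionCheck check, List<String> projection) {
    if (check != null) {
      check.check(projection);
    }
  }
}

class ScanMethodDemo {
  public static void main(String[] args) {
    int scanLimit = ScanMethodSketch.scanRecordLimit(100_000);
    System.out.println(ScanMethodSketch.readerRecordLimit(scanLimit, 1_000));
    Map<String, String> reader = new LinkedHashMap<>();
    reader.put("a", "BIGINT");
    System.out.println(ScanMethodSketch.resolveColumns(Arrays.asList("a", "b"), reader, null));
    ProjectionChecks.runIfPresent(ProjectionChecks.NO_WILDCARD, Arrays.asList("a", "b"));
  }
}
```

Applying the cap with Math.min means a reader can only tighten the limit, which matches the description that reader adjustments are capped at the scan-level value.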
