Class ScanLifecycleBuilder

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.ScanLifecycleBuilder
Direct Known Subclasses:
FileScanLifecycleBuilder

public class ScanLifecycleBuilder extends Object
Gathers options for the ScanLifecycle then builds a scan lifecycle instance.

This framework is a bridge between operator logic and the scan internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan lifecycle at the right time. With this plumbing abstracted out, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.

Inputs

At this basic level, a scan framework requires just a few simple inputs:
  • The options defined by the scan projection framework such as the projection list.
  • A reader factory to create a reader for each of the files or blocks to be scanned. (Readers are expected to be created one-by-one as files are read.)
  • The operator context which provides access to a memory allocator and other plumbing items.

In practice, there are other options to fine-tune behavior (provided schema, custom error context, various limits, and so on).
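
The gather-then-build flow described above can be sketched without any Drill dependency. The sketch below is a minimal illustration under stated assumptions: `SketchScanBuilder`, `SketchScan`, and `SketchScanDemo` are hypothetical stand-ins, not Drill classes, the reader factory is modeled as a plain `Function`, and the operator context is left out entirely. It only shows the shape of the pattern: options are collected through same-named setter/getter pairs, and readers are created one by one once the scan runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; none of these are Drill classes.
class SketchScanBuilder {
  private List<String> projection = new ArrayList<>();
  // Stand-in for a reader factory: maps a file name to a "reader" (here, just a label).
  private Function<String, String> readerFactory;

  // Setter and getter share a name, mirroring the method style listed on this page.
  void projection(List<String> projection) { this.projection = new ArrayList<>(projection); }
  List<String> projection() { return projection; }

  void readerFactory(Function<String, String> readerFactory) { this.readerFactory = readerFactory; }
  Function<String, String> readerFactory() { return readerFactory; }

  // Options are only gathered until build() hands them to the scan.
  SketchScan build() {
    if (readerFactory == null) {
      throw new IllegalStateException("a reader factory is required");
    }
    return new SketchScan(this);
  }
}

class SketchScan {
  private final List<String> projection;
  private final Function<String, String> readerFactory;

  SketchScan(SketchScanBuilder builder) {
    this.projection = builder.projection();
    this.readerFactory = builder.readerFactory();
  }

  // Readers are created one by one, as each file is reached.
  List<String> scan(List<String> files) {
    List<String> opened = new ArrayList<>();
    for (String file : files) {
      opened.add(readerFactory.apply(file) + " projecting " + projection);
    }
    return opened;
  }
}

class SketchScanDemo {
  public static void main(String[] args) {
    SketchScanBuilder builder = new SketchScanBuilder();
    builder.projection(Arrays.asList("a", "b"));
    builder.readerFactory(file -> "reader(" + file + ")");
    SketchScan scan = builder.build();
    System.out.println(scan.scan(Arrays.asList("x.csv", "y.csv")));
  }
}
```

In the real class, the equivalent calls would be the `projection(...)`, `readerFactory(...)`, and `build(...)` methods listed under Method Details, with Drill's own types in place of the stand-ins.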

  • Field Details

    • MIN_BATCH_BYTE_SIZE

      public static final int MIN_BATCH_BYTE_SIZE
    • MAX_BATCH_BYTE_SIZE

      public static final int MAX_BATCH_BYTE_SIZE
    • DEFAULT_BATCH_ROW_COUNT

      public static final int DEFAULT_BATCH_ROW_COUNT
    • DEFAULT_BATCH_BYTE_COUNT

      public static final int DEFAULT_BATCH_BYTE_COUNT
    • MAX_BATCH_ROW_COUNT

      public static final int MAX_BATCH_ROW_COUNT
    • userName

      protected String userName
    • nullType

      protected TypeProtos.MajorType nullType
    • allowRequiredNullColumns

      protected boolean allowRequiredNullColumns
    • definedSchema

      protected TupleMetadata definedSchema
    • providedSchema

      protected TupleMetadata providedSchema
    • enableSchemaBatch

      protected boolean enableSchemaBatch
      Option that controls whether the scan operator starts with an empty schema-only batch (the so-called "fast schema" that Drill once tried to provide) or with a non-empty data batch (which appears to have been the standard since the "Empty Batches" project). See the OperatorDriver Javadoc for more details.

      Defaults to false, meaning the empty schema batch is not provided. DRILL-7305 explains that many operators fail when presented with an empty batch, so do not enable this feature until those issues are fixed. Of course, do enable the feature if you want to track down the DRILL-7305 bugs.

    • disableEmptyResults

      protected boolean disableEmptyResults
      Option to disable empty results. An empty result occurs if no reader has any data, but at least one reader can provide a schema. In this case, the scan can return a single, empty batch, with an associated schema. This is the correct SQL result for an empty query. However, if this result triggers empty-batch bugs in other operators, we can, instead, disable this feature and return a null result set: no schema, no batch, just a "fast NONE", an immediate return of NONE from the Volcano iterator.

      Disabling empty results is not desirable: it means that the user gets no schema for queries that should be able to return one. So, set this option to disable them only if we cannot find or fix the empty-batch bugs.
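
The decision described above can be written as a small truth table. This is a minimal sketch, not the framework's code: `EmptyResultSketch` and its `Outcome` values are hypothetical names, and the case where no reader has either data or a schema is assumed here to produce the same "fast NONE".

```java
// Hypothetical illustration of the empty-result decision; not a Drill class.
class EmptyResultSketch {
  enum Outcome { DATA_BATCHES, EMPTY_BATCH_WITH_SCHEMA, FAST_NONE }

  static Outcome outcome(boolean anyReaderHasData,
                         boolean anyReaderHasSchema,
                         boolean disableEmptyResults) {
    if (anyReaderHasData) {
      return Outcome.DATA_BATCHES;
    }
    // No data, but a schema is known: the correct SQL result is one empty batch.
    if (anyReaderHasSchema && !disableEmptyResults) {
      return Outcome.EMPTY_BATCH_WITH_SCHEMA;
    }
    // Empty results disabled, or (by assumption) nothing known at all:
    // no schema, no batch, an immediate NONE.
    return Outcome.FAST_NONE;
  }

  public static void main(String[] args) {
    System.out.println(outcome(false, true, false));
    System.out.println(outcome(false, true, true));
  }
}
```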

    • allowSchemaChange

      protected boolean allowSchemaChange
      Option to control schema changes. If false, the first batch commits the scan to a single, unchanging schema. If true (the legacy default), each batch or reader can change the schema, even though downstream operators generally cannot handle a schema change. The goal is to evolve all readers so that they do not generate schema changes.
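
A minimal sketch of the "first batch commits the schema" rule follows. `SchemaChangeSketch` is a hypothetical class, schemas are modeled as lists of column names, and throwing on a mismatch is an assumption made for illustration: this page does not say how the framework reacts to a reader that offers a different schema when changes are disallowed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of schema commitment; not a Drill class.
class SchemaChangeSketch {
  private final boolean allowSchemaChange;
  private List<String> committed; // null until the first batch arrives

  SchemaChangeSketch(boolean allowSchemaChange) {
    this.allowSchemaChange = allowSchemaChange;
  }

  // Returns the schema the batch is emitted with.
  List<String> accept(List<String> batchSchema) {
    if (committed == null || allowSchemaChange) {
      // First batch, or legacy behavior: the batch's schema becomes current.
      committed = batchSchema;
      return committed;
    }
    if (!committed.equals(batchSchema)) {
      // Assumption: a mismatch is rejected outright.
      throw new IllegalStateException(
          "schema change not allowed: " + committed + " -> " + batchSchema);
    }
    return committed;
  }

  public static void main(String[] args) {
    SchemaChangeSketch strict = new SchemaChangeSketch(false);
    System.out.println(strict.accept(Arrays.asList("a", "b")));
    System.out.println(strict.accept(Arrays.asList("a", "b")));
  }
}
```
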
    • schemaValidator

      protected ScanLifecycleBuilder.SchemaValidator schemaValidator
      Optional schema validator to perform per-scan checks of the projection or resolved schema.
    • errorContext

      protected CustomErrorContext errorContext
      Context for error messages.
  • Constructor Details

    • ScanLifecycleBuilder

      public ScanLifecycleBuilder()
  • Method Details

    • options

      public void options(OptionSet options)
    • options

      public OptionSet options()
    • readerFactory

      public void readerFactory(ReaderFactory<?> readerFactory)
    • userName

      public void userName(String userName)
    • userName

      public String userName()
    • batchRecordLimit

      public void batchRecordLimit(int batchRecordLimit)
      Specify a custom batch record count. This is the maximum number of records per batch for this scan. Readers can adjust this, but the adjustment is capped at the value specified here.
      Parameters:
      batchRecordLimit - maximum records per batch
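
The capping rule can be stated in one line. The sketch below is illustrative only: `BatchLimitSketch` and `effectiveLimit` are hypothetical names, and the real framework may apply further bounds that this page does not describe.

```java
// Hypothetical illustration of the cap on reader adjustments; not a Drill class.
class BatchLimitSketch {
  // A reader may request its own per-batch row count, but never more than the scan's limit.
  static int effectiveLimit(int scanBatchRecordLimit, int readerRequested) {
    return Math.min(readerRequested, scanBatchRecordLimit);
  }

  public static void main(String[] args) {
    System.out.println(effectiveLimit(1000, 5000)); // request above the cap
    System.out.println(effectiveLimit(1000, 200));  // request below the cap
  }
}
```
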
    • batchByteLimit

      public void batchByteLimit(int byteLimit)
    • nullType

      public void nullType(TypeProtos.MajorType nullType)
      Specify the type to use for null columns in place of the standard nullable int. This type is used for all missing columns. (Readers that need per-column control need a different mechanism.)
      Parameters:
      nullType - the type to use for null columns
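
To make "used for all missing columns" concrete, here is a minimal sketch in which types are plain strings. `NullTypeSketch` and `resolve` are hypothetical; the real framework works with `TypeProtos.MajorType` and its own projection machinery, neither of which is modeled here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of filling missing columns; not a Drill class.
class NullTypeSketch {
  // Every projected column the reader does not supply gets the same null type.
  static Map<String, String> resolve(List<String> projection,
                                     Map<String, String> readerColumns,
                                     String nullType) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (String column : projection) {
      resolved.put(column, readerColumns.getOrDefault(column, nullType));
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> readerColumns = new LinkedHashMap<>();
    readerColumns.put("a", "VARCHAR");
    System.out.println(
        resolve(Arrays.asList("a", "b", "c"), readerColumns, "NULLABLE VARCHAR"));
  }
}
```

Because one type applies to every missing column, a reader that needs different types for different missing columns cannot express that through this option.
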
    • allowRequiredNullColumns

      public void allowRequiredNullColumns(boolean flag)
    • allowRequiredNullColumns

      public boolean allowRequiredNullColumns()
    • allowSchemaChange

      public void allowSchemaChange(boolean flag)
    • allowSchemaChange

      public boolean allowSchemaChange()
    • projection

      public void projection(List<SchemaPath> projection)
    • enableSchemaBatch

      public void enableSchemaBatch(boolean option)
    • disableEmptyResults

      public void disableEmptyResults(boolean option)
    • definedSchema

      public void definedSchema(TupleMetadata definedSchema)
    • definedSchema

      public TupleMetadata definedSchema()
    • providedSchema

      public void providedSchema(TupleMetadata providedSchema)
    • providedSchema

      public TupleMetadata providedSchema()
    • errorContext

      public void errorContext(CustomErrorContext context)
    • errorContext

      public CustomErrorContext errorContext()
    • projection

      public List<SchemaPath> projection()
    • scanBatchRecordLimit

      public int scanBatchRecordLimit()
    • scanBatchByteLimit

      public int scanBatchByteLimit()
    • nullType

      public TypeProtos.MajorType nullType()
    • readerFactory

      public ReaderFactory<?> readerFactory()
    • schemaValidator

      public void schemaValidator(ScanLifecycleBuilder.SchemaValidator schemaValidator)
    • schemaValidator

      public ScanLifecycleBuilder.SchemaValidator schemaValidator()
    • limit

      public void limit(long limit)
    • limit

      public long limit()
    • build

      public ScanLifecycle build(OperatorContext context)
    • buildScan

      public ScanOperatorExec buildScan()
    • buildScanOperator

      public OperatorRecordBatch buildScanOperator(FragmentContext fragContext, PhysicalOperator pop)