Package org.apache.drill.exec.physical.impl.scan.v3.lifecycle
Implements the details of the scan lifecycle for a set of readers,
primarily the process of resolving the scan output schema from a variety
of input schemas, then running each reader, each of which will produce
some number of batches. Handles missing and implicit columns.
Defines the projection, vector continuity and other operations for a set of one or more readers. Separates the core reader protocol from the logic of working with batches.
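As a rough illustration of the per-reader part of this lifecycle (run each reader in turn and forward the batches it produces), here is a minimal sketch. It is hypothetical and deliberately simplified: `SimpleReader`, `ScanLoop` and the string-list "batches" are illustrative names, not Drill's reader API, and schema resolution, projection and vector handling are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of running a scan's readers one after another.
// SimpleReader and ScanLoop are illustrative names, not Drill classes.
public class ScanLoop {

  /** A toy reader whose "batches" are lists of row strings. */
  public interface SimpleReader {
    void open();
    List<String> nextBatch(); // returns null once the reader is exhausted
    void close();
  }

  /** Runs each reader in order and collects every batch it produces. */
  public static List<List<String>> run(List<SimpleReader> readers) {
    List<List<String>> output = new ArrayList<>();
    for (SimpleReader reader : readers) {
      reader.open();
      try {
        List<String> batch;
        while ((batch = reader.nextBatch()) != null) {
          output.add(batch);
        }
      } finally {
        reader.close(); // release the reader even if reading fails
      }
    }
    return output;
  }

  /** Builds a reader that replays a fixed list of batches (for demonstration). */
  public static SimpleReader fixed(List<List<String>> batches) {
    Iterator<List<String>> it = batches.iterator();
    return new SimpleReader() {
      @Override public void open() { }
      @Override public List<String> nextBatch() { return it.hasNext() ? it.next() : null; }
      @Override public void close() { }
    };
  }
}
```

Wrapping the read loop in `try`/`finally` is a choice made in this sketch so that a failing reader is still closed before the scan moves on.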
Schema Evolution
Drill discovers schema on the fly. The scan operator hosts multiple readers. In general, each reader may have a distinct schema, though users typically arrange data so that scanned files share a common schema (otherwise SQL is the wrong tool for the analysis). Still, subtle changes can occur: file A is an old version without a new column c, while file B includes the column. And so on.
The scan operator resolves the scan schema, striving to send a single, uniform
schema downstream. That schema should represent the data from all readers
in this scan and in other fragments of the same logical scan. The difficulty
arises when the information available underdetermines the output schema:
the mechanism here attempts to fill in gaps and flag conflicts. Only a
provided or defined schema (see ScanSchemaTracker) resolves all
ambiguities.
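To make the gap-filling and conflict-flagging idea concrete, the following hypothetical sketch models a schema as an ordered map from column name to type name. Without a provided schema it unions the reader schemas (so file A without column `c` and file B with it yield a schema containing `c`) and records any column whose type differs between readers; with a provided schema, that schema is used as-is. `SchemaResolver`, `Resolution` and the keep-the-first-type rule on conflict are assumptions for illustration, not the behavior of `ScanSchemaTracker`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of scan-schema resolution; not Drill's ScanSchemaTracker API.
public class SchemaResolver {

  /** Result of resolution: the merged schema plus any conflicts found. */
  public static final class Resolution {
    public final Map<String, String> schema; // column name -> type name, first-seen order
    public final List<String> conflicts;     // human-readable conflict descriptions

    Resolution(Map<String, String> schema, List<String> conflicts) {
      this.schema = schema;
      this.conflicts = conflicts;
    }
  }

  /**
   * Merges reader schemas into one scan schema. A provided schema, if present,
   * defines every column and type. Otherwise columns are unioned and type
   * disagreements are flagged, keeping the first type seen (an assumption).
   */
  public static Resolution resolve(Map<String, String> provided,
                                   List<Map<String, String>> readerSchemas) {
    if (provided != null) {
      return new Resolution(new LinkedHashMap<>(provided), new ArrayList<>());
    }
    Map<String, String> merged = new LinkedHashMap<>();
    List<String> conflicts = new ArrayList<>();
    for (Map<String, String> readerSchema : readerSchemas) {
      for (Map.Entry<String, String> col : readerSchema.entrySet()) {
        // putIfAbsent returns the earlier type if the column was already seen.
        String existing = merged.putIfAbsent(col.getKey(), col.getValue());
        if (existing != null && !existing.equals(col.getValue())) {
          conflicts.add(col.getKey() + ": " + existing + " vs " + col.getValue());
        }
      }
    }
    return new Resolution(merged, conflicts);
  }
}
```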
Class Summary
- Builds the handler which provides values for columns in an explicit project list but for which the reader provides no values.
- Builds an output batch based on an output schema and one or more input schemas.
- Describes an input batch with a schema and a vector container.
- Source map as a map schema and map vector.
- Manages the schema and batch construction for a managed reader.
- Binds the scan lifecycle to the scan operator.
- Basic scan framework for a set of "managed" readers that uses the scan schema tracker to evolve the scan output schema.
- Implementation of the schema negotiation between scan operator and batch reader.
- Base class for columns that take values based on the reader, not individual rows.
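The missing-column handling and output-batch assembly described in the class summary can be sketched in a simplified form: each output row follows the output schema's column order, and any column the reader does not supply is filled with null. This is a hypothetical illustration; `OutputRowBuilder` is an invented name, rows are modeled as maps rather than Drill value vectors, and null is assumed as the fill value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: rows are maps from column name to value, not Drill vectors.
public class OutputRowBuilder {

  private final List<String> outputSchema; // output column names, in output order

  public OutputRowBuilder(List<String> outputSchema) {
    this.outputSchema = new ArrayList<>(outputSchema);
  }

  /** Output columns the reader does not provide, in output-schema order. */
  public List<String> missingColumns(Set<String> readerColumns) {
    List<String> missing = new ArrayList<>();
    for (String col : outputSchema) {
      if (!readerColumns.contains(col)) {
        missing.add(col);
      }
    }
    return missing;
  }

  /** Reorders each reader row to the output schema; absent columns become null. */
  public List<Object[]> buildBatch(List<Map<String, Object>> readerRows) {
    List<Object[]> batch = new ArrayList<>();
    for (Map<String, Object> row : readerRows) {
      Object[] out = new Object[outputSchema.size()];
      for (int i = 0; i < out.length; i++) {
        // Map.get yields null for a column the reader did not supply.
        out[i] = row.get(outputSchema.get(i));
      }
      batch.add(out);
    }
    return batch;
  }
}
```

Keeping `missingColumns` separate from `buildBatch` loosely mirrors the split between the missing-column handler and the output batch builder, though the real classes work on vectors rather than rows.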