Package org.apache.drill.exec.physical.impl.scan.v3.lifecycle

Implements the details of the scan lifecycle for a set of readers: primarily, resolving the scan output schema from a variety of input schemas, then running each reader, which produces some number of batches. Also handles missing and implicit columns.
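The following sketch illustrates that lifecycle under simplifying assumptions. Each reader, modeled by the hypothetical `SketchReader` interface, yields batches of rows represented as maps, and the scan shapes every row to a fixed projection. Projected columns that a reader lacks become null (missing columns), and a hypothetical implicit column `filename` is filled in from the reader. None of these names are Drill's actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader abstraction: a reader yields zero or more batches,
// where a batch is a list of rows and a row maps column names to values.
interface SketchReader {
  String fileName();
  Iterator<List<Map<String, Object>>> batches();
}

// Simple in-memory reader used for demonstration.
class ListReader implements SketchReader {
  private final String file;
  private final List<List<Map<String, Object>>> data;

  ListReader(String file, List<List<Map<String, Object>>> data) {
    this.file = file;
    this.data = data;
  }

  public String fileName() { return file; }

  public Iterator<List<Map<String, Object>>> batches() { return data.iterator(); }
}

class ScanLifecycleSketch {

  // Runs each reader in turn and shapes every row to the projected columns.
  // Columns the reader lacks are "missing" and come out as null; the
  // hypothetical implicit column "filename" is taken from the reader.
  static List<List<Map<String, Object>>> run(List<String> projection,
                                             List<SketchReader> readers) {
    List<List<Map<String, Object>>> output = new ArrayList<>();
    for (SketchReader reader : readers) {
      Iterator<List<Map<String, Object>>> it = reader.batches();
      while (it.hasNext()) {
        List<Map<String, Object>> outBatch = new ArrayList<>();
        for (Map<String, Object> row : it.next()) {
          Map<String, Object> outRow = new LinkedHashMap<>();
          for (String col : projection) {
            if (col.equals("filename")) {
              outRow.put(col, reader.fileName());   // implicit column
            } else {
              outRow.put(col, row.get(col));         // null when missing
            }
          }
          outBatch.add(outRow);
        }
        output.add(outBatch);
      }
    }
    return output;
  }

  public static void main(String[] args) {
    SketchReader a = new ListReader("a.csv", List.of(List.of(Map.of("x", 1))));
    SketchReader b = new ListReader("b.csv",
        List.of(List.of(Map.of("x", 2, "y", "hi")), List.of(Map.of("x", 3))));
    System.out.println(run(List.of("x", "y", "filename"), List.of(a, b)));
  }
}
```

Here each reader's batches pass through unchanged in count; only the row shape is normalized, so downstream operators see the same columns regardless of which reader produced the batch.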

Defines the projection, vector continuity and other operations for a set of one or more readers. Separates the core reader protocol from the logic of working with batches.
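To illustrate separating the reader protocol from batch handling, here is a minimal sketch in which a reader only sees a row writer and asks whether it is full, while the framework side decides where one batch ends and the next begins. `RowWriter`, `RowReader`, `BatchingWriter`, and `CountingReader` are hypothetical names chosen for this example, not Drill classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader-facing protocol: the reader writes rows and never
// learns where batch boundaries fall.
interface RowWriter {
  boolean isFull();                 // true when the current batch can take no more rows
  void writeRow(Object... values);
}

interface RowReader {
  // Writes rows until the writer is full or input ends; returns false at EOF.
  boolean next(RowWriter writer);
}

// Framework-side batch logic: owns the batches and decides when to cut them.
class BatchingWriter implements RowWriter {
  private final int batchSize;
  private List<Object[]> current = new ArrayList<>();

  BatchingWriter(int batchSize) { this.batchSize = batchSize; }

  public boolean isFull() { return current.size() >= batchSize; }

  public void writeRow(Object... values) { current.add(values); }

  // Hands off the accumulated batch and starts a new, empty one.
  List<Object[]> harvest() {
    List<Object[]> batch = current;
    current = new ArrayList<>();
    return batch;
  }

  // Drives a reader to completion, collecting each non-empty batch.
  static List<List<Object[]>> drive(RowReader reader, int batchSize) {
    BatchingWriter writer = new BatchingWriter(batchSize);
    List<List<Object[]>> batches = new ArrayList<>();
    boolean more = true;
    while (more) {
      more = reader.next(writer);
      List<Object[]> batch = writer.harvest();
      if (!batch.isEmpty()) {
        batches.add(batch);
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    for (List<Object[]> batch : drive(new CountingReader(5), 2)) {
      System.out.println("batch of " + batch.size() + " rows");
    }
  }
}

// Example reader that emits the integers 0..count-1, one per row.
class CountingReader implements RowReader {
  private final int count;
  private int pos = 0;

  CountingReader(int count) { this.count = count; }

  public boolean next(RowWriter writer) {
    while (!writer.isFull() && pos < count) {
      writer.writeRow(pos++);
    }
    return pos < count;
  }
}
```

Because the reader never sees batch boundaries, the batch size can change without touching reader code.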

Schema Evolution

Drill discovers schema on the fly. The scan operator hosts multiple readers. In general, each reader may have a distinct schema, though users typically arrange data so that scanned files share a common schema (otherwise SQL is the wrong tool for analysis). Still, subtle changes can occur: file A, an older version, lacks a newly added column c, while file B includes it. And so on.
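As a rough illustration of why discovery order matters, this sketch folds each reader's schema (column name to type name) into a running scan schema and counts how often the emitted schema would change. The class and method names are hypothetical, and the sketch deliberately ignores type conflicts by keeping the first type seen for each column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaEvolutionSketch {

  // Merges each reader's schema into a running scan schema and records
  // the schema the scan would emit after each reader.
  static List<Map<String, String>> evolve(List<Map<String, String>> readerSchemas) {
    Map<String, String> scanSchema = new LinkedHashMap<>();
    List<Map<String, String>> history = new ArrayList<>();
    for (Map<String, String> readerSchema : readerSchemas) {
      for (Map.Entry<String, String> col : readerSchema.entrySet()) {
        scanSchema.putIfAbsent(col.getKey(), col.getValue());
      }
      history.add(new LinkedHashMap<>(scanSchema));
    }
    return history;
  }

  // Counts how often the emitted schema differs from the previous one,
  // i.e. how many schema changes downstream operators would see.
  static int schemaChanges(List<Map<String, String>> history) {
    int changes = 0;
    for (int i = 1; i < history.size(); i++) {
      if (!history.get(i).equals(history.get(i - 1))) {
        changes++;
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> fileA = new LinkedHashMap<>();
    fileA.put("a", "INT");
    fileA.put("b", "VARCHAR");
    Map<String, String> fileB = new LinkedHashMap<>(fileA);
    fileB.put("c", "INT");   // the newer file adds column c
    System.out.println("A then B: " + schemaChanges(evolve(List.of(fileA, fileB))));
    System.out.println("B then A: " + schemaChanges(evolve(List.of(fileB, fileA))));
  }
}
```

Reading file A before file B yields one schema change, since column c appears only partway through the scan; reading B first yields none. Knowing the full schema up front would avoid the change regardless of order.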

The scan operator resolves the scan schema, striving to send a single, uniform schema downstream. That schema should represent the data from all readers in this scan and in other fragments of the same logical scan. The difficulty arises when the available information underdetermines the output schema: the mechanism here attempts to fill in gaps and flags conflicts. Only a provided or defined schema (see ScanSchemaTracker) resolves all ambiguities.
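A much-simplified, hypothetical resolver shows the two behaviors described above: a gap is filled by whichever reader first supplies a column, and a later reader that reports a different type for the same column is recorded as a conflict. Columns in a provided schema are treated as authoritative and are never overridden or flagged. This is not the `ScanSchemaTracker` API; `SchemaResolverSketch` is an illustrative name, and types are plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified resolver; not Drill's ScanSchemaTracker.
class SchemaResolverSketch {
  private final Map<String, String> provided;   // authoritative types, may be empty
  private final Map<String, String> resolved = new LinkedHashMap<>();
  private final List<String> conflicts = new ArrayList<>();

  SchemaResolverSketch(Map<String, String> provided) {
    this.provided = provided;
    resolved.putAll(provided);
  }

  // Folds one reader's schema into the scan schema.
  void addReaderSchema(Map<String, String> readerSchema) {
    for (Map.Entry<String, String> col : readerSchema.entrySet()) {
      String name = col.getKey();
      String type = col.getValue();
      if (provided.containsKey(name)) {
        continue;   // the provided schema wins; the reader is expected to conform
      }
      String existing = resolved.putIfAbsent(name, type);   // fills a gap if absent
      if (existing != null && !existing.equals(type)) {
        conflicts.add(name + ": " + existing + " vs " + type);   // flag the conflict
      }
    }
  }

  Map<String, String> schema() { return resolved; }

  List<String> conflicts() { return conflicts; }

  public static void main(String[] args) {
    Map<String, String> r1 = new LinkedHashMap<>();
    r1.put("id", "INT");
    r1.put("c", "INT");
    Map<String, String> r2 = new LinkedHashMap<>();
    r2.put("id", "INT");
    r2.put("c", "VARCHAR");

    SchemaResolverSketch inferred = new SchemaResolverSketch(Map.of());
    inferred.addReaderSchema(r1);
    inferred.addReaderSchema(r2);
    System.out.println(inferred.schema() + " conflicts=" + inferred.conflicts());

    SchemaResolverSketch withProvided = new SchemaResolverSketch(Map.of("c", "VARCHAR"));
    withProvided.addReaderSchema(r1);
    withProvided.addReaderSchema(r2);
    System.out.println(withProvided.schema() + " conflicts=" + withProvided.conflicts());
  }
}
```

Without a provided schema, the INT/VARCHAR disagreement on `c` can only be flagged; with one, the ambiguity never arises because the column's type is fixed before any reader runs.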