Class ScanSchemaResolver

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.schema.ScanSchemaResolver

public class ScanSchemaResolver extends Object
Resolves a schema against the existing scan schema. Expands columns by comparing the existing scan schema with a "revised" (provided or reader) schema, adjusting the scan schema accordingly. Maps are expanded recursively. For other columns, the types must match (concrete columns) or the type must match the projection (dynamic columns).
  • Resolves a provided schema against the projection list. The provided schema can be strict (it converts a wildcard into an explicit projection) or lenient (the reader can add additional columns to a wildcard).
  • Resolves an early reader schema against the projection list and optional provided schema.
  • Resolves a reader output schema against a dynamic schema (the projection list), a concrete schema (provided or from a prior reader), or a combination of the two.
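
The difference between a strict and a lenient provided schema can be illustrated with a minimal sketch. This is not Drill's implementation: the class WildcardExpansionSketch and the method expandWildcard are hypothetical names, and columns are modeled as bare name strings rather than typed column metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these names are assumptions, not part of Drill's API.
class WildcardExpansionSketch {

    // Returns the column names produced for a wildcard projection.
    // Strict: the provided schema becomes the explicit projection.
    // Lenient: the reader may append columns the provided schema lacks.
    static List<String> expandWildcard(List<String> providedCols,
                                       boolean strict,
                                       List<String> readerCols) {
        List<String> out = new ArrayList<>(providedCols);
        if (!strict) {
            for (String col : readerCols) {
                if (!out.contains(col)) {
                    out.add(col);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> provided = Arrays.asList("a", "b");
        List<String> reader = Arrays.asList("a", "b", "c");
        System.out.println(expandWildcard(provided, true, reader));   // [a, b]
        System.out.println(expandWildcard(provided, false, reader));  // [a, b, c]
    }
}
```

In the strict case the reader's extra column c is not projected; in the lenient case it is appended after the provided columns.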

In practice, the logic is simpler: given a schema (dynamic, concrete or a combination), further resolve it using the input schema provided. That is, resolve dynamic columns and verify the consistency of concrete columns.

Projected columns start as dynamic (no type). Columns are resolved to a known type as a schema identifies that type. Subsequent schemas are obligated to use that same type to avoid an inconsistent schema change downstream.
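
The rule above (a dynamic column takes the first type offered, and later schemas must agree) can be sketched as follows. This is a toy model under stated assumptions: ToyScanSchema, project, resolve and the DYNAMIC marker are hypothetical names, types are plain strings, and columns that were never projected are simply ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model; these names are assumptions, not Drill's API.
class ToyScanSchema {
    // Marker for a projected column whose type is not yet known.
    static final String DYNAMIC = "<dynamic>";

    // Column name -> type name, in projection order.
    private final Map<String, String> columns = new LinkedHashMap<>();

    // Projected columns start out dynamic (no type).
    void project(String name) {
        columns.putIfAbsent(name, DYNAMIC);
    }

    // Apply one column from a provided or reader schema.
    void resolve(String name, String type) {
        String existing = columns.get(name);
        if (existing == null) {
            return; // Not projected: ignored in this sketch.
        }
        if (DYNAMIC.equals(existing)) {
            columns.put(name, type); // Dynamic becomes concrete.
        } else if (!existing.equals(type)) {
            throw new IllegalStateException(
                "Column " + name + " is already " + existing
                + ", cannot become " + type);
        }
    }

    boolean isResolved(String name) {
        String type = columns.get(name);
        return type != null && !DYNAMIC.equals(type);
    }

    String typeOf(String name) {
        return columns.get(name);
    }

    public static void main(String[] args) {
        ToyScanSchema schema = new ToyScanSchema();
        schema.project("a");
        schema.resolve("a", "INT");      // First schema fixes the type.
        schema.resolve("a", "INT");      // Same type again: consistent.
        try {
            schema.resolve("a", "VARCHAR");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the mismatched type at resolution time is what keeps an inconsistent schema change from reaching downstream operators in this model.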

A "resolved" projection list is a list of concrete columns: table columns, nulls, file metadata or partition metadata. An unresolved list contains either table column names that have not yet been matched, or a wildcard column.

The idea is that the projection list moves through stages of resolution depending on which information is available. An "early schema" table provides schema information up front, and so allows fully resolving the projection list on table open. A "late schema" table allows only a partially resolved projection list, with the remainder of resolution happening on the first (or perhaps every) batch.
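
The staged resolution described above can be sketched with a small model. Everything here is an assumption made for illustration: ResolutionStagesSketch, newProjection, applySchema and fullyResolved are hypothetical names, and a schema is reduced to a map from column name to type name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; these names are assumptions, not Drill's API.
class ResolutionStagesSketch {
    static final String UNRESOLVED = "<unresolved>";

    // Build a projection list in which every column is still unresolved.
    static Map<String, String> newProjection(String... names) {
        Map<String, String> projection = new LinkedHashMap<>();
        for (String name : names) {
            projection.put(name, UNRESOLVED);
        }
        return projection;
    }

    // Resolve whichever projected columns the given schema describes.
    static void applySchema(Map<String, String> projection,
                            Map<String, String> schema) {
        for (Map.Entry<String, String> col : schema.entrySet()) {
            if (UNRESOLVED.equals(projection.get(col.getKey()))) {
                projection.put(col.getKey(), col.getValue());
            }
        }
    }

    static boolean fullyResolved(Map<String, String> projection) {
        return !projection.containsValue(UNRESOLVED);
    }

    public static void main(String[] args) {
        Map<String, String> tableSchema = new LinkedHashMap<>();
        tableSchema.put("a", "INT");
        tableSchema.put("b", "VARCHAR");

        // Early schema: the full schema is known when the table opens.
        Map<String, String> early = newProjection("a", "b");
        applySchema(early, tableSchema);
        System.out.println("early, at open: " + fullyResolved(early));

        // Late schema: nothing is known at open; the first batch fills it in.
        Map<String, String> late = newProjection("a", "b");
        System.out.println("late, at open: " + fullyResolved(late));
        applySchema(late, tableSchema); // Schema discovered from the first batch.
        System.out.println("late, after first batch: " + fullyResolved(late));
    }
}
```

The same applySchema step runs in both cases; only the moment at which the schema becomes available differs.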