Class ReaderLevelProjection

java.lang.Object
org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
Direct Known Subclasses:
ExplicitSchemaProjection, SmoothingProjection, WildcardProjection, WildcardSchemaProjection

public class ReaderLevelProjection extends Object
Computes the full output schema given a table (or batch) schema. Takes the original, unresolved output list from the projection definition, merges it with the file, directory and table schema information, and produces a partially or fully resolved output list.

A "resolved" projection list is a list of concrete columns: table columns, nulls, file metadata or partition metadata. An unresolved list contains either table column names that have not yet been matched to table columns, or a wildcard column.
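The resolution step described above can be sketched in Java as follows. This is a minimal illustration, not Drill's implementation: `ResolutionSketch`, `Source` and `ResolvedColumn` are hypothetical names, the file metadata names (`filename`, `filepath`, `fqn`, `suffix`) and the `dirN` partition naming are assumptions made for the example, and so is the precedence order (table column first, then file metadata, then partition, else null).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ResolutionSketch, Source and ResolvedColumn are illustrative, not Drill classes.
class ResolutionSketch {

  /** Where a resolved column's values come from. */
  enum Source { TABLE, NULL, FILE_METADATA, PARTITION_METADATA }

  record ResolvedColumn(String name, Source source) { }

  // Assumed file metadata column names, for illustration only.
  static final Set<String> FILE_METADATA = Set.of("filename", "filepath", "fqn", "suffix");

  static boolean isPartitionColumn(String lowerName) {
    return lowerName.matches("dir\\d+");   // assumed naming: dir0, dir1, ...
  }

  /**
   * Resolves an unresolved projection list against a table schema.
   * The precedence order (table, file metadata, partition, null) is an assumption.
   */
  static List<ResolvedColumn> resolve(List<String> projection, List<String> tableSchema) {
    // Case-insensitive lookup: lower-cased name -> name as declared by the table.
    Map<String, String> tableCols = new HashMap<>();
    for (String col : tableSchema) {
      tableCols.put(col.toLowerCase(Locale.ROOT), col);
    }
    List<ResolvedColumn> out = new ArrayList<>();
    for (String col : projection) {
      String key = col.toLowerCase(Locale.ROOT);
      if (col.equals("*")) {
        // Wildcard expands to every table column, in table order.
        for (String tableCol : tableSchema) {
          out.add(new ResolvedColumn(tableCol, Source.TABLE));
        }
      } else if (tableCols.containsKey(key)) {
        out.add(new ResolvedColumn(tableCols.get(key), Source.TABLE));
      } else if (FILE_METADATA.contains(key)) {
        out.add(new ResolvedColumn(col, Source.FILE_METADATA));
      } else if (isPartitionColumn(key)) {
        out.add(new ResolvedColumn(col, Source.PARTITION_METADATA));
      } else {
        // Requested but absent from the table: filled with nulls.
        out.add(new ResolvedColumn(col, Source.NULL));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<ResolvedColumn> cols =
        resolve(List.of("a", "filename", "dir0", "missing"), List.of("A", "b"));
    cols.forEach(System.out::println);
  }
}
```

In the usage above, `a` matches the table's `A`, `filename` and `dir0` become metadata columns, and `missing` becomes a null column, so every entry in the output list is concrete.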

The idea is that the projection list moves through stages of resolution depending on which information is available. An "early schema" table provides schema information up front, and so allows fully resolving the projection list on table open. A "late schema" table allows only a partially resolved projection list, with the remainder of resolution happening on the first (or perhaps every) batch.

A data source (table) schema takes one of two forms:

  • Early schema: the schema is known before reading data. A JDBC data source is an example, as is a CSV reader for a file with headers.
  • Late schema: the schema is not known until data is read, and is discovered on the fly. Example: JSON, whose records are maps (objects) with no up-front schema.
These two forms give rise to distinct ways of planning the projection.
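The timing difference between the two forms can be sketched as follows. This is a hypothetical illustration, not Drill's API: `SchemaTimingSketch`, `Projector`, `openEarlySchema` and `onBatch` are invented names, and the sketch only expands the wildcard (matching explicit names to table columns is left out). An early-schema reader resolves the projection when the table is opened; a late-schema reader leaves it unresolved until the first batch, and re-resolves only if a later batch reports a different schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SchemaTimingSketch and Projector are illustrative, not Drill classes.
class SchemaTimingSketch {

  /** Holds a projection list and resolves it once a schema is known. */
  static final class Projector {
    private final List<String> projection;
    private List<String> resolved;     // null while the projection is unresolved
    private List<String> lastSchema;   // schema the current resolution was built from
    private int resolveCount;          // how many times resolution has run

    Projector(List<String> projection) {
      this.projection = List.copyOf(projection);
    }

    /** Early schema: the reader reports its schema when the table is opened. */
    void openEarlySchema(List<String> schema) {
      resolveAgainst(schema);
    }

    /** Late schema: resolve on the first batch, and again only if the schema changes. */
    void onBatch(List<String> batchSchema) {
      if (resolved == null || !batchSchema.equals(lastSchema)) {
        resolveAgainst(batchSchema);
      }
    }

    boolean isResolved() { return resolved != null; }
    List<String> resolved() { return resolved; }
    int resolveCount() { return resolveCount; }

    private void resolveAgainst(List<String> schema) {
      List<String> out = new ArrayList<>();
      for (String col : projection) {
        if (col.equals("*")) {
          out.addAll(schema);   // the wildcard needs the schema to expand
        } else {
          out.add(col);         // matching explicit names is omitted for brevity
        }
      }
      lastSchema = List.copyOf(schema);
      resolved = List.copyOf(out);
      resolveCount++;
    }
  }

  public static void main(String[] args) {
    // Early schema (e.g. CSV with headers): fully resolved at open.
    Projector early = new Projector(List.of("*"));
    early.openEarlySchema(List.of("id", "name"));
    System.out.println("early after open: " + early.resolved());

    // Late schema (e.g. JSON): nothing to report at open, so still unresolved.
    Projector late = new Projector(List.of("*"));
    System.out.println("late after open, resolved? " + late.isResolved());
    late.onBatch(List.of("id", "tags"));
    System.out.println("late after first batch: " + late.resolved());
  }
}
```

Comparing schemas on each batch is one simple way to support the "first (or perhaps every) batch" case: the work is done once unless the discovered schema actually changes.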

The final result of the projection is a list of "output" columns that, taken together, define the row (bundle of vectors) that the scan operator produces. Columns are ordered: the order specified here must match the order in which columns appear in the result set loader and the vector container, so that code can access columns by index as well as by name.
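One way to keep index and name access consistent is a single ordered schema with a name-to-index map, which the loader and container would both follow. The sketch below is hypothetical (`OutputSchemaSketch` is not a Drill class), uses column names as stand-ins for vectors, and assumes case-insensitive names with duplicates rejected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an ordered output schema; not a Drill class.
final class OutputSchemaSketch {
  private final List<String> names = new ArrayList<>();           // position == column index
  private final Map<String, Integer> indexByName = new HashMap<>(); // lower-cased name -> index

  /** Appends a column; its index is its position in the output row. */
  int add(String name) {
    String key = name.toLowerCase(Locale.ROOT);
    if (indexByName.containsKey(key)) {
      throw new IllegalArgumentException("Duplicate output column: " + name);
    }
    int index = names.size();
    names.add(name);
    indexByName.put(key, index);
    return index;
  }

  String nameAt(int index) { return names.get(index); }

  /** Returns the column index, or -1 if the name is not in the output. */
  int indexOf(String name) {
    return indexByName.getOrDefault(name.toLowerCase(Locale.ROOT), -1);
  }

  int size() { return names.size(); }

  public static void main(String[] args) {
    OutputSchemaSketch schema = new OutputSchemaSketch();
    for (String col : List.of("id", "name", "filename", "dir0")) {
      schema.add(col);
    }
    // The same position serves both name-based and index-based access.
    int i = schema.indexOf("FILENAME");
    System.out.println(i + " -> " + schema.nameAt(i));   // prints "2 -> filename"
  }
}
```

Because the index is assigned once, at the moment a column is appended, any component that builds its columns in the same order will agree on every column's position.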