Class MissingColumnHandlerBuilder

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.MissingColumnHandlerBuilder

public class MissingColumnHandlerBuilder extends Object
Builds the handler that provides values for columns which appear in an explicit project list but for which the reader provides no values. Obtains column types from a defined or provided schema, or falls back to a configurable default type. Fills the columns with nulls, or with a default value configured in a provided schema.

The set of missing columns may differ per reader or even per batch within a reader. If reader 1 reads all columns, but reader 2 reads a subset, then this class will use the column types from reader 1 when creating the columns missing from reader 2.

Unfortunately, Drill cannot predict the future, so the opposite scenario will end badly: reader 2 comes first and omits column "c", this class chooses a default type for it, then reader 1 supplies the column with some other type. The query will fail with a type mismatch error.
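
The order dependence described above can be illustrated with a small simulation. This is a minimal sketch, not Drill code: the class ReaderOrderSketch, its methods, and the use of plain strings for types are all hypothetical, and the exception merely stands in for the type mismatch error that fails the query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks the type chosen for each column across the
// readers of one scan. None of these names are Drill APIs.
final class ReaderOrderSketch {
  private final Map<String, String> scanTypes = new HashMap<>();
  private final String defaultType;

  ReaderOrderSketch(String defaultType) {
    this.defaultType = defaultType;
  }

  /** A reader supplies the column with a concrete type. */
  void readerProvides(String column, String type) {
    String prior = scanTypes.putIfAbsent(column, type);
    if (prior != null && !prior.equals(type)) {
      // Stand-in for the query failing with a type mismatch error.
      throw new IllegalStateException(
          "Type mismatch for " + column + ": " + prior + " vs " + type);
    }
  }

  /** A reader omits the column; returns the type used for the filler column. */
  String readerOmits(String column) {
    // Reuse a type seen earlier in the scan, else fall back to the default.
    return scanTypes.computeIfAbsent(column, c -> defaultType);
  }

  public static void main(String[] args) {
    // Reader 1 (all columns) first, then reader 2 (subset): works.
    ReaderOrderSketch good = new ReaderOrderSketch("INT");
    good.readerProvides("c", "VARCHAR");
    System.out.println(good.readerOmits("c")); // VARCHAR

    // Reader 2 first, then reader 1: the default guess conflicts.
    ReaderOrderSketch bad = new ReaderOrderSketch("INT");
    System.out.println(bad.readerOmits("c")); // INT
    try {
      bad.readerProvides("c", "VARCHAR");
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

In the first scenario the missing column inherits VARCHAR from the earlier reader; in the second, the default guess is already committed by the time the real type arrives.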

Specifically, the mechanism uses the following rules to infer column type:

  • For resolved columns (those with a type), use that type. If the type is non-nullable, fill in a default value (generally 0 or blank). A column is resolved if its type is given by a defined schema, a provided schema or a prior reader.
  • For unresolved columns (those without a type), use the default type configured in this builder. If no type is provided, use a "default default" of Nullable INT, Drill's classic choice.
    • Note that Drill is not magic: relying on the default type is likely to cause a type conflict across readers or across scans. A default has no way of knowing if it matches the same column read in some other fragment on some other node.
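
The rules above can be sketched in a few lines. This is an illustrative model only, under stated assumptions: MissingColumnTypeSketch, ColType, and every method name are hypothetical stand-ins rather than Drill's API, and the choice of "" for VARCHAR and 0 for everything else is a simplification of "generally 0 or blank".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the type-inference rules; not Drill's implementation.
final class MissingColumnTypeSketch {

  /** Simplified stand-in for a column type: a type name plus nullability. */
  static final class ColType {
    final String name;
    final boolean nullable;

    ColType(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }

    @Override
    public String toString() {
      return (nullable ? "NULLABLE " : "REQUIRED ") + name;
    }
  }

  /** The "default default": Nullable INT. */
  static final ColType DEFAULT_DEFAULT = new ColType("INT", true);

  // Types known from a defined schema, a provided schema, or a prior reader.
  private final Map<String, ColType> resolved = new HashMap<>();
  private final ColType configuredDefault; // null if none was configured

  MissingColumnTypeSketch(ColType configuredDefault) {
    this.configuredDefault = configuredDefault;
  }

  void resolve(String column, ColType type) {
    resolved.put(column, type);
  }

  /** Rule 1: resolved type wins. Rule 2: configured default, else Nullable INT. */
  ColType typeFor(String column) {
    ColType known = resolved.get(column);
    if (known != null) {
      return known;
    }
    return configuredDefault != null ? configuredDefault : DEFAULT_DEFAULT;
  }

  /** Nullable columns are filled with null; required ones with 0 or blank. */
  Object fillValueFor(String column) {
    ColType type = typeFor(column);
    if (type.nullable) {
      return null;
    }
    if (type.name.equals("VARCHAR")) {
      return "";
    }
    return 0;
  }

  public static void main(String[] args) {
    MissingColumnTypeSketch sketch = new MissingColumnTypeSketch(null);
    sketch.resolve("a", new ColType("VARCHAR", false));
    System.out.println(sketch.typeFor("a")); // REQUIRED VARCHAR
    System.out.println(sketch.typeFor("c")); // NULLABLE INT
  }
}
```
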

Work is split into a schema-time part (to resolve column types) and a read-time part (to create and fill the needed vectors).

Caveats

The project mechanism handles nested "missing" columns as mentioned above. This works to create null columns within maps that are defined by the data source. However, the mechanism does not currently handle creating null columns within repeated maps or lists. Doing so is possible, but requires adding a level of cardinality computation to create the proper number of "inner" values.
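
To make the cardinality point concrete, the sketch below shows the kind of computation involved, assuming the repeated map is described by per-row element counts that are turned into an offsets array. The class InnerCardinalitySketch and its methods are hypothetical and are not part of Drill; the point is only that a null column inside a repeated map needs one value per inner element, not one per row.

```java
// Hypothetical sketch: how many "inner" values a missing column inside a
// repeated map would need for one batch. Not Drill code.
final class InnerCardinalitySketch {

  /** Builds offsets so that offsets[i + 1] - offsets[i] == elementsPerRow[i]. */
  static int[] offsets(int[] elementsPerRow) {
    int[] offsets = new int[elementsPerRow.length + 1];
    for (int i = 0; i < elementsPerRow.length; i++) {
      offsets[i + 1] = offsets[i] + elementsPerRow[i];
    }
    return offsets;
  }

  /** The last offset is the total number of inner elements in the batch. */
  static int innerValueCount(int[] offsets) {
    return offsets[offsets.length - 1];
  }

  public static void main(String[] args) {
    // Three rows whose repeated map holds 2, 0 and 3 elements respectively.
    int[] offsets = offsets(new int[] {2, 0, 3});
    System.out.println(innerValueCount(offsets)); // 5
  }
}
```

Here a batch of three rows needs five inner null values, which is why a per-row fill is not enough for repeated maps or lists.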