Class MissingColumnHandlerBuilder
The set of missing columns may differ per reader or even per batch within a reader. If reader 1 reads all columns, but reader 2 reads a subset, then this class will use the column types from reader 1 when creating the columns missing from reader 2.
Unfortunately, Drill cannot predict the future, so the opposite scenario will end badly: Reader 2 comes first, omits column "c", this class chooses a default value, then Reader 1 wants the column to be some other type. The query will fail with a type mismatch error.
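The ordering problem can be illustrated with a small self-contained model. This is only a sketch: `ReaderOrderSketch`, its methods, and the string type names are hypothetical stand-ins, not Drill classes, and the real mechanism works with `TupleMetadata` and `TypeProtos.MajorType` rather than strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a scan that tracks column types across readers. Not Drill code. */
class ReaderOrderSketch {
  // Assumed stand-in for the builder's fallback type.
  static final String DEFAULT_TYPE = "NULLABLE INT";

  // Types seen so far in this scan, keyed by column name.
  private final Map<String, String> knownTypes = new LinkedHashMap<>();

  /**
   * Processes one reader. readerColumns maps the columns the reader actually
   * provides to their types; projected lists the columns the query wants.
   */
  void addReader(Map<String, String> readerColumns, List<String> projected) {
    for (String col : projected) {
      String actual = readerColumns.get(col);
      String known = knownTypes.get(col);
      if (actual == null) {
        // Missing column: reuse a prior type if there is one, else guess the default.
        knownTypes.put(col, known != null ? known : DEFAULT_TYPE);
      } else if (known != null && !known.equals(actual)) {
        // An earlier guess (or an earlier reader) disagrees with this reader.
        throw new IllegalStateException(
            "Type mismatch for column " + col + ": " + known + " vs " + actual);
      } else {
        knownTypes.put(col, actual);
      }
    }
  }

  String typeOf(String col) {
    return knownTypes.get(col);
  }

  public static void main(String[] args) {
    List<String> projected = List.of("a", "c");
    Map<String, String> reader1 = Map.of("a", "INT", "c", "VARCHAR");
    Map<String, String> reader2 = Map.of("a", "INT"); // omits "c"

    // Reader 1 first: the missing "c" in reader 2 borrows reader 1's type.
    ReaderOrderSketch good = new ReaderOrderSketch();
    good.addReader(reader1, projected);
    good.addReader(reader2, projected);
    System.out.println("reader 1 first: c is " + good.typeOf("c"));

    // Reader 2 first: "c" is guessed, then reader 1 contradicts the guess.
    ReaderOrderSketch bad = new ReaderOrderSketch();
    bad.addReader(reader2, projected);
    try {
      bad.addReader(reader1, projected);
    } catch (IllegalStateException e) {
      System.out.println("reader 2 first: " + e.getMessage());
    }
  }
}
```

In this model the only difference between the two runs is the order of the `addReader` calls, which is the point of the paragraph above.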
Specifically, the mechanism uses the following rules to infer column type:
- For resolved columns (those with a type), use that type. If the type is non-nullable, fill in a default value (generally 0 or blank). A column is resolved if its type is given by a defined schema, a provided schema, or a prior reader.
- For unresolved columns (those without a type), use the default type configured in this builder. If no type is provided, use a "default default" of Nullable INT, Drill's classic choice.
Note that Drill is not magic: relying on the default type is likely to cause a type conflict across readers or across scans. A default has no way of knowing if it matches the same column read in some other fragment on some other node.
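The two inference rules can be written down as a short decision function. The sketch below is an illustration under stated assumptions: `MissingTypeRules`, `ColType`, `resolve` and `fillValue` are hypothetical names, types are modeled as a name plus a nullability flag, and the fill values (0 and the empty string) follow the "generally 0 or blank" wording above rather than any confirmed Drill behavior.

```java
import java.util.Map;

/** Toy version of the type-inference rules for missing columns. Not Drill code. */
class MissingTypeRules {
  /** Hypothetical stand-in for a column type: a type name plus nullability. */
  static final class ColType {
    final String name;
    final boolean nullable;

    ColType(String name, boolean nullable) {
      this.name = name;
      this.nullable = nullable;
    }
  }

  // The "default default": Nullable INT.
  static final ColType DEFAULT_DEFAULT = new ColType("INT", true);

  /**
   * Picks the type for a missing column.
   * resolved holds types known from a schema or a prior reader;
   * configuredDefault may be null when no default type was configured.
   */
  static ColType resolve(String column, Map<String, ColType> resolved,
                         ColType configuredDefault) {
    ColType known = resolved.get(column);
    if (known != null) {
      return known; // rule 1: a resolved column keeps its type
    }
    // rule 2: an unresolved column gets the configured default, else Nullable INT
    return configuredDefault != null ? configuredDefault : DEFAULT_DEFAULT;
  }

  /** Value used to fill the missing column: null if nullable, else a type default. */
  static Object fillValue(ColType type) {
    if (type.nullable) {
      return null;
    }
    // Assumed defaults for this sketch: blank for strings, 0 for everything else.
    return type.name.equals("VARCHAR") ? "" : Integer.valueOf(0);
  }
}
```

For example, `resolve("x", Map.of(), null)` falls through to the Nullable INT default, which is exactly the guess that can later conflict with another reader or fragment.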
Work is split into a schema-time part (to resolve column types) and a read-time part (to create and fill the needed vectors).
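One way to picture the schema-time and read-time split is the two-method sketch below. It is a simplified model, not the actual implementation: `TwoPhaseSketch` and its methods are hypothetical, types are plain strings, and ordinary Java arrays stand in for Drill's value vectors.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the schema-time / read-time split. Not Drill code. */
class TwoPhaseSketch {
  // Filled once at schema time, reused for every batch.
  private final Map<String, String> missingTypes = new LinkedHashMap<>();

  /** Schema time: decide a type for each missing column, once. */
  void resolveSchema(List<String> missing, Map<String, String> known,
                     String defaultType) {
    for (String col : missing) {
      missingTypes.put(col, known.getOrDefault(col, defaultType));
    }
  }

  /** Read time: for each batch, create one all-null array per missing column. */
  Map<String, Object[]> fillBatch(int rowCount) {
    Map<String, Object[]> batch = new LinkedHashMap<>();
    for (String col : missingTypes.keySet()) {
      batch.put(col, new Object[rowCount]); // Java initializes elements to null
    }
    return batch;
  }

  Map<String, String> schema() {
    return missingTypes;
  }
}
```

The design point is that type decisions happen once, while the per-batch step only needs the row count.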
Caveats
The project mechanism handles nested "missing" columns as mentioned above. This works to create null columns within maps that are defined by the data source. However, the mechanism does not currently handle creating null columns within repeated maps or lists. Doing so is possible, but requires adding a level of cardinality computation to create the proper number of "inner" values.
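To see why a cardinality computation is needed, consider a repeated map whose entries are tracked by an offsets array. The sketch below assumes that layout purely for illustration; `InnerCardinalitySketch` and `innerCount` are hypothetical names and this is not how the current code handles the case (it does not handle it at all, as noted above).

```java
/** Toy illustration of inner-value counting for a repeated map. Not Drill code. */
class InnerCardinalitySketch {
  /**
   * offsets has rowCount + 1 entries; row i owns the inner entries
   * offsets[i] through offsets[i + 1] - 1. The total number of inner
   * entries is therefore the last offset minus the first.
   */
  static int innerCount(int[] offsets) {
    return offsets[offsets.length - 1] - offsets[0];
  }

  public static void main(String[] args) {
    int[] offsets = {0, 2, 2, 5}; // 3 rows holding 2, 0 and 3 map entries
    System.out.println("rows = " + (offsets.length - 1));
    System.out.println("inner values = " + innerCount(offsets));
  }
}
```

A null column inside this repeated map would need one value per inner entry (5 here), not one per row (3), which is the extra level of computation the caveat refers to.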
Field Summary
Modifier and Type                    Field
protected boolean                    allowRequiredNullColumns
static final TypeProtos.MajorType    DEFAULT_NULL_TYPE
protected TupleMetadata              inputSchema
protected TypeProtos.MajorType       nullType
protected TupleMetadata              outputSchema
protected ResultVectorCache          vectorCache
Constructor Summary
Constructor
MissingColumnHandlerBuilder()
Method Summary
Method
allowRequiredNullColumns(boolean flag)
build()
inputSchema(TupleMetadata inputSchema)
nullType(TypeProtos.MajorType nullType)
vectorCache(ResultVectorCache vectorCache)
Field Details
DEFAULT_NULL_TYPE
inputSchema
nullType
allowRequiredNullColumns
protected boolean allowRequiredNullColumns
vectorCache
outputSchema
Constructor Details
MissingColumnHandlerBuilder
public MissingColumnHandlerBuilder()
Method Details
inputSchema
nullType
allowRequiredNullColumns
vectorCache
buildSchema
build