org.apache.drill.exec.physical.impl.scan.project.SchemaSmoother

public class SchemaSmoother extends Object

Implements a "schema smoothing" algorithm that provides schema persistence for wildcard selection (i.e. SELECT *).

Constraints:

Adding columns causes a hard schema change.
Removing columns is allowed as long as the previous mode was nullable or repeated; the missing column takes its type from the previous schema.
Changing type or mode causes a hard schema change.
Changing column order is fine; use order from previous schema.
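
As a rough illustration, the four constraints above can be expressed as one compatibility check. This is a minimal sketch, not Drill's implementation: `SmoothingRules`, `Col`, `Mode` and `canReuse` are hypothetical names, and schemas are modeled as plain maps from column name to a type name and mode rather than as Drill's `TupleMetadata`.

```java
import java.util.Map;

/** Hypothetical, simplified model of the constraints; not part of Drill. */
final class SmoothingRules {
    enum Mode { REQUIRED, NULLABLE, REPEATED }

    /** Stand-in for a column's metadata: a type name plus a mode. */
    static final class Col {
        final String type;
        final Mode mode;

        Col(String type, Mode mode) {
            this.type = type;
            this.mode = mode;
        }
    }

    /** Returns true if the prior schema can be reused for the next table schema. */
    static boolean canReuse(Map<String, Col> prior, Map<String, Col> next) {
        for (Map.Entry<String, Col> e : next.entrySet()) {
            Col old = prior.get(e.getKey());
            if (old == null) {
                return false; // added column: hard schema change
            }
            Col cur = e.getValue();
            if (!old.type.equals(cur.type) || old.mode != cur.mode) {
                return false; // changed type or mode: hard schema change
            }
        }
        for (Map.Entry<String, Col> e : prior.entrySet()) {
            boolean removed = !next.containsKey(e.getKey());
            if (removed && e.getValue().mode == Mode.REQUIRED) {
                return false; // a removed column can only be filled if nullable or repeated
            }
        }
        // Columns are matched by name, so a change in column order never fails the check.
        return true;
    }
}
```

Calling `SmoothingRules.canReuse(prior, next)` with two name-keyed maps returns false exactly when one of the "hard schema change" cases applies.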

This can all be boiled down to a simpler rule:

Schema persistence is possible if the output schema derived from a prior table schema can be reused for the current table schema.
Otherwise, a hard schema change occurs and a new output schema is derived from the new table schema.

The core idea here is to "unresolve" a fully-resolved table schema to produce a new projection list that is equivalent to naming that schema's columns explicitly in the SELECT. That projection list is then kept only if it is compatible with the next table schema; otherwise it is thrown away and resolution starts over from the actual scan projection list.

Algorithm:

If partitions are included in the wildcard, and the new file needs more partition columns than the current one, create a new schema.
Otherwise, treat the partition columns as explicitly selected and fill in missing ones with nulls.
From an output schema, construct a new select list specification as though the columns in the current schema were explicitly specified in the SELECT clause.
For each new schema column, verify that the column exists in the generated SELECT clause and is of the same type. If not, create a new schema.
Use the generated schema to plan a new projection from the new schema to the prior schema.
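
The last three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `SchemaSmoother` code: `SmootherSketch`, `plan`, `fits` and `version` are hypothetical names, schemas are plain maps from column name to type name, and partition columns and column modes are left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: these names are hypothetical, not Drill APIs. */
final class SmootherSketch {
    // Prior output schema (column name -> type name) in output order; null before the first file.
    private Map<String, String> prior;
    // Number of output schemas derived so far (hypothetical counter, bumped on each hard change).
    private int version;

    /** Plans the output columns for one table schema, reusing the prior schema when possible. */
    List<String> plan(Map<String, String> table) {
        if (prior == null || !fits(table)) {
            // Hard schema change: derive the output schema from the new table schema.
            prior = new LinkedHashMap<>(table);
            version++;
        }
        // "Unresolve": treat the prior columns as an explicit SELECT list, in prior order.
        List<String> out = new ArrayList<>();
        for (String name : prior.keySet()) {
            // Columns absent from this table are marked as filled with nulls.
            out.add(table.containsKey(name) ? name : name + " (null-filled)");
        }
        return out;
    }

    /** True if every table column appears in the prior schema with the same type. */
    private boolean fits(Map<String, String> table) {
        for (Map.Entry<String, String> e : table.entrySet()) {
            if (!e.getValue().equals(prior.get(e.getKey()))) {
                return false; // new column or changed type
            }
        }
        return true;
    }

    int version() {
        return version;
    }
}
```

In this sketch, feeding `plan` a second file that lacks a column keeps the prior column order and marks the missing column as null-filled, while a file with a new column or a changed type replaces the prior schema and increments the counter.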

Nested Class Summary

- static class SchemaSmoother.IncompatibleSchemaException: Exception thrown if the prior schema is not compatible with the new table schema.

Constructor Summary

- SchemaSmoother(ScanLevelProjection scanProj, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers)

Method Summary

- ReaderLevelProjection resolve(TupleMetadata tableSchema, ResolvedTuple outputTuple)
- int schemaVersion()

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- SchemaSmoother
  
  public SchemaSmoother(ScanLevelProjection scanProj, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers)
Method Details
- resolve
  
  public ReaderLevelProjection resolve(TupleMetadata tableSchema, ResolvedTuple outputTuple)
- schemaVersion
  
  public int schemaVersion()
