org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection

org.apache.drill.exec.physical.impl.scan.project.SmoothingProjection

public class SmoothingProjection extends ReaderLevelProjection

Resolve a table schema against the prior schema. This works only if the types match and if all columns in the table schema already appear in the prior schema.
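
The rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code. `Col`, `resolve` and the local `IncompatibleSchemaException` are hypothetical, simplified stand-ins for Drill's `MaterializedField`, `TupleMetadata` and `SchemaSmoother.IncompatibleSchemaException`. Types are plain labels. The sketch walks the prior schema in order and records where each prior column sits in the new table schema. It fails if a table column is missing from the prior schema or has a different type. Treating a prior column that this table lacks as a null-filled slot (index -1) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's API: Col stands in for MaterializedField,
// and types are plain labels such as "INT" or "VARCHAR".
class SmoothingSketch {

  record Col(String name, String type) {}

  // Simplified stand-in for SchemaSmoother.IncompatibleSchemaException.
  static class IncompatibleSchemaException extends Exception {
    IncompatibleSchemaException(String msg) { super(msg); }
  }

  // Returns, for each prior column in prior-schema order, the index of the
  // matching table column, or -1 if this table lacks it (assumed here to be
  // filled with nulls later).
  static List<Integer> resolve(List<Col> prior, List<Col> table)
      throws IncompatibleSchemaException {
    List<Integer> mapping = new ArrayList<>();
    int matched = 0;
    for (Col p : prior) {
      int idx = indexOf(table, p.name());
      if (idx >= 0) {
        if (!table.get(idx).type().equals(p.type())) {
          throw new IncompatibleSchemaException("type mismatch for " + p.name());
        }
        matched++;
      }
      mapping.add(idx);
    }
    // Any unmatched table column is new relative to the prior schema.
    if (matched < table.size()) {
      throw new IncompatibleSchemaException("table has columns missing from prior schema");
    }
    return mapping;
  }

  // Case-insensitive lookup, since Drill column names are case-insensitive.
  private static int indexOf(List<Col> cols, String name) {
    for (int i = 0; i < cols.size(); i++) {
      if (cols.get(i).name().equalsIgnoreCase(name)) {
        return i;
      }
    }
    return -1;
  }
}
```

For a prior schema `(a INT, b VARCHAR)` and a file containing only `b VARCHAR`, the mapping is `[-1, 0]`. A file that introduces a new column `c` is rejected.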

Consider this an experimental mechanism. The hope was that, with clever techniques, we could "smooth over" some of the issues that cause schema change events in Drill. As it turned out, however, creating this mechanism revealed that it is not possible, even in theory, to handle most schema changes because of the time dimension:

- An event in a later batch may provide information that would have caused us to make a different decision in an earlier batch. For example, we are asked for column `foo`, do not see such a column in the first batch, block or file, guess some type, and later find that the column has a different type. We can't "time travel" to tell our earlier selves, nor, when we make the initial type decision, can we jump ahead to see what type we'll discover.
- Readers in this fragment may see column `foo` while readers in another fragment read files or blocks that don't have that column. The two readers cannot communicate to agree on a type.

What this mechanism can do is make decisions based on history: when a column appears, we can adjust its type slightly to avoid an unnecessary schema change. For example, if a prior file in this scan saw `foo` as nullable Varchar, but the present file has the column as required Varchar, we can use the more general nullable form. But, again, the inability to predict the future bites us: we can handle a nullable-to-required column change, but not vice versa.
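
The history-based rule in the preceding paragraph can be sketched as a small decision function. The `Mode` enum and `smooth` method are hypothetical, simplified stand-ins for Drill's `DataMode` handling, where "nullable" is spelled `OPTIONAL`. Repeated mode is left out. This is not the class's actual code.

```java
import java.util.Optional;

// Hypothetical sketch, not Drill's code. Mode is a stand-in for Drill's
// DataMode, where "nullable" is spelled OPTIONAL; REPEATED is left out.
class ModeSmoothingSketch {

  enum Mode { REQUIRED, OPTIONAL }

  // Picks the mode to keep for a column, given the mode already sent
  // downstream (prior) and the mode found in the current file.
  // Returns empty when the change cannot be smoothed over.
  static Optional<Mode> smooth(Mode prior, Mode current) {
    if (prior == current) {
      return Optional.of(prior);            // nothing changed
    }
    if (prior == Mode.OPTIONAL && current == Mode.REQUIRED) {
      return Optional.of(Mode.OPTIONAL);    // non-null values fit a nullable column
    }
    // prior REQUIRED, current OPTIONAL: earlier batches were already sent
    // as non-nullable, and we cannot go back and widen them.
    return Optional.empty();
  }
}
```

The asymmetry is the point: widening toward nullable is always safe for new data, but it cannot be applied retroactively to batches that have already left the scan.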

What this mechanism shows the careful reader is that the only general solution to the schema-change problem is to know the full schema up front: the planner must be told the schema and must communicate it to all readers, so that all readers agree on the final schema.

When that is done, the techniques shown here can be used to adjust any per-file variation of schema to match the up-front schema.
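
As an illustration of that last point, here is a minimal sketch of mapping one file's columns onto a schema known up front. `Col`, `Source` and `plan` are hypothetical names. Two behaviors are assumptions of the sketch, not statements about Drill's provided-schema support: file columns absent from the up-front schema are ignored, and type differences are rejected rather than converted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Drill's API: the up-front schema is authoritative,
// and each file's columns are mapped onto it before any data is read.
class UpFrontSchemaSketch {

  record Col(String name, String type, boolean nullable) {}

  // One output column: read from file column fileIndex, or fill with nulls if -1.
  record Source(String name, int fileIndex) {
    boolean isNullFill() { return fileIndex < 0; }
  }

  static List<Source> plan(List<Col> upFront, List<Col> file) {
    List<Source> plan = new ArrayList<>();
    for (Col target : upFront) {
      int idx = -1;
      for (int i = 0; i < file.size(); i++) {
        if (file.get(i).name().equalsIgnoreCase(target.name())) {
          idx = i;
          break;
        }
      }
      if (idx < 0) {
        // Missing from this file: fill with nulls, which the target must allow.
        if (!target.nullable()) {
          throw new IllegalArgumentException("required column missing: " + target.name());
        }
      } else {
        Col found = file.get(idx);
        if (!found.type().equals(target.type())) {
          throw new IllegalArgumentException("type mismatch for " + target.name());
        }
        // Required file data can feed a nullable target, but not the reverse.
        if (found.nullable() && !target.nullable()) {
          throw new IllegalArgumentException("nullable data for required column " + target.name());
        }
      }
      plan.add(new Source(target.name(), idx));
    }
    // Assumption: file columns absent from the up-front schema are not read.
    return plan;
  }
}
```

Because every reader works from the same up-front schema, each one produces the same output columns no matter which file or fragment it reads. The mismatch described in the list above does not arise.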

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
ReaderLevelProjection.ReaderProjectionResolver
Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| `protected final List<MaterializedField>` | `rewrittenFields` | |

Fields inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
resolvers
Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| `SmoothingProjection(ScanLevelProjection scanProj, TupleMetadata tableSchema, ResolvedTuple priorSchema, ResolvedTuple outputTuple, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers)` | |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| `List<MaterializedField>` | `revisedTableSchema()` | |

Methods inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
resolveSpecial

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- rewrittenFields
  
  protected final List<MaterializedField> rewrittenFields
Constructor Details
- SmoothingProjection
  
  public SmoothingProjection(ScanLevelProjection scanProj, TupleMetadata tableSchema, ResolvedTuple priorSchema, ResolvedTuple outputTuple, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers) throws SchemaSmoother.IncompatibleSchemaException
  
  Throws:
  
  SchemaSmoother.IncompatibleSchemaException
Method Details
- revisedTableSchema
  
  public List<MaterializedField> revisedTableSchema()
