Class SmoothingProjection
Consider this an experimental mechanism. The hope was that, with clever techniques, we could "smooth over" some of the issues that cause schema change events in Drill. As it turned out, however, creating this mechanism revealed that it is not possible, even in theory, to handle most schema changes because of the time dimension:
- An event in a later batch may provide information that would have caused us to make a different decision in an earlier batch. For example, we are asked for column `foo`, did not see such a column in the first batch, block or file, guessed some type, and later saw that the column was of a different type. We can't "time travel" to tell our earlier selves, nor, when we make the initial type decision, can we jump to the future to see what type we'll discover.
- Readers in this fragment may see column `foo` but readers in another fragment read files/blocks that don't have that column. The two readers cannot communicate to agree on a type.
What this mechanism can do is make decisions based on history: when a column appears, we can adjust its type a bit to try to avoid an unnecessary change. For example, if a prior file in this scan saw `foo` as nullable Varchar, but the present file has the column as required Varchar, we can use the more general nullable form. But, again, the "can't predict the future" problem bites us: we can handle a nullable-to-required column change, but not vice versa.
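The history-based decision described above can be sketched roughly as follows. This is a minimal illustration, not Drill's implementation: `SmoothingSketch`, `Column`, `Mode` and `smooth` are hypothetical names invented here, and a column is reduced to a type name plus a nullable/required flag. An empty result stands for a change that history cannot absorb.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model: none of these types are Drill classes.
final class SmoothingSketch {

  enum Mode { REQUIRED, NULLABLE }

  // A column reduced to a type name plus a cardinality.
  static final class Column {
    final String type;
    final Mode mode;

    Column(String type, Mode mode) {
      this.type = type;
      this.mode = mode;
    }
  }

  // Columns already emitted by earlier readers in this scan, keyed by name.
  private final Map<String, Column> prior = new HashMap<>();

  // Returns the column to emit for the current reader, or empty when the
  // change cannot be smoothed over using history alone.
  Optional<Column> smooth(String name, Column current) {
    Column previous = prior.get(name);
    if (previous == null) {
      // No history yet: accept whatever this reader reports.
      prior.put(name, current);
      return Optional.of(current);
    }
    if (!previous.type.equals(current.type)) {
      // A different data type cannot be reconciled in this sketch.
      return Optional.empty();
    }
    if (previous.mode == Mode.NULLABLE || current.mode == Mode.REQUIRED) {
      // Same mode, or required data that fits the prior nullable column:
      // keep emitting the prior (more general or identical) form.
      return Optional.of(previous);
    }
    // Prior was REQUIRED and the column is now NULLABLE. Earlier batches
    // have already gone downstream as REQUIRED, so nothing can be done.
    return Optional.empty();
  }
}
```

In this sketch, a scan that first sees `foo` as nullable and then as required keeps the nullable form, while the reverse order yields an empty result. That asymmetry is the "can't predict the future" limitation in miniature.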
What this mechanism will tell the careful reader is that the only general solution to the schema-change problem is to know the full schema up front: for the planner to be told the schema and to communicate that schema to all readers so that all readers agree on the final schema.
When that is done, the techniques shown here can be used to adjust any per-file variation of schema to match the up-front schema.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
ReaderLevelProjection.ReaderProjectionResolver
-
Field Summary
Fields inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
resolvers
-
Constructor Summary
Constructor
SmoothingProjection(ScanLevelProjection scanProj, TupleMetadata tableSchema, ResolvedTuple priorSchema, ResolvedTuple outputTuple, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers)
-
Method Summary
Methods inherited from class org.apache.drill.exec.physical.impl.scan.project.ReaderLevelProjection
resolveSpecial
-
Field Details
-
rewrittenFields
-
-
Constructor Details
-
SmoothingProjection
public SmoothingProjection(ScanLevelProjection scanProj, TupleMetadata tableSchema, ResolvedTuple priorSchema, ResolvedTuple outputTuple, List<ReaderLevelProjection.ReaderProjectionResolver> resolvers) throws SchemaSmoother.IncompatibleSchemaException
-
-
Method Details
-
revisedTableSchema
-