Class SchemaSmoother


public class SchemaSmoother extends Object
Implements a "schema smoothing" algorithm that provides schema persistence for wildcard selection (i.e. SELECT *), subject to the following constraints:


  • Adding columns causes a hard schema change.
  • Removing columns is allowed as long as the previous mode was nullable or repeated; the missing column keeps its type from the previous schema.
  • Changing type or mode causes a hard schema change.
  • Changing column order is fine; use order from previous schema.
This can all be boiled down to a simpler rule:
  • Schema persistence is possible if the output schema from a prior schema can be reused for the current schema.
  • Else, a hard schema change occurs and a new output schema is derived from the new table schema.
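The rule above can be sketched as a single predicate. This is a minimal illustration, not the class's actual implementation: the names SmoothingRuleSketch, Column, Mode, and canReuse are hypothetical, types are plain strings, and column names are compared case-sensitively for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SmoothingRuleSketch {

  // Hypothetical stand-ins for the real column metadata classes.
  enum Mode { REQUIRED, NULLABLE, REPEATED }

  record Column(String name, String type, Mode mode) { }

  /**
   * Returns true if the prior output schema can be reused for the
   * current table schema; false signals a hard schema change.
   */
  static boolean canReuse(List<Column> prior, List<Column> current) {
    // Index the prior schema by name: column order is not significant.
    Map<String, Column> priorByName = new LinkedHashMap<>();
    for (Column col : prior) {
      priorByName.put(col.name(), col);
    }

    for (Column col : current) {
      // remove() also marks the prior column as matched.
      Column match = priorByName.remove(col.name());
      if (match == null) {
        return false; // Added column: hard schema change.
      }
      if (!match.type().equals(col.type()) || match.mode() != col.mode()) {
        return false; // Changed type or mode: hard schema change.
      }
    }

    // Columns still in the map were removed from the table. They can be
    // carried forward only if the prior mode was nullable or repeated.
    for (Column missing : priorByName.values()) {
      if (missing.mode() == Mode.REQUIRED) {
        return false;
      }
    }
    return true;
  }
}
```

Under these assumptions, a reordered table, or one that drops a nullable column, returns true; a table that adds a column, changes a type or mode, or drops a required column returns false.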
The core idea is to "unresolve" a fully resolved table schema into a projection list, equivalent to one that names each of that schema's columns explicitly in the SELECT. That projection list is kept only if it is compatible with the next table schema; otherwise it is discarded and resolution starts over from the actual scan projection list.


Algorithm:
  • If partitions are included in the wildcard, and the new file needs more partition columns than the current one, create a new schema.
  • Else, treat the partition columns as explicitly selected and fill in the missing ones with nulls.
  • From an output schema, construct a new select list specification as though the columns in the current schema were explicitly specified in the SELECT clause.
  • For each new schema column, verify that the column exists in the generated SELECT clause and is of the same type. If not, create a new schema.
  • Use the generated schema to plan a new projection from the new schema to the prior schema.
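The last three steps can be sketched as a small stateful helper. This is a simplified illustration under stated assumptions, not the class's real API: SmootherSketch, Column, Mode, smooth, and planProjection are hypothetical names, partition handling is omitted, types are plain strings, and table column names are assumed unique and case-sensitive.

```java
import java.util.List;
import java.util.Optional;

final class SmootherSketch {

  // Hypothetical stand-ins for the real column metadata classes.
  enum Mode { REQUIRED, NULLABLE, REPEATED }

  record Column(String name, String type, Mode mode) { }

  private List<Column> priorOutput; // null until the first table is seen
  private int schemaVersion;        // bumped on every hard schema change

  int schemaVersion() { return schemaVersion; }

  /** Returns the output schema to use for the given table schema. */
  List<Column> smooth(List<Column> table) {
    // Treat the prior output schema as if it were an explicit SELECT list.
    if (priorOutput != null && planProjection(priorOutput, table).isPresent()) {
      return priorOutput; // Smoothed: prior schema is reused.
    }
    // Hard schema change: derive the output schema from the new table.
    priorOutput = List.copyOf(table);
    schemaVersion++;
    return priorOutput;
  }

  /**
   * For each select-list column, finds the index of its source column in
   * the table, or -1 if the column must be filled with nulls. Returns
   * empty if the select list is not compatible with the table.
   */
  static Optional<int[]> planProjection(List<Column> selectList, List<Column> table) {
    int[] source = new int[selectList.size()];
    int matched = 0;
    for (int i = 0; i < selectList.size(); i++) {
      Column want = selectList.get(i);
      source[i] = -1;
      for (int j = 0; j < table.size(); j++) {
        Column have = table.get(j);
        if (!have.name().equals(want.name())) {
          continue;
        }
        if (!have.type().equals(want.type()) || have.mode() != want.mode()) {
          return Optional.empty(); // Type or mode changed.
        }
        source[i] = j;
        matched++;
        break;
      }
      if (source[i] == -1 && want.mode() == Mode.REQUIRED) {
        return Optional.empty(); // A required column cannot be null-filled.
      }
    }
    if (matched != table.size()) {
      return Optional.empty(); // The table has a column the select list lacks.
    }
    return Optional.of(source);
  }
}
```

In this sketch, calling smooth once per incoming table yields the same output schema across compatible tables, and schemaVersion increases only when a hard schema change forces a new output schema.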