Class SchemaTracker


public class SchemaTracker extends Object
Tracks changes to schemas via "snapshots" over time. Given a schema, the tracker determines whether a new schema is the same as the current one. For example, each batch output from a series of readers might be compared, as it is returned, to detect schema changes from one batch to the next. This class does not track vector-by-vector changes as a schema is built, but rather periodic "snapshots" at times determined by the operator.

If an operator is guaranteed to emit a consistent schema, then no checks need be done, and this tracker will report no schema change. A scanner, on the other hand, might check the schema more often: at least once per reader, and more often if a reader is "late-schema", that is, if the reader can change schema batch-by-batch.

Drill defines "schema change" in a very specific way. Not only must the set of columns and their types be the same, the vectors that hold the columns must also be identical. Generated code contains references to specific vector objects; passing along different vectors requires new code to be generated and is therefore treated as a schema change.

Drill has no concept of "same schema, different vectors." A change in vector is just as serious as a change in schema. Hence, operators try to use the same vectors for their entire lives. A change in the set of vectors is the change tracked here.

Schema versions start at 1. A schema version of 0 means that no output batch was ever presented.
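
The identity-based comparison and version numbering described above can be illustrated with a minimal, self-contained sketch. It is not Drill's implementation: `SchemaTrackerSketch` is a hypothetical stand-in, a batch is modeled as a plain `List<Object>` rather than a `VectorContainer`, and each `Object` stands in for a value vector. The sketch also assumes that the first batch presented always moves the version from 0 to 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified schema tracker. Each Object stands in for a
 * value vector; this is an illustration, not Drill's actual code.
 */
class SchemaTrackerSketch {
  private int schemaVersion;                         // 0 = no batch presented yet
  private List<Object> currentVectors = new ArrayList<>();

  /** Takes a snapshot of a batch and bumps the version if its vectors changed. */
  public void trackSchema(List<Object> newBatch) {
    if (!isSameSchema(newBatch)) {
      schemaVersion++;
      currentVectors = new ArrayList<>(newBatch);    // remember this snapshot
    }
  }

  private boolean isSameSchema(List<Object> newBatch) {
    // Assumption: the first batch ever presented always counts as a change.
    if (schemaVersion == 0 || currentVectors.size() != newBatch.size()) {
      return false;
    }
    for (int i = 0; i < newBatch.size(); i++) {
      // Reference comparison on purpose: a different vector object counts
      // as a schema change even if it would hold the same column and type.
      if (currentVectors.get(i) != newBatch.get(i)) {
        return false;
      }
    }
    return true;
  }

  public int schemaVersion() {
    return schemaVersion;
  }

  public static void main(String[] args) {
    SchemaTrackerSketch tracker = new SchemaTrackerSketch();
    Object colA = new Object();
    Object colB = new Object();

    System.out.println(tracker.schemaVersion());     // 0: nothing presented yet

    tracker.trackSchema(Arrays.asList(colA, colB));
    System.out.println(tracker.schemaVersion());     // 1: first batch

    tracker.trackSchema(Arrays.asList(colA, colB));
    System.out.println(tracker.schemaVersion());     // 1: same vector objects

    tracker.trackSchema(Arrays.asList(colA, new Object()));
    System.out.println(tracker.schemaVersion());     // 2: one vector replaced
  }
}
```

An operator that calls `trackSchema` once per output batch and remembers the last version it saw can detect a schema change by noticing that `schemaVersion()` has increased.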

  • Constructor Details

    • SchemaTracker

      public SchemaTracker()
  • Method Details

    • trackSchema

      public void trackSchema(VectorContainer newBatch)
    • schemaVersion

      public int schemaVersion()
    • schema

      public BatchSchema schema()