Class BatchSchema

java.lang.Object
org.apache.drill.exec.record.BatchSchema
All Implemented Interfaces:
Iterable<MaterializedField>

public class BatchSchema extends Object implements Iterable<MaterializedField>
Historically, BatchSchema has been used to represent the schema of a record batch. However, it does not handle complex types well. If you have a choice, use TupleMetadata instead.
  • Constructor Details

  • Method Details

    • newBuilder

      public static SchemaBuilder newBuilder()
    • getFieldCount

      public int getFieldCount()
    • getColumn

      public MaterializedField getColumn(int index)
    • iterator

      public Iterator<MaterializedField> iterator()
      Specified by:
      iterator in interface Iterable<MaterializedField>
    • getSelectionVectorMode

      public BatchSchema.SelectionVectorMode getSelectionVectorMode()
    • clone

      public BatchSchema clone()
      Overrides:
      clone in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object obj)
      DRILL-5525: the semantics of this method are badly broken. Caveat emptor. Operators use this check to detect an actual schema change in an incoming record batch, but it does not work for AbstractContainerVectors (such as MapVector).

      Each record batch stores a reference to the incoming batch schema (say S: {a: int}), and equals is then called on that stored reference and the current incoming batch schema. Internally, the schema object holds references to the MaterializedFields of the vectors in the container. If the incoming batch schema changes, the upstream operator creates a new ValueVector of the newly detected type in its output container, and that vector has a new MaterializedField instance. A new BatchSchema object is then created for the new incoming batch (say S': {a': varchar}). The operator calling equals still holds a reference to the old schema object (S), so the first check is not satisfied and equals is called on each MaterializedField (a.equals(a')). Because a new MaterializedField was created for the newly created vector, the field check returns false and the schema change is detected.

      Now suppose that, instead of an int vector, the container holds a MapVector whose initial schema is S: {a: {b: int, c: int}}, and the schema of the map's child field c later changes. The MapVector is still found in the container, but the child vector for field c is replaced, and the new schema object is created as S': {a: {b: int, c': varchar}}. When S.equals(S') is called, it eventually calls a.equals(a), which returns true even though the schema of the child vector c has changed. No new vector is created for field a, so its MaterializedField reference is unchanged and is shared by both the old and the new schema instances.

      Use isEquivalent(BatchSchema) instead, since MaterializedField.isEquivalent(MaterializedField) has been updated to remove the reference check.
      Overrides:
      equals in class Object
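
      The failure mode described above can be reproduced with a toy model. The sketch below does not use Drill classes: Field, equalsWithIdentityShortcut, structurallyEqual, and mapA are hypothetical names invented for illustration. It only shows why a comparison that short-circuits on reference identity cannot see a child that was replaced inside a shared parent object.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityShortcutSketch {

    /** Hypothetical stand-in for a field with mutable children; not Drill's MaterializedField. */
    static final class Field {
        final String name;
        final String type;
        final List<Field> children = new ArrayList<>();

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }

        Field child(Field c) {
            children.add(c);
            return this;
        }

        /** Comparison that returns true as soon as both references are the same object. */
        boolean equalsWithIdentityShortcut(Field other) {
            return this == other || structurallyEqual(other);
        }

        /** Purely structural comparison: name, type, then children in order. */
        boolean structurallyEqual(Field other) {
            if (!name.equals(other.name) || !type.equals(other.type)
                    || children.size() != other.children.size()) {
                return false;
            }
            for (int i = 0; i < children.size(); i++) {
                if (!children.get(i).structurallyEqual(other.children.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Builds a: {b: int, c: <cType>} as a fresh object graph. */
    static Field mapA(String cType) {
        return new Field("a", "map")
                .child(new Field("b", "int"))
                .child(new Field("c", cType));
    }

    public static void main(String[] args) {
        Field a = mapA("int");       // field object shared by the stored and the incoming schema
        Field before = mapA("int");  // independent description of the original shape

        a.children.set(1, new Field("c", "varchar"));  // child c is replaced in place

        System.out.println(a.equalsWithIdentityShortcut(a)); // true: the change is invisible
        System.out.println(before.structurallyEqual(a));     // false: the change is detected
    }
}
```

      In this model the change is visible only when the old shape is held in a separate object graph. How Drill keeps old and new field metadata apart is not covered by this sketch.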
    • isEquivalent

      public boolean isEquivalent(BatchSchema other)
      Check whether two schemas are equivalent according to the rules defined in MaterializedField.isEquivalent(MaterializedField). In particular, this method requires that the fields have a 1:1 ordered correspondence in the two schemas.
      Parameters:
      other - another non-null batch schema
      Returns:
      true if the two schemas are equivalent according to the MaterializedField.isEquivalent(MaterializedField) rules, false otherwise
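
      To make the "1:1 ordered correspondence" requirement concrete, here is a minimal sketch with hypothetical types (Column, sameAs, and this isEquivalent helper are illustrative, not Drill API). The per-field rule is simplified to an exact name and type match; the real rules are those of MaterializedField.isEquivalent(MaterializedField) and are not reproduced here.

```java
import java.util.Arrays;
import java.util.List;

public class OrderedEquivalenceSketch {

    /** Hypothetical column description: just a name and a type label. */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }

        /** Simplified per-field rule (assumption): exact name and type match. */
        boolean sameAs(Column other) {
            return name.equals(other.name) && type.equals(other.type);
        }
    }

    /** True only if both lists have the same length and match position by position. */
    static boolean isEquivalent(List<Column> left, List<Column> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            if (!left.get(i).sameAs(right.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Column> s1 = Arrays.asList(new Column("a", "int"), new Column("b", "varchar"));
        List<Column> s2 = Arrays.asList(new Column("a", "int"), new Column("b", "varchar"));
        List<Column> s3 = Arrays.asList(new Column("b", "varchar"), new Column("a", "int"));

        System.out.println(isEquivalent(s1, s2)); // true: same fields, same order
        System.out.println(isEquivalent(s1, s3)); // false: same fields, different order
    }
}
```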
    • merge

      public BatchSchema merge(BatchSchema otherSchema)
      Merge two schemas to produce a new, merged schema. The caller is responsible for ensuring that column names are unique. The order of the fields in the new schema is the same as that of this schema, with the other schema's fields appended in the order defined in the other schema.

      Merging data with selection vectors is unlikely to be useful or to work well. With a selection vector, the two record batches would have to be correlated both in their selection vectors AND in the underlying vectors. Such a use case is hard to imagine. So, for now, this method forbids merging schemas if either of them carries a selection vector. If we discover a meaningful use case, we can revisit the issue.

      Parameters:
      otherSchema - the schema to merge with this one
      Returns:
      the new, merged, schema
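
      The ordering and selection-vector rules described for merge can be sketched with a toy schema. ToySchema and its fields are hypothetical, and the choice of IllegalArgumentException is an assumption: the description above says only that such a merge is forbidden, not how the real method signals it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeOrderSketch {

    /** Hypothetical schema: ordered column names plus a "carries a selection vector" flag. */
    static final class ToySchema {
        final List<String> columns;
        final boolean hasSelectionVector;

        ToySchema(List<String> columns, boolean hasSelectionVector) {
            this.columns = new ArrayList<>(columns);
            this.hasSelectionVector = hasSelectionVector;
        }

        /** Returns a new schema: this schema's columns first, then the other's, in order. */
        ToySchema merge(ToySchema other) {
            if (hasSelectionVector || other.hasSelectionVector) {
                // Assumption: the exception type is chosen for illustration only.
                throw new IllegalArgumentException(
                        "Cannot merge schemas that carry a selection vector");
            }
            List<String> merged = new ArrayList<>(columns);
            merged.addAll(other.columns);
            // Column-name uniqueness is deliberately not checked; that is the caller's job.
            return new ToySchema(merged, false);
        }
    }

    public static void main(String[] args) {
        ToySchema left = new ToySchema(Arrays.asList("a", "b"), false);
        ToySchema right = new ToySchema(Arrays.asList("c", "d"), false);

        System.out.println(left.merge(right).columns); // [a, b, c, d]
    }
}
```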
    • format

      public String format()
      Format the schema as a multi-line string. Useful when debugging a query with a very wide schema, where the usual single-line format is far too hard to read.