Class MaterializedField

java.lang.Object
org.apache.drill.exec.record.MaterializedField

public class MaterializedField extends Object
Metadata description of a column, characterized by a name and a type (including both data type and cardinality, also known as mode). For map types, the description includes the nested columns.
  • Method Details

    • create

      public static MaterializedField create(UserBitShared.SerializedField serField)
    • getSerializedField

      public UserBitShared.SerializedField getSerializedField()
      Create and return a serialized field based on the current state.
    • getAsBuilder

    • getChildren

      public Collection<MaterializedField> getChildren()
    • newWithChild

      public MaterializedField newWithChild(MaterializedField child)
    • addChild

      public void addChild(MaterializedField field)
    • removeChild

      public void removeChild(MaterializedField field)
    • replaceType

      public void replaceType(TypeProtos.MajorType newType)
      Replace the type with a new one that has the same minor type and mode, but with perhaps different details.

      The type is immutable. But it contains subtypes, used for lists and unions. To add a subtype, we must create a whole new major type.

      It appears that the MaterializedField class was also meant to be immutable. But, it holds the children for a map, and contains methods to add children. So, it is not immutable.

      This method allows evolving a list or union without the need to create a new MaterializedField. Creating a new one is problematic for nested maps because the map (or list, or union) holds onto the MaterializedField objects of its children. There is no way for an inner map to reach out and change the child of its parent.

      By allowing the non-critical metadata to change, we preserve the child relationships as a list or union evolves.

      Parameters:
      newType -
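
      The reasoning above can be illustrated with a minimal, self-contained sketch. It does not use Drill's classes: SketchType and SketchField are hypothetical stand-ins for an immutable major type and a mutable field, and the check that rejects a changed minor type or mode is an assumption of the sketch. It shows that swapping the type in place keeps the parent's reference to the child valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; these are not Drill's MajorType or MaterializedField.
final class ReplaceTypeSketch {

    // Immutable type: adding a subtype yields a new instance.
    static final class SketchType {
        final String minorType;
        final String mode;
        final List<String> subTypes;

        SketchType(String minorType, String mode, List<String> subTypes) {
            this.minorType = minorType;
            this.mode = mode;
            this.subTypes = Collections.unmodifiableList(new ArrayList<>(subTypes));
        }

        SketchType withSubType(String subType) {
            List<String> copy = new ArrayList<>(subTypes);
            copy.add(subType);
            return new SketchType(minorType, mode, copy);
        }
    }

    // Mutable field: holds its children and lets the type details be swapped.
    static final class SketchField {
        final String name;
        final List<SketchField> children = new ArrayList<>();
        private SketchType type;

        SketchField(String name, SketchType type) {
            this.name = name;
            this.type = type;
        }

        SketchType type() {
            return type;
        }

        // Assumption of this sketch: reject any change to minor type or mode.
        void replaceType(SketchType newType) {
            if (!type.minorType.equals(newType.minorType) || !type.mode.equals(newType.mode)) {
                throw new IllegalArgumentException("minor type and mode must stay the same");
            }
            type = newType;
        }
    }

    // Returns true if the parent map still holds the same child object
    // and sees the evolved subtype list through it.
    static boolean parentSeesEvolvedChild() {
        SketchField map = new SketchField("m",
            new SketchType("MAP", "REQUIRED", Collections.<String>emptyList()));
        SketchField union = new SketchField("u",
            new SketchType("UNION", "OPTIONAL", Arrays.asList("INT")));
        map.children.add(union);

        // Evolve the union: build a new immutable type, then swap it in place.
        union.replaceType(union.type().withSubType("VARCHAR"));

        SketchField seenByParent = map.children.get(0);
        return seenByParent == union
            && seenByParent.type().subTypes.equals(Arrays.asList("INT", "VARCHAR"));
    }
}
```

      Had the sketch created a new SketchField for the evolved union instead, the parent's children list would still point at the old object.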
    • clone

      public MaterializedField clone()
      Overrides:
      clone in class Object
    • cloneEmpty

      public MaterializedField cloneEmpty()
    • withType

      public MaterializedField withType(TypeProtos.MajorType type)
    • withPath

      public MaterializedField withPath(String name)
    • withPathAndType

      public MaterializedField withPathAndType(String name, TypeProtos.MajorType type)
    • matches

      public boolean matches(UserBitShared.SerializedField field)
    • create

      public static MaterializedField create(String name, TypeProtos.MajorType type)
    • getName

      public String getName()
    • getWidth

      public int getWidth()
    • getType

      public TypeProtos.MajorType getType()
    • getScale

      public int getScale()
    • getPrecision

      public int getPrecision()
    • isNullable

      public boolean isNullable()
    • getDataMode

      public TypeProtos.DataMode getDataMode()
    • getChildCount

      public int getChildCount()
    • getOtherNullableVersion

      public MaterializedField getOtherNullableVersion()
    • getValueClass

      public Class<?> getValueClass()
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object obj)
      The equals method does not check the list of child fields. When a batch is sent over the network, it is serialized along with its MaterializedField, which also contains information about internal vectors such as offsets and bits. During deserialization, these vectors are treated as children of the parent vector. If an operator on the receiver side, such as Sort, receives a schema in the buildSchema phase and later receives another batch, comparing children would register a schema change and the query would fail. This is because the second batch's schema contains information about internal vectors such as offsets and bits that is not present in the first batch's schema. For reference, see TestSort#testSortWithRepeatedMapWithExchanges.
      Overrides:
      equals in class Object
      Parameters:
      obj - the other materialized field
      Returns:
      true if the types are equal
    • isEquivalent

      public boolean isEquivalent(MaterializedField other)
      Determine if one column is logically equivalent to another. This is a tricky issue. The rules here:
      • The other schema is assumed to be non-null (unlike equals()).
      • Names must be identical, ignoring case. (Drill, like SQL, is case insensitive.)
      • Type, mode, precision and scale must be identical.
      • Child columns are ignored unless the type is a map. That is, the hidden "$bits" and "$offsets" vector columns are not compared, as one schema may be an "original" (without these hidden columns) while the other may come from a vector (which has the hidden columns added). The standard equals() comparison does consider hidden columns.
      • For maps, the child columns are compared recursively. This version requires that the two sets of columns appear in the same order. (It assumes it is being used in a context where column indexes make sense.) Operators that want to reconcile two maps that differ only in column order need a different comparison.
        Note: MaterializedField and ValueVector have a 1:1 mapping, which means that each ValueVector has a MaterializedField associated with it. So when we replace or add a ValueVector in a VectorContainer, we create a new MaterializedField object for the new vector. This works fine for primitive-type ValueVectors, but for ValueVectors of type AbstractContainerVector there are some differences in how the MaterializedField and ValueVector objects are updated inside the container, since both the ValueVector and the MaterializedField objects are mutable.

        For example, with a MapVector it can happen that only the type of a child field changes while the type and name of the parent map remain the same. In these cases we replace the child field's ValueVector in the parent MapVector, inside the main batch container, with a vector of the new type. Thus the reference to the parent MapVector inside the batch container remains the same, but the reference to the child field's ValueVector stored inside the MapVector gets updated. This update also replaces the MaterializedField for that child field, which is stored in the children list of the parent MapVector's MaterializedField. Because the children list of the parent MaterializedField is updated, this class is mutable. Hence there should not be any check for object reference equality here; instead there should be a deep comparison, which is what this method now performs. With an object reference check, the cases above would return true for two MaterializedField objects whose children lists differ, which is not correct. The same holds true for the isEquivalent(MaterializedField) method.

      Parameters:
      other - another field
      Returns:
      true if the columns are identical according to the above rules, false if they differ
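
      The rules listed above can be sketched in plain Java as follows. This is not Drill's implementation: Col is a hypothetical stand-in for a column description, types and modes are plain strings, and the "$offsets" child in the usage note simply mirrors the hidden column named in the rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the equivalence rules; not Drill's code.
final class EquivalenceSketch {

    static final class Col {
        final String name;
        final String type;
        final String mode;
        final int precision;
        final int scale;
        final List<Col> children = new ArrayList<>();

        Col(String name, String type, String mode, int precision, int scale) {
            this.name = name;
            this.type = type;
            this.mode = mode;
            this.precision = precision;
            this.scale = scale;
        }

        // Adds a child and returns this column, to allow chained construction.
        Col with(Col child) {
            children.add(child);
            return this;
        }
    }

    static boolean isEquivalent(Col a, Col b) {
        // Names must match, ignoring case.
        if (!a.name.equalsIgnoreCase(b.name)) {
            return false;
        }
        // Type, mode, precision and scale must be identical.
        if (!a.type.equals(b.type) || !a.mode.equals(b.mode)
                || a.precision != b.precision || a.scale != b.scale) {
            return false;
        }
        // Children (such as hidden vector columns) are ignored unless this is a map.
        if (!a.type.equals("MAP")) {
            return true;
        }
        // Map children are compared recursively and must appear in the same order.
        if (a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!isEquivalent(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

      In this sketch, a VARCHAR column built without children is equivalent to one carrying a hidden "$offsets" child, while two maps whose children appear in a different order are not.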
    • isPromotableTo

      public boolean isPromotableTo(MaterializedField other, boolean allowModeChange)
      Determine if the present column schema can be promoted to the given schema. Promotion is possible if the schemas are equivalent, or if required mode is promoted to nullable, or if scale or precision can be increased.
      Parameters:
      other - the field to which this one is to be promoted
      Returns:
      true if promotion is possible, false otherwise
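
      A rough sketch of the promotion rule described above, in plain Java. Col is a hypothetical stand-in, modes are plain strings, and two points are assumptions of the sketch, not statements about Drill: that allowModeChange gates the required-to-nullable promotion, and that "can be increased" means the target's precision and scale are at least as large as the source's.

```java
// Hypothetical stand-in for the promotion rule; not Drill's code.
final class PromotionSketch {

    static final class Col {
        final String name;
        final String type;
        final String mode;
        final int precision;
        final int scale;

        Col(String name, String type, String mode, int precision, int scale) {
            this.name = name;
            this.type = type;
            this.mode = mode;
            this.precision = precision;
            this.scale = scale;
        }
    }

    static boolean isPromotableTo(Col from, Col to, boolean allowModeChange) {
        // Name (ignoring case) and data type must match.
        if (!from.name.equalsIgnoreCase(to.name) || !from.type.equals(to.type)) {
            return false;
        }
        // The only mode change considered is REQUIRED to OPTIONAL (nullable),
        // and only when the caller allows it (assumption of this sketch).
        if (!from.mode.equals(to.mode)) {
            boolean requiredToOptional =
                from.mode.equals("REQUIRED") && to.mode.equals("OPTIONAL");
            if (!allowModeChange || !requiredToOptional) {
                return false;
            }
        }
        // Precision and scale may stay the same or grow, never shrink.
        return from.precision <= to.precision && from.scale <= to.scale;
    }
}
```

      Under these assumptions, a REQUIRED column is promotable to an otherwise identical OPTIONAL column only when allowModeChange is true, and a DECIMAL column with precision 6 is promotable to one with precision 10 but not the reverse.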
    • toString

      public String toString(boolean includeChildren)

      Creates a string representation of the materialized field. Includes the field name, its type with precision and scale (if any), and its data mode. Nested fields, if any, are included. The number of nested fields included is limited to 10.

      FIELD_NAME(TYPE(PRECISION,SCALE):DATA_MODE)[NESTED_FIELD_1, NESTED_FIELD_2]

      Example: ok(BIT:REQUIRED), col(VARCHAR(3):OPTIONAL), emp_id(DECIMAL28SPARSE(6,0):REQUIRED)

      Returns:
      materialized field string representation
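
      The pattern above can be reproduced with a small, self-contained sketch. It is not Drill's implementation: Col is a hypothetical stand-in, a null precision or scale means "not set", and the trailing "..." marker for more than 10 nested fields is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in that renders the documented pattern; not Drill's code.
final class FieldStringSketch {

    static final int MAX_NESTED = 10;

    static final class Col {
        final String name;
        final String type;
        final Integer precision; // null when not set
        final Integer scale;     // null when not set
        final String mode;
        final List<Col> children = new ArrayList<>();

        Col(String name, String type, Integer precision, Integer scale, String mode) {
            this.name = name;
            this.type = type;
            this.precision = precision;
            this.scale = scale;
            this.mode = mode;
        }
    }

    static String render(Col c, boolean includeChildren) {
        StringBuilder sb = new StringBuilder();
        sb.append(c.name).append('(').append(c.type);
        // Precision and scale appear only when present.
        if (c.precision != null) {
            sb.append('(').append(c.precision);
            if (c.scale != null) {
                sb.append(',').append(c.scale);
            }
            sb.append(')');
        }
        sb.append(':').append(c.mode).append(')');
        // Nested fields follow in brackets, capped at MAX_NESTED entries.
        if (includeChildren && !c.children.isEmpty()) {
            String nested = c.children.stream()
                .limit(MAX_NESTED)
                .map(child -> render(child, true))
                .collect(Collectors.joining(", "));
            sb.append('[').append(nested);
            if (c.children.size() > MAX_NESTED) {
                sb.append(", ..."); // truncation marker: assumption of this sketch
            }
            sb.append(']');
        }
        return sb.toString();
    }
}
```

      With these stand-ins, rendering a column built as new Col("col", "VARCHAR", 3, null, "OPTIONAL") yields col(VARCHAR(3):OPTIONAL), matching the example above.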
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • format

      public String format()
    • format

      public void format(StringBuilder builder, int level)
      Format the field in a multi-line format, with children (but not subtypes) indented. Useful for wide rows where the single-line format is too hard to read.
    • hasSameTypeAndMode

      public boolean hasSameTypeAndMode(MaterializedField that)
      Return true if two fields have identical MinorType and Mode.