Class SchemaPathUtils

java.lang.Object
org.apache.drill.metastore.util.SchemaPathUtils

public class SchemaPathUtils extends Object
  • Method Details

    • getColumnMetadata

      public static ColumnMetadata getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema)
      Returns the ColumnMetadata instance from the specified TupleMetadata schema that corresponds to the specified column schema path.
      Parameters:
      schemaPath - schema path of the column to look up
      schema - tuple schema in which to search for the column
      Returns:
      ColumnMetadata instance which corresponds to the specified column schema path
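      As a rough illustration of the lookup described above, the following sketch resolves a multi-segment path against a nested schema. It is not Drill's implementation: ColumnSketch, PathLookupSketch and find are hypothetical stand-ins for ColumnMetadata, TupleMetadata and getColumnMetadata, the path is modeled as a plain list of names, and returning null for a missing column is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a column description; not Drill's ColumnMetadata.
class ColumnSketch {
    final String name;
    final String type;
    // Child columns keyed by name; stays empty for non-map columns.
    final Map<String, ColumnSketch> children = new LinkedHashMap<>();

    ColumnSketch(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // Registers a child column and returns this column for chaining.
    ColumnSketch child(ColumnSketch c) {
        children.put(c.name, c);
        return this;
    }
}

class PathLookupSketch {
    // Walks the path one segment at a time, descending into child columns.
    // Returns null (an assumption of this sketch) when any segment is missing.
    static ColumnSketch find(List<String> path, Map<String, ColumnSketch> schema) {
        Map<String, ColumnSketch> level = schema;
        ColumnSketch current = null;
        for (String segment : path) {
            current = level.get(segment);
            if (current == null) {
                return null;
            }
            level = current.children;
        }
        return current;
    }
}
```

      In this model, looking up the path [a, b] in a schema whose column a is a map containing b returns the description of b, while [a, x] returns null.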
    • isFieldNestedInDictOrRepeatedMap

      public static boolean isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema)
      Checks whether the field identified by the schema path is a child of either a DICT or a REPEATED MAP. For fields nested in a DICT or REPEATED MAP, filters cannot be removed based on Parquet statistics.

      The check is needed because statistics are not obtained for such fields: their representation in a filter differs from the 'canonical' one. For example, field `a` in Parquet's STRUCT ARRAY is represented as `struct_array`.`bag`.`array_element`.`a`, but when it is used in a filter, ... WHERE struct_array[0].a = 1, it has a different representation (with indexes stripped): `struct_array`.`a`, which is not present in the statistics. The same happens with a DICT's value: for SELECT ... WHERE dict_col['a'] = 0, statistics exist for `dict_col`.`key_value`.`value`, but the field in the filter is translated to `dict_col`.`a` and is therefore considered absent from the statistics. If such fields (like the ones in these examples) are OPTIONAL INT, the field is considered absent from the table and is treated as NULL. This method is used to avoid that situation.

      Parameters:
      schemaPath - schema path used in filter
      schema - schema containing all the fields in the file
      Returns:
      true if the field is nested inside a DICT (is `key` or `value`) or inside a REPEATED MAP field, false otherwise.
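      The check can be pictured with a small model: walk the ancestors of the filter path and report whether any of them is a DICT or a REPEATED MAP. The sketch below is a simplified illustration under stated assumptions, not Drill's code. NestedCheckSketch, Kind, Col and isNestedInDictOrRepeatedMap are hypothetical names, and the path is a plain list of names with array indexes and DICT keys already reduced to field names, as in `struct_array`.`a`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedCheckSketch {
    // Simplified column kinds; real Drill types carry a minor type and a data mode.
    enum Kind { INT, MAP, REPEATED_MAP, DICT }

    // Hypothetical column node holding its kind and named children.
    static class Col {
        final String name;
        final Kind kind;
        final Map<String, Col> children = new LinkedHashMap<>();

        Col(String name, Kind kind, Col... kids) {
            this.name = name;
            this.kind = kind;
            for (Col kid : kids) {
                children.put(kid.name, kid);
            }
        }
    }

    // Returns true when any ancestor of the last path segment is a DICT or REPEATED_MAP.
    // The last segment itself is not inspected, so a top-level column yields false.
    static boolean isNestedInDictOrRepeatedMap(List<String> path, Map<String, Col> schema) {
        Map<String, Col> level = schema;
        for (int i = 0; i < path.size() - 1; i++) {
            Col col = level.get(path.get(i));
            if (col == null) {
                return false; // unknown ancestor: nothing to report in this sketch
            }
            if (col.kind == Kind.DICT || col.kind == Kind.REPEATED_MAP) {
                return true;
            }
            level = col.children;
        }
        return false;
    }
}
```

      With a REPEATED_MAP column struct_array and a DICT column dict_col in the schema, the paths [struct_array, a] and [dict_col, a] both yield true, whereas [m, a] under a plain MAP column m yields false. A caller would keep the filter in the first two cases instead of pruning on statistics.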
    • addColumnMetadata

      public static void addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath,TypeProtos.MajorType> types)
      Adds a column with the specified schema path and type to the specified TupleMetadata schema. When the specified SchemaPath has children, the corresponding maps are created in the TupleMetadata schema, and the last child in the path has the specified type.
      Parameters:
      schema - tuple schema where column should be added
      schemaPath - schema path of the column which should be added
      type - type of the column which should be added
      types - map of the types of the column's parents, keyed by schema path
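      The creation of intermediate maps for a nested path can be sketched as follows. This is a minimal model, not Drill's implementation: AddColumnSketch, Node and addColumn are hypothetical names, types are plain strings, and every missing parent is created as a "MAP", whereas the real method receives the parents' types through its types argument.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AddColumnSketch {
    // Hypothetical schema node: a type label plus named children.
    static class Node {
        final String type;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String type) {
            this.type = type;
        }
    }

    // Adds the column at the given path, creating a "MAP" node for each missing parent.
    // Assumption of this sketch: parents are always plain maps.
    static void addColumn(Map<String, Node> schema, List<String> path, String leafType) {
        if (path.isEmpty()) {
            throw new IllegalArgumentException("path must have at least one segment");
        }
        Map<String, Node> level = schema;
        for (int i = 0; i < path.size() - 1; i++) {
            Node parent = level.get(path.get(i));
            if (parent == null) {
                parent = new Node("MAP");
                level.put(path.get(i), parent);
            }
            level = parent.children;
        }
        // The last segment receives the requested type.
        level.put(path.get(path.size() - 1), new Node(leafType));
    }
}
```

      Adding the path [a, b, c] with type "INT" to an empty schema in this model creates map nodes a and a.b and an INT leaf a.b.c. Adding [a, d] afterwards reuses the existing a node.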