java.lang.Object

org.apache.drill.metastore.util.SchemaPathUtils

public class SchemaPathUtils extends Object

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `static void` | `addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath,TypeProtos.MajorType> types)` | Adds a column with the specified schema path and type to the specified TupleMetadata schema. |
| `static ColumnMetadata` | `getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema)` | Returns the ColumnMetadata instance from the specified TupleMetadata schema that corresponds to the specified column schema path. |
| `static boolean` | `isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema)` | Checks whether the field identified by the schema path is a child of either a DICT or a REPEATED MAP. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- getColumnMetadata
  
  public static ColumnMetadata getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema)
  
  Returns the ColumnMetadata instance from the specified TupleMetadata schema that corresponds to the specified column schema path.
  
  Parameters:
  
  schemaPath - schema path of the column which should be obtained
  
  schema - tuple schema in which to search for the column
  
  Returns:
  
  ColumnMetadata instance which corresponds to the specified column schema path
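  
  The lookup described above can be pictured with a small self-contained sketch. It does not use Drill's classes: `Column` and `resolve` are hypothetical stand-ins for ColumnMetadata and this method, a path is given as plain name segments, and returning `null` for a missing column is an assumption of the sketch rather than documented behavior.
  
  ```java
  import java.util.LinkedHashMap;
  import java.util.Map;

  final class ColumnLookupSketch {

    /** Hypothetical stand-in for a column: a type name plus child columns (non-empty only for maps). */
    static final class Column {
      final String type;
      final Map<String, Column> children = new LinkedHashMap<>();

      Column(String type) {
        this.type = type;
      }

      /** Adds a child column and returns this column so calls can be chained. */
      Column child(String name, Column column) {
        children.put(name, column);
        return this;
      }
    }

    /** Walks the path one name segment at a time; returns null if any segment is missing. */
    static Column resolve(Map<String, Column> schema, String... path) {
      Map<String, Column> level = schema;
      Column current = null;
      for (String name : path) {
        current = level.get(name);
        if (current == null) {
          return null;
        }
        level = current.children;
      }
      return current;
    }

    public static void main(String[] args) {
      Map<String, Column> schema = new LinkedHashMap<>();
      schema.put("id", new Column("INT"));
      schema.put("address", new Column("MAP").child("city", new Column("VARCHAR")));

      // A nested column is found by descending through its parent map.
      System.out.println(resolve(schema, "address", "city").type);
      // A path with no matching column yields null in this sketch.
      System.out.println(resolve(schema, "address", "zip"));
    }
  }
  ```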
- isFieldNestedInDictOrRepeatedMap
  
  public static boolean isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema)
  
  Checks whether the field identified by the schema path is a child of either a DICT or a REPEATED MAP. For such fields, filters can't be removed based on Parquet statistics.
  
  The check is needed because statistics are not obtained for such fields: their representation differs from the 'canonical' one. For example, field `a` in Parquet's STRUCT ARRAY is represented as `struct_array`.`bag`.`array_element`.`a`, but when it is used in a filter such as ... WHERE struct_array[0].a = 1, it has a different representation with the indexes stripped, `struct_array`.`a`, which is not present in the statistics. The same happens with a DICT's value: for SELECT ... WHERE dict_col['a'] = 0, statistics exist for `dict_col`.`key_value`.`value`, but the field in the filter is translated to `dict_col`.`a` and hence is considered not present in the statistics.
  
  If such fields (like the ones in the examples) are OPTIONAL INT, the field is considered not present in the table and is treated as NULL. This method is used to avoid that situation.
  
  Parameters:
  
  schemaPath - schema path used in filter
  
  schema - schema containing all the fields in the file
  
  Returns:
  
  true if the field is nested inside a DICT (that is, it is a `key` or `value`) or inside a REPEATED MAP field; false otherwise.
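  
  To make the mismatch concrete, here is a minimal sketch that models it with plain Java collections instead of Drill's classes. The `Kind` enum, the dotted-string paths, and `isNestedInDictOrRepeatedMap` are hypothetical illustrations; the sketch only assumes that the check inspects the parents of the field, which is how the description above reads.
  
  ```java
  import java.util.Arrays;
  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Map;
  import java.util.Set;

  final class NestedFieldCheckSketch {

    /** Hypothetical column kinds; not Drill's type model. */
    enum Kind { SCALAR, MAP, REPEATED_MAP, DICT }

    /**
     * Returns true if any parent segment of the path is a DICT or a REPEATED MAP.
     * kinds maps a dotted path (indexes already stripped) to the kind of that column.
     */
    static boolean isNestedInDictOrRepeatedMap(Map<String, Kind> kinds, String... path) {
      StringBuilder prefix = new StringBuilder();
      // Only parents are inspected, so the last segment is skipped.
      for (int i = 0; i < path.length - 1; i++) {
        if (i > 0) {
          prefix.append('.');
        }
        prefix.append(path[i]);
        Kind kind = kinds.get(prefix.toString());
        if (kind == Kind.DICT || kind == Kind.REPEATED_MAP) {
          return true;
        }
      }
      return false;
    }

    public static void main(String[] args) {
      // Paths for which statistics exist, using the names from the examples above.
      Set<String> statistics = new HashSet<>(Arrays.asList(
          "struct_array.bag.array_element.a",
          "dict_col.key_value.value",
          "id"));

      Map<String, Kind> kinds = new HashMap<>();
      kinds.put("struct_array", Kind.REPEATED_MAP);
      kinds.put("dict_col", Kind.DICT);
      kinds.put("id", Kind.SCALAR);

      // WHERE struct_array[0].a = 1 becomes struct_array.a once indexes are stripped,
      // so a plain statistics lookup misses it.
      System.out.println(statistics.contains("struct_array.a"));
      // The parent check flags the field, so the miss must not be read as "column absent".
      System.out.println(isNestedInDictOrRepeatedMap(kinds, "struct_array", "a"));
      // A top-level scalar has no DICT or REPEATED MAP parent.
      System.out.println(isNestedInDictOrRepeatedMap(kinds, "id"));
    }
  }
  ```
  
  In this sketch, a caller would keep the filter whenever the check returns true, rather than treating the missing statistics entry as evidence that the column is absent.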
- addColumnMetadata
  
  public static void addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath,TypeProtos.MajorType> types)
  
  Adds a column with the specified schema path and type to the specified TupleMetadata schema. If the specified SchemaPath has children, the corresponding maps are created in the TupleMetadata schema and the last child of the map has the specified type.
  
  Parameters:
  
  schema - tuple schema where column should be added
  
  schemaPath - schema path of the column which should be added
  
  type - type of the column which should be added
  
  types - map from the schema paths of the column's parents to their types
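  
  The creation of intermediate maps for a nested path can be sketched as follows. This is a simplified model built on plain Java maps, not Drill's implementation: `Column` and `addColumn` are hypothetical, types are plain strings, and falling back to `"MAP"` when a parent has no entry in `parentTypes` is an assumption made for the illustration.
  
  ```java
  import java.util.HashMap;
  import java.util.LinkedHashMap;
  import java.util.Map;

  final class AddColumnSketch {

    /** Hypothetical stand-in for a column: a type name plus child columns. */
    static final class Column {
      final String type;
      final Map<String, Column> children = new LinkedHashMap<>();

      Column(String type) {
        this.type = type;
      }
    }

    /**
     * Adds a column at the given path. Missing parents are created on the way down,
     * typed from parentTypes (keyed by dotted parent path) or "MAP" if absent (assumption).
     */
    static void addColumn(Map<String, Column> schema, String[] path, String type,
                          Map<String, String> parentTypes) {
      if (path.length == 0) {
        throw new IllegalArgumentException("path must have at least one segment");
      }
      Map<String, Column> level = schema;
      StringBuilder prefix = new StringBuilder();
      for (int i = 0; i < path.length - 1; i++) {
        if (i > 0) {
          prefix.append('.');
        }
        prefix.append(path[i]);
        String parentType = parentTypes.getOrDefault(prefix.toString(), "MAP");
        // Reuse an existing parent, otherwise create it with the looked-up type.
        Column parent = level.computeIfAbsent(path[i], name -> new Column(parentType));
        level = parent.children;
      }
      // The last segment receives the requested column type.
      level.put(path[path.length - 1], new Column(type));
    }

    public static void main(String[] args) {
      Map<String, Column> schema = new LinkedHashMap<>();
      Map<String, String> parentTypes = new HashMap<>();
      parentTypes.put("user", "MAP");
      parentTypes.put("user.address", "MAP");

      addColumn(schema, new String[] {"user", "address", "city"}, "VARCHAR", parentTypes);

      Column city = schema.get("user").children.get("address").children.get("city");
      System.out.println(city.type);
    }
  }
  ```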
