public class SchemaPathUtils extends Object
Modifier and Type | Method and Description |
---|---|
static void | addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath,TypeProtos.MajorType> types): Adds a column with the specified schema path and type to the specified TupleMetadata schema. |
static ColumnMetadata | getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema): Returns the ColumnMetadata instance obtained from the specified TupleMetadata schema that corresponds to the specified column schema path. |
static boolean | isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema): Checks if the field identified by the schema path is a child of either a DICT or a REPEATED MAP. |
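The summary above describes two path-driven operations on a nested schema: looking up a column by its schema path, and adding a column while creating any missing parent maps. The following is a minimal, self-contained sketch of both ideas under stated assumptions. It does not use Drill's classes: Column, Tuple, getColumn and addColumn are hypothetical stand-ins for ColumnMetadata, TupleMetadata and the two utility methods, schema paths are modeled as lists of name segments, and a parent missing from the parentTypes map is assumed to be a plain MAP.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model; not Drill's SchemaPathUtils implementation.
final class SchemaPathSketch {
    // Hypothetical stand-in for ColumnMetadata: a name, a type label and,
    // for map-like columns, the child columns in insertion order.
    static final class Column {
        final String name;
        final String type;
        final Map<String, Column> children = new LinkedHashMap<>();

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Hypothetical stand-in for TupleMetadata: top-level columns by name.
    static final class Tuple {
        final Map<String, Column> columns = new LinkedHashMap<>();
    }

    // Lookup: follow the path one name segment at a time; null if any segment is missing.
    static Column getColumn(List<String> path, Tuple schema) {
        Map<String, Column> level = schema.columns;
        Column current = null;
        for (String segment : path) {
            current = level.get(segment);
            if (current == null) {
                return null;
            }
            level = current.children;
        }
        return current;
    }

    // Insertion (path must be non-empty): create every missing parent, then add the
    // leaf with the given type. parentTypes maps a dotted parent path ("a", "a.b", ...)
    // to its type label; an absent parent is assumed, for this sketch, to be "MAP".
    static void addColumn(Tuple schema, List<String> path, String type, Map<String, String> parentTypes) {
        Map<String, Column> level = schema.columns;
        for (int i = 0; i < path.size() - 1; i++) {
            String parentPath = String.join(".", path.subList(0, i + 1));
            String parentType = parentTypes.getOrDefault(parentPath, "MAP");
            Column parent = level.computeIfAbsent(path.get(i), name -> new Column(name, parentType));
            level = parent.children;
        }
        String leafName = path.get(path.size() - 1);
        level.putIfAbsent(leafName, new Column(leafName, type));
    }
}
```

For example, adding the path a.b.c with type INT to an empty schema creates the maps a and a.b, after which a lookup of a.b.c returns the INT leaf and a lookup of a.x returns null.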

public static ColumnMetadata getColumnMetadata(SchemaPath schemaPath, TupleMetadata schema)
Returns the ColumnMetadata instance obtained from the specified TupleMetadata schema that corresponds to the specified column schema path.
Parameters:
schemaPath - schema path of the column to obtain
schema - tuple schema in which the column is searched
Returns:
ColumnMetadata instance that corresponds to the specified column schema path

public static boolean isFieldNestedInDictOrRepeatedMap(SchemaPath schemaPath, TupleMetadata schema)
Checks if the field identified by the schema path is a child of either a DICT or a REPEATED MAP.
For such fields, nested in a DICT or REPEATED MAP, filters can't be removed based on Parquet statistics.

The check is needed because statistics cannot be obtained for such fields: their representation differs from the 'canonical' one. For example, field `a` in Parquet's STRUCT ARRAY is represented as `struct_array`.`bag`.`array_element`.`a`, but once it is used in a filter, ... WHERE struct_array[0].a = 1, it has a different representation (with indexes stripped), `struct_array`.`a`, which is not present in statistics.
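To make the mismatch concrete, here is a small sketch under simplifying assumptions: statistics are modeled as a set of dotted path strings, array indexes in a filter reference are modeled as segments written like "[0]", and the names StatsPathMismatchSketch, stripIndexes and hasStatistics are hypothetical, not part of Drill.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model of the statistics lookup; not Drill code.
final class StatsPathMismatchSketch {
    // Assumed physical Parquet path of field `a` inside a STRUCT ARRAY,
    // as it would be recorded in column statistics.
    static final Set<String> STATISTICS_PATHS = Set.of("struct_array.bag.array_element.a");

    // Drops index segments (modeled as "[0]", "[1]", ...) from a filter reference,
    // producing the index-stripped form, e.g. struct_array[0].a -> "struct_array.a".
    static String stripIndexes(List<String> segments) {
        return segments.stream()
                .filter(segment -> !segment.startsWith("["))
                .collect(Collectors.joining("."));
    }

    // A filter path only benefits from statistics if it matches a recorded path exactly.
    static boolean hasStatistics(String path) {
        return STATISTICS_PATHS.contains(path);
    }
}
```

The stripped filter path "struct_array.a" never equals the recorded physical path, so the lookup misses even though statistics for the same data exist.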

The same happens with a DICT's value: for SELECT ... WHERE dict_col['a'] = 0, statistics exist for `dict_col`.`key_value`.`value`, but the field in the filter is translated to `dict_col`.`a` and is hence considered not present in statistics. If such fields (like the ones in these examples) are OPTIONAL INT, the field is considered not present in the table and is treated as NULL. This method is used to avoid that situation.
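The check itself can be sketched as a walk down the index-free path that returns true as soon as a parent segment turns out to be a DICT or a REPEATED MAP. This is a hypothetical model, not Drill's implementation: the Kind enum, the Column class and isNestedInDictOrRepeatedMap are illustrative names, and the schema is a plain map from column names to columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the nesting check; not Drill's SchemaPathUtils code.
final class NestedFieldCheckSketch {
    // Illustrative column kinds standing in for Drill column metadata.
    enum Kind { PRIMITIVE, MAP, REPEATED_MAP, DICT }

    static final class Column {
        final Kind kind;
        final Map<String, Column> children = new HashMap<>();

        Column(Kind kind) {
            this.kind = kind;
        }

        // Adds a child and returns this column, so nested schemas can be built inline.
        Column child(String name, Column column) {
            children.put(name, column);
            return this;
        }
    }

    // path is assumed to be index-free already (struct_array[0].a -> [struct_array, a]).
    static boolean isNestedInDictOrRepeatedMap(List<String> path, Map<String, Column> schema) {
        Map<String, Column> level = schema;
        for (int i = 0; i < path.size() - 1; i++) {
            Column column = level.get(path.get(i));
            if (column == null) {
                return false; // unknown parent: nothing to report
            }
            if (column.kind == Kind.DICT || column.kind == Kind.REPEATED_MAP) {
                return true; // a later segment lives inside a DICT or REPEATED MAP
            }
            if (column.kind != Kind.MAP) {
                return false; // primitives have no children
            }
            level = column.children; // plain MAP: keep descending
        }
        return false;
    }
}
```

Note that for a DICT the check succeeds without resolving the remaining segment, which matches the dict_col['a'] case above where `a` is a key value rather than a schema child.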
Parameters:
schemaPath - schema path used in the filter
schema - schema containing all the fields in the file
Returns:
true if the field is nested inside a DICT (i.e. is its `key` or `value`) or inside a REPEATED MAP field, false otherwise.

public static void addColumnMetadata(TupleMetadata schema, SchemaPath schemaPath, TypeProtos.MajorType type, Map<SchemaPath,TypeProtos.MajorType> types)
Adds a column with the specified schema path and type to the specified TupleMetadata schema.
When the specified SchemaPath has children, the corresponding maps are created in the TupleMetadata schema and the last child of the map gets the specified type.
Parameters:
schema - tuple schema to which the column should be added
schemaPath - schema path of the column to add
type - type of the column to add
types - map of the column's parent types, keyed by schema path
Copyright © 1970 The Apache Software Foundation. All rights reserved.