Class ParquetTableMetadataUtils
java.lang.Object
org.apache.drill.exec.store.parquet.ParquetTableMetadataUtils
Utility class for converting Parquet metadata classes to Metastore metadata classes.
-
Method Summary
Modifier and Type / Method / Description

static Map<SchemaPath, ColumnStatistics<?>>
addImplicitColumnsStatistics(Map<SchemaPath, ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
Creates a new map based on the specified columnsStatistics with added statistics for implicit and partition (dir) columns.

static Map<SchemaPath, ColumnStatistics<?>>
getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns in the specified schema.

static Map<SchemaPath, TypeProtos.MajorType>
getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
Returns a map of column names with their Drill types for the specified file.

static FileMetadata
getFileMetadata(Collection<RowGroupMetadata> rowGroups)
Returns a FileMetadata instance obtained by merging the specified RowGroupMetadata list.

static Map<SchemaPath, TypeProtos.MajorType>
getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for every NameSegment in each SchemaPath in the specified rowGroup.

static NonInterestingColumnsMetadata
getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
Returns the metadata of the non-interesting columns.

static org.apache.parquet.schema.OriginalType
getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the OriginalType for the specified column.

static PartitionMetadata
getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
Returns a PartitionMetadata instance obtained by merging the specified FileMetadata list.

static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the PrimitiveType.PrimitiveTypeName for the specified column.

static Map<SchemaPath, ColumnStatistics<?>>
getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys.

static Map<SchemaPath, TypeProtos.MajorType>
getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for the specified rowGroup.

static RowGroupMetadata
getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
Returns a RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata.

static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata>
getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
Returns a Multimap of RowGroupMetadata instances, keyed by file path, obtained by converting the Parquet row group metadata taken from the specified tableMetadata.

static Object
getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
Handles the passed value according to its type and the specified primitiveType and originalType.
-
Method Details
-
addImplicitColumnsStatistics
public static Map<SchemaPath, ColumnStatistics<?>> addImplicitColumnsStatistics(Map<SchemaPath, ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
Creates a new map based on the specified columnsStatistics with added statistics for implicit and partition (dir) columns.
Parameters:
columnsStatistics - map of column statistics to expand
columns - list of all columns, including implicit or partition ones
partitionValues - list of partition values
optionManager - option manager
location - location of metadata part
supportsFileImplicitColumns - whether implicit columns are supported
Returns:
map with added statistics for implicit and partition (dir) columns
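
To illustrate the idea of expanding a statistics map with partition (dir) columns, here is a minimal, self-contained sketch. It does not use Drill's classes: plain strings stand in for SchemaPath and ColumnStatistics, the helper name withPartitionColumns is hypothetical, and the dir0, dir1, ... naming assumes the partition column label is "dir". Implicit file columns are left out of the sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only: strings replace SchemaPath and ColumnStatistics.
final class PartitionColumnsSketch {

  // Hypothetical helper: returns a copy of the input map with one extra entry
  // per partition value, named dir0, dir1, ... (assumed "dir" label).
  static Map<String, String> withPartitionColumns(Map<String, String> columnStats,
                                                   List<String> partitionValues) {
    // Copy first, so the caller's map is not modified.
    Map<String, String> result = new LinkedHashMap<>(columnStats);
    for (int i = 0; i < partitionValues.size(); i++) {
      // The partition value itself serves as the stand-in "statistics".
      result.put("dir" + i, partitionValues.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> stats = new LinkedHashMap<>();
    stats.put("amount", "min=1,max=9");
    System.out.println(withPartitionColumns(stats, Arrays.asList("2023", "Q1")));
  }
}
```

The sketch returns a new map rather than mutating its argument, which mirrors the "creates new map" wording of the method description.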
-
getRowGroupsMetadata
public static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata> getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
Returns a Multimap of RowGroupMetadata instances, keyed by file path, obtained by converting the Parquet row group metadata taken from the specified tableMetadata. Assigns an index to each row group based on its position in the file metadata. Empty / fake row groups are assigned the index -1.
Parameters:
tableMetadata - the source of row groups to be converted
Returns:
Multimap of RowGroupMetadata keyed by file path
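
The index assignment described above can be sketched without Drill's classes. In this hypothetical example, each file path maps to the row counts of its row groups in file order, and a row count of zero is assumed to mark an empty / fake row group. Whether an empty row group consumes a position is not stated here, so the sketch assumes it does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in only: not Drill's implementation.
final class RowGroupIndexSketch {

  // Hypothetical input: file path -> row counts of its row groups, in file order.
  // Assumption: a row group with zero rows represents an "empty / fake" one.
  static Map<String, List<Integer>> assignIndexes(Map<String, List<Long>> rowCountsByFile) {
    Map<String, List<Integer>> indexesByFile = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> file : rowCountsByFile.entrySet()) {
      List<Integer> indexes = new ArrayList<>();
      int next = 0; // the index restarts at 0 for every file
      for (long rowCount : file.getValue()) {
        // Empty row groups get -1; others get their position among non-empty ones.
        indexes.add(rowCount == 0 ? -1 : next++);
      }
      indexesByFile.put(file.getKey(), indexes);
    }
    return indexesByFile;
  }
}
```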
-
getRowGroupMetadata
public static RowGroupMetadata getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
Returns a RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata.
Parameters:
tableMetadata - table metadata which contains the row group metadata to convert
rowGroupMetadata - row group metadata to convert
rgIndexInFile - index of the current row group within the file
location - location of the file with the current row group
Returns:
RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata
-
getFileMetadata
public static FileMetadata getFileMetadata(Collection<RowGroupMetadata> rowGroups)
Returns a FileMetadata instance obtained by merging the specified RowGroupMetadata list.
Parameters:
rowGroups - collection of RowGroupMetadata to be merged
Returns:
FileMetadata instance
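
As a rough illustration of what merging row group metadata into file-level metadata can mean, the sketch below combines per-row-group statistics for a single numeric column. The Stats class and the merge rule (sum the row counts, take the smallest minimum and the largest maximum) are assumptions made for this example, not Drill's actual merge logic.

```java
import java.util.Collection;

// Illustrative stand-in only: not Drill's FileMetadata or RowGroupMetadata.
final class MergeSketch {

  // Hypothetical per-row-group statistics for one numeric column.
  static final class Stats {
    final long rowCount;
    final long min;
    final long max;

    Stats(long rowCount, long min, long max) {
      this.rowCount = rowCount;
      this.min = min;
      this.max = max;
    }
  }

  // Assumed merge rule: row counts add up, min and max widen to cover all inputs.
  static Stats merge(Collection<Stats> rowGroups) {
    if (rowGroups.isEmpty()) {
      throw new IllegalArgumentException("nothing to merge");
    }
    long rows = 0;
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (Stats s : rowGroups) {
      rows += s.rowCount;
      min = Math.min(min, s.min);
      max = Math.max(max, s.max);
    }
    return new Stats(rows, min, max);
  }
}
```

The same shape of merge would apply one level up, when file-level metadata is combined into partition-level metadata.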
-
getPartitionMetadata
public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
Returns a PartitionMetadata instance obtained by merging the specified FileMetadata list.
Parameters:
partitionColumn - partition column
files - list of files to be merged
Returns:
PartitionMetadata instance
-
getRowGroupColumnStatistics
public static Map<SchemaPath, ColumnStatistics<?>> getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys.
Parameters:
tableMetadata - the source of column types
rowGroupMetadata - metadata to convert
Returns:
map with converted row group metadata
-
getNonInterestingColumnsMeta
public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
Returns the metadata of the non-interesting columns.
Parameters:
parquetTableMetadata - the source of column metadata for the statistics of the non-interesting columns
Returns:
metadata of the non-interesting columns
-
getValue
public static Object getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
Handles the passed value according to its type and the specified primitiveType and originalType.
Parameters:
value - value to handle
primitiveType - primitive type of the column whose value should be handled
originalType - original type of the column whose value should be handled
Returns:
handled value
-
getFileFields
public static Map<SchemaPath, TypeProtos.MajorType> getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
Returns a map of column names with their Drill types for the specified file.
Parameters:
parquetTableMetadata - the source of primitive and original column types
file - file whose columns should be discovered
Returns:
map of column names with their Drill types
-
getRowGroupFields
public static Map<SchemaPath, TypeProtos.MajorType> getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for the specified rowGroup.
Parameters:
parquetTableMetadata - the source of primitive and original column types
rowGroup - row group whose columns should be discovered
Returns:
map of column names with their Drill types
-
getIntermediateFields
public static Map<SchemaPath, TypeProtos.MajorType> getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for every NameSegment in each SchemaPath in the specified rowGroup. The type for a SchemaPath can be null when it is not possible to determine it. Currently, this hierarchy is of interest only because TypeProtos.MinorType.DICT must be accounted for, so that filters on DICT values (get by key) are not pruned out before the actual filtering happens.
Parameters:
parquetTableMetadata - the source of column types
rowGroup - row group whose columns should be discovered
Returns:
map of column names with their Drill types
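
To show what "every NameSegment in SchemaPath" amounts to, the following sketch enumerates each prefix of a dotted column path and looks up its type, leaving null where the type is unknown. It uses plain strings instead of SchemaPath and TypeProtos.MajorType, and the helper name intermediateFields and the knownTypes lookup map are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in only: dotted strings replace SchemaPath,
// and type names are plain strings.
final class IntermediateFieldsSketch {

  // For the path "a.b.c", produces entries for "a", "a.b" and "a.b.c".
  static Map<String, String> intermediateFields(Map<String, String> knownTypes, String path) {
    Map<String, String> result = new LinkedHashMap<>();
    StringBuilder prefix = new StringBuilder();
    for (String segment : path.split("\\.")) {
      if (prefix.length() > 0) {
        prefix.append('.');
      }
      prefix.append(segment);
      String key = prefix.toString();
      // The value is null when the type of this prefix cannot be determined.
      result.put(key, knownTypes.get(key));
    }
    return result;
  }
}
```

Keeping an entry for the intermediate prefix (for example, a parent column of type DICT) is what lets a caller see that a nested path passes through a DICT before deciding whether a filter can be pruned.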
-
getOriginalType
public static org.apache.parquet.schema.OriginalType getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the OriginalType for the specified column.
Parameters:
parquetTableMetadata - the source of the column type
column - column whose OriginalType should be returned
Returns:
OriginalType for the specified column
-
getPrimitiveTypeName
public static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the PrimitiveType.PrimitiveTypeName for the specified column.
Parameters:
parquetTableMetadata - the source of the column type
column - column whose PrimitiveType.PrimitiveTypeName should be returned
Returns:
PrimitiveType.PrimitiveTypeName for the specified column
-
getColumnStatistics
public static Map<SchemaPath, ColumnStatistics<?>> getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns in the specified schema.
Parameters:
schema - source of column names
statistics - source of column statistics
Returns:
map of schema paths to ColumnStatistics
-