public class ParquetTableMetadataUtils extends Object
Modifier and Type | Method and Description |
---|---|
static Map<SchemaPath,ColumnStatistics<?>> | addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns) Creates a new map based on the specified columnsStatistics with added statistics for implicit and partition (dir) columns. |
static Map<SchemaPath,ColumnStatistics<?>> | getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics) Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns from the specified BaseTableMetadata. |
static Map<SchemaPath,TypeProtos.MajorType> | getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file) Returns a map of column names with their Drill types for the specified file. |
static FileMetadata | getFileMetadata(Collection<RowGroupMetadata> rowGroups) Returns the FileMetadata instance obtained by merging the specified RowGroupMetadata list. |
static Map<SchemaPath,TypeProtos.MajorType> | getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup) Returns a map of column names with their Drill types for every NameSegment in SchemaPath in the specified rowGroup. |
static NonInterestingColumnsMetadata | getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata) Returns the metadata for non-interesting columns. |
static org.apache.parquet.schema.OriginalType | getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column) Returns the OriginalType for the specified column. |
static PartitionMetadata | getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files) Returns the PartitionMetadata instance obtained by merging the specified FileMetadata list. |
static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName | getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column) Returns the PrimitiveType.PrimitiveTypeName for the specified column. |
static Map<SchemaPath,ColumnStatistics<?>> | getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata) Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys. |
static Map<SchemaPath,TypeProtos.MajorType> | getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup) Returns a map of column names with their Drill types for the specified rowGroup. |
static RowGroupMetadata | getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location) Returns the RowGroupMetadata instance converted from the specified parquet rowGroupMetadata. |
static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> | getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata) Returns a list of RowGroupMetadata obtained by converting the parquet row group metadata taken from the specified tableMetadata. |
static Object | getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType) Handles the passed value considering its type and the specified primitiveType with originalType. |
public static Map<SchemaPath,ColumnStatistics<?>> addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
Creates a new map based on the specified columnsStatistics with added statistics for implicit and partition (dir) columns.
Parameters:
columnsStatistics - map of column statistics to expand
columns - list of all columns including implicit or partition ones
partitionValues - list of partition values
optionManager - option manager
location - location of metadata part
supportsFileImplicitColumns - whether implicit columns are supported

public static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
Returns a list of RowGroupMetadata obtained by converting the parquet row group metadata taken from the specified tableMetadata. Assigns an index to each row group based on its position in the file's metadata. Empty / fake row groups are assigned the index -1.
Parameters:
tableMetadata - the source of row groups to be converted
Returns:
list of RowGroupMetadata
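
The index assignment described for getRowGroupsMetadata can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not Drill's implementation: the RawRowGroup and IndexedRowGroup classes, the empty flag, and the plain Map keyed by a path string (standing in for the Multimap keyed by org.apache.hadoop.fs.Path) are all hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowGroupIndexSketch {

  // Hypothetical stand-in for a parquet row group entry; not a Drill type.
  static final class RawRowGroup {
    final long rowCount;
    final boolean empty; // assumption: marks an empty / fake row group

    RawRowGroup(long rowCount, boolean empty) {
      this.rowCount = rowCount;
      this.empty = empty;
    }
  }

  // Hypothetical stand-in for a converted row group carrying its assigned index.
  static final class IndexedRowGroup {
    final String filePath;
    final int index;
    final long rowCount;

    IndexedRowGroup(String filePath, int index, long rowCount) {
      this.filePath = filePath;
      this.index = index;
      this.rowCount = rowCount;
    }
  }

  // Groups converted row groups by file path. Real row groups are numbered
  // 0, 1, 2, ... by position within their file; empty / fake ones get -1.
  static Map<String, List<IndexedRowGroup>> indexRowGroups(Map<String, List<RawRowGroup>> files) {
    Map<String, List<IndexedRowGroup>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<RawRowGroup>> file : files.entrySet()) {
      int nextIndex = 0; // restarts for every file
      List<IndexedRowGroup> converted = new ArrayList<>();
      for (RawRowGroup rowGroup : file.getValue()) {
        int index = rowGroup.empty ? -1 : nextIndex++;
        converted.add(new IndexedRowGroup(file.getKey(), index, rowGroup.rowCount));
      }
      result.put(file.getKey(), converted);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<RawRowGroup>> files = new LinkedHashMap<>();
    files.put("/data/a.parquet",
        Arrays.asList(new RawRowGroup(100, false), new RawRowGroup(50, false)));
    files.put("/data/empty.parquet",
        Arrays.asList(new RawRowGroup(0, true)));

    for (List<IndexedRowGroup> groups : indexRowGroups(files).values()) {
      for (IndexedRowGroup g : groups) {
        System.out.println(g.filePath + " index=" + g.index + " rows=" + g.rowCount);
      }
    }
  }
}
```

Restarting the counter per file mirrors the rgIndexInFile parameter of getRowGroupMetadata, which is documented as the index of the row group within its file.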
public static RowGroupMetadata getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
Returns the RowGroupMetadata instance converted from the specified parquet rowGroupMetadata.
Parameters:
tableMetadata - table metadata which contains row group metadata to convert
rowGroupMetadata - row group metadata to convert
rgIndexInFile - index of current row group within the file
location - location of file with current row group
Returns:
RowGroupMetadata instance converted from the specified parquet rowGroupMetadata

public static FileMetadata getFileMetadata(Collection<RowGroupMetadata> rowGroups)
Returns the FileMetadata instance obtained by merging the specified RowGroupMetadata list.
Parameters:
rowGroups - collection of RowGroupMetadata to be merged
Returns:
FileMetadata instance

public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
Returns the PartitionMetadata instance obtained by merging the specified FileMetadata list.
Parameters:
partitionColumn - partition column
files - list of files to be merged
Returns:
PartitionMetadata instance

public static Map<SchemaPath,ColumnStatistics<?>> getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys.
Parameters:
tableMetadata - the source of column types
rowGroupMetadata - metadata to convert

public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
Returns the metadata for non-interesting columns.
Parameters:
parquetTableMetadata - the source of column metadata for the statistics of non-interesting columns

public static Object getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
Handles the passed value considering its type and the specified primitiveType with originalType.
Parameters:
value - value to handle
primitiveType - primitive type of the column whose value should be handled
originalType - original type of the column whose value should be handled

public static Map<SchemaPath,TypeProtos.MajorType> getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
Returns a map of column names with their Drill types for the specified file.
Parameters:
parquetTableMetadata - the source of primitive and original column types
file - file whose columns should be discovered

public static Map<SchemaPath,TypeProtos.MajorType> getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for the specified rowGroup.
Parameters:
parquetTableMetadata - the source of primitive and original column types
rowGroup - row group whose columns should be discovered

public static Map<SchemaPath,TypeProtos.MajorType> getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
Returns a map of column names with their Drill types for every NameSegment in SchemaPath in the specified rowGroup. The type for a SchemaPath can be null when it is not possible to determine its type. As of now, this hierarchy is of interest solely because TypeProtos.MinorType.DICT must be accounted for, to make sure that filters used on DICT values (get by key) are not pruned out before actual filtering happens.
Parameters:
parquetTableMetadata - the source of column types
rowGroup - row group whose columns should be discovered

public static org.apache.parquet.schema.OriginalType getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the OriginalType for the specified column.
Parameters:
parquetTableMetadata - the source of column type
column - column whose OriginalType should be returned
Returns:
OriginalType for the specified column

public static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
Returns the PrimitiveType.PrimitiveTypeName for the specified column.
Parameters:
parquetTableMetadata - the source of column type
column - column whose PrimitiveType.PrimitiveTypeName should be returned
Returns:
PrimitiveType.PrimitiveTypeName for the specified column

public static Map<SchemaPath,ColumnStatistics<?>> getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns from the specified BaseTableMetadata.
Parameters:
schema - source of column names
statistics - source of column statistics
Returns:
map of schema paths to ColumnStatistics
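
The pairing of a schema (source of column names) with a statistics table (source of column statistics) described for getColumnStatistics can be illustrated with a small sketch. Everything in it is an assumption made for illustration: ColumnStats, StatsTable, their fields, and the choice to skip columns that have no recorded statistics are hypothetical and do not reproduce the Drill classes or their behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnStatisticsSketch {

  // Hypothetical stand-in for per-column statistics; not Drill's ColumnStatistics.
  static final class ColumnStats {
    final double distinctValues;
    final double nonNullCount;

    ColumnStats(double distinctValues, double nonNullCount) {
      this.distinctValues = distinctValues;
      this.nonNullCount = nonNullCount;
    }
  }

  // Hypothetical stand-in for a statistics table; not Drill's DrillStatsTable.
  static final class StatsTable {
    private final Map<String, ColumnStats> byColumn = new HashMap<>();

    void put(String column, ColumnStats stats) {
      byColumn.put(column, stats);
    }

    // Returns null when nothing was recorded for the column.
    ColumnStats lookup(String column) {
      return byColumn.get(column);
    }
  }

  // Walks the schema's column names in order and keeps the statistics found
  // for each one. Assumption: columns without statistics are left out.
  static Map<String, ColumnStats> collect(List<String> schemaColumns, StatsTable table) {
    Map<String, ColumnStats> result = new LinkedHashMap<>();
    for (String column : schemaColumns) {
      ColumnStats stats = table.lookup(column);
      if (stats != null) {
        result.put(column, stats);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    StatsTable table = new StatsTable();
    table.put("id", new ColumnStats(1000, 1000));
    table.put("name", new ColumnStats(750, 980));

    Map<String, ColumnStats> stats = collect(Arrays.asList("id", "name", "comment"), table);
    for (Map.Entry<String, ColumnStats> e : stats.entrySet()) {
      System.out.println(e.getKey() + " distinct=" + e.getValue().distinctValues
          + " nonNull=" + e.getValue().nonNullCount);
    }
  }
}
```

Driving the loop from the schema rather than from the statistics table keeps the result limited to columns the schema actually declares, which matches the documented roles of the two parameters.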
Copyright © 1970 The Apache Software Foundation. All rights reserved.