java.lang.Object

org.apache.drill.exec.store.parquet.ParquetTableMetadataUtils

public class ParquetTableMetadataUtils extends Object

Utility class for converting parquet metadata classes to Metastore metadata classes.

Method Summary

Modifier and Type

Method

Description

static Map<SchemaPath,ColumnStatistics<?>>

addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)

Creates a new map based on the specified columnsStatistics, with statistics added for implicit and partition (dir) columns.

static Map<SchemaPath,ColumnStatistics<?>>

getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)

Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns from the specified schema.

static Map<SchemaPath,TypeProtos.MajorType>

getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)

Returns a map of column names to their Drill types for the specified file.

static FileMetadata

getFileMetadata(Collection<RowGroupMetadata> rowGroups)

Returns a FileMetadata instance obtained by merging the specified collection of RowGroupMetadata.

static Map<SchemaPath,TypeProtos.MajorType>

getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)

Returns a map of column names to their Drill types for every NameSegment of each SchemaPath in the specified rowGroup.

static NonInterestingColumnsMetadata

getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)

Returns metadata for the non-interesting columns.

static org.apache.parquet.schema.OriginalType

getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)

Returns the OriginalType for the specified column.

static PartitionMetadata

getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)

Returns a PartitionMetadata instance obtained by merging the specified FileMetadata list.

static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName

getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)

Returns the PrimitiveType.PrimitiveTypeName for the specified column.

static Map<SchemaPath,ColumnStatistics<?>>

getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)

Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys.

static Map<SchemaPath,TypeProtos.MajorType>

getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)

Returns a map of column names to their Drill types for the specified rowGroup.

static RowGroupMetadata

getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)

Returns a RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata.

static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata>

getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)

Returns a multimap of RowGroupMetadata, keyed by path, obtained by converting the Parquet row group metadata taken from the specified tableMetadata.

static Object

getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)

Handles the passed value, taking into account its type and the specified primitiveType and originalType.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- addImplicitColumnsStatistics
  
  public static Map<SchemaPath,ColumnStatistics<?>> addImplicitColumnsStatistics(Map<SchemaPath,ColumnStatistics<?>> columnsStatistics, List<SchemaPath> columns, List<String> partitionValues, OptionManager optionManager, org.apache.hadoop.fs.Path location, boolean supportsFileImplicitColumns)
  
  Creates a new map based on the specified columnsStatistics, with statistics added for implicit and partition (dir) columns.
  
  Parameters:
  
  columnsStatistics - map of column statistics to expand
  
  columns - list of all columns including implicit or partition ones
  
  partitionValues - list of partition values
  
  optionManager - option manager
  
  location - location of metadata part
  
  supportsFileImplicitColumns - whether file implicit columns are supported
  
  Returns:
  
  map with added statistics for implicit and partition (dir) columns
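As a rough illustration of the behaviour described above, the sketch below copies an input map and adds one entry per partition value. It is not Drill's implementation: plain strings stand in for SchemaPath and ColumnStatistics, the class and the helper withPartitionColumns are hypothetical, and the dir0, dir1, ... naming assumes the partition column label is set to dir.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImplicitColumnsSketch {

  // Hypothetical helper: returns a copy of the input map with one extra entry
  // per partition value. The input map is left unmodified.
  static Map<String, String> withPartitionColumns(Map<String, String> columnValues,
                                                  List<String> partitionValues,
                                                  String partitionLabel) {
    Map<String, String> result = new LinkedHashMap<>(columnValues);
    for (int i = 0; i < partitionValues.size(); i++) {
      // Assumption: partition columns are named <label><directory depth>.
      result.put(partitionLabel + i, partitionValues.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("o_orderkey", "min=1, max=100");
    Map<String, String> expanded = withPartitionColumns(base, List.of("2023", "Q4"), "dir");
    System.out.println(expanded);
  }
}
```

Returning a copy rather than mutating the argument mirrors the "creates new map" wording of the description.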
- getRowGroupsMetadata
  
  public static org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata)
  
  Returns a multimap of RowGroupMetadata, keyed by path, obtained by converting the Parquet row group metadata taken from the specified tableMetadata. Row groups are assigned an index based on their position in the file metadata; empty / fake row groups are assigned the index -1.
  
  Parameters:
  
  tableMetadata - the source of row groups to be converted
  
  Returns:
  
  multimap of RowGroupMetadata keyed by path
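The index assignment described for getRowGroupsMetadata can be pictured with the following minimal sketch. Everything in it is an assumption made for illustration: file paths are plain strings, a map of lists stands in for the Guava Multimap, the IndexedRowGroup class is hypothetical, and a row count of 0 is used as the marker for an empty / fake row group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowGroupIndexSketch {

  // Hypothetical stand-in for a converted row group.
  static final class IndexedRowGroup {
    final int indexInFile;
    final long rowCount;

    IndexedRowGroup(int indexInFile, long rowCount) {
      this.indexInFile = indexInFile;
      this.rowCount = rowCount;
    }
  }

  // Input: file path -> row counts of its row groups, in file order.
  // Output: file path -> row groups tagged with their index within the file.
  static Map<String, List<IndexedRowGroup>> indexRowGroups(Map<String, List<Long>> rowCountsByFile) {
    Map<String, List<IndexedRowGroup>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> file : rowCountsByFile.entrySet()) {
      List<Long> rowCounts = file.getValue();
      List<IndexedRowGroup> converted = new ArrayList<>();
      for (int position = 0; position < rowCounts.size(); position++) {
        long rowCount = rowCounts.get(position);
        // Assumption: an empty / fake row group is one with zero rows; it gets index -1.
        int index = rowCount == 0 ? -1 : position;
        converted.add(new IndexedRowGroup(index, rowCount));
      }
      result.put(file.getKey(), converted);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<Long>> input = new LinkedHashMap<>();
    input.put("/data/a.parquet", List.of(100L, 250L));
    input.put("/data/empty.parquet", List.of(0L));
    indexRowGroups(input).forEach((path, groups) ->
        groups.forEach(g -> System.out.println(path + " -> index " + g.indexInFile)));
  }
}
```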
- getRowGroupMetadata
  
  public static RowGroupMetadata getRowGroupMetadata(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata, int rgIndexInFile, org.apache.hadoop.fs.Path location)
  
  Returns a RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata.
  
  Parameters:
  
  tableMetadata - table metadata which contains row group metadata to convert
  
  rowGroupMetadata - row group metadata to convert
  
  rgIndexInFile - index of current row group within the file
  
  location - location of file with current row group
  
  Returns:
  
  RowGroupMetadata instance converted from the specified Parquet rowGroupMetadata
- getFileMetadata
  
  public static FileMetadata getFileMetadata(Collection<RowGroupMetadata> rowGroups)
  
  Returns a FileMetadata instance obtained by merging the specified collection of RowGroupMetadata.
  
  Parameters:
  
  rowGroups - collection of RowGroupMetadata to be merged
  
  Returns:
  
  FileMetadata instance
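The description does not say how the merge is performed. One common way to roll per-row-group statistics up to file level is to take the smallest minimum, the largest maximum and the sum of the row counts, as in the sketch below. The Stats class and the merge helper are hypothetical and do not correspond to Drill's RowGroupMetadata or FileMetadata types.

```java
import java.util.List;

class MergeStatsSketch {

  // Hypothetical per-column statistics for one row group (or one file).
  static final class Stats {
    final long min;
    final long max;
    final long rowCount;

    Stats(long min, long max, long rowCount) {
      this.min = min;
      this.max = max;
      this.rowCount = rowCount;
    }
  }

  // Combines row-group-level statistics into a single file-level summary.
  static Stats merge(List<Stats> rowGroups) {
    if (rowGroups.isEmpty()) {
      throw new IllegalArgumentException("at least one row group is required");
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long rows = 0;
    for (Stats s : rowGroups) {
      min = Math.min(min, s.min);
      max = Math.max(max, s.max);
      rows += s.rowCount;
    }
    return new Stats(min, max, rows);
  }

  public static void main(String[] args) {
    Stats file = merge(List.of(new Stats(1, 5, 10), new Stats(3, 9, 20)));
    System.out.println(file.min + ".." + file.max + ", rows=" + file.rowCount);
  }
}
```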
- getPartitionMetadata
  
  public static PartitionMetadata getPartitionMetadata(SchemaPath partitionColumn, List<FileMetadata> files)
  
  Returns a PartitionMetadata instance obtained by merging the specified FileMetadata list.
  
  Parameters:
  
  partitionColumn - partition column
  
  files - list of files to be merged
  
  Returns:
  
  PartitionMetadata instance
- getRowGroupColumnStatistics
  
  public static Map<SchemaPath,ColumnStatistics<?>> getRowGroupColumnStatistics(MetadataBase.ParquetTableMetadataBase tableMetadata, MetadataBase.RowGroupMetadata rowGroupMetadata)
  
  Converts the specified MetadataBase.RowGroupMetadata into a map of ColumnStatistics instances with column names as keys.
  
  Parameters:
  
  tableMetadata - the source of column types
  
  rowGroupMetadata - metadata to convert
  
  Returns:
  
  map with converted row group metadata
- getNonInterestingColumnsMeta
  
  public static NonInterestingColumnsMetadata getNonInterestingColumnsMeta(MetadataBase.ParquetTableMetadataBase parquetTableMetadata)
  
  Returns metadata for the non-interesting columns.
  
  Parameters:
  
  parquetTableMetadata - the source of column metadata for the statistics of non-interesting columns
  
  Returns:
  
  metadata for the non-interesting columns
- getValue
  
  public static Object getValue(Object value, org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName primitiveType, org.apache.parquet.schema.OriginalType originalType)
  
  Handles the passed value, taking into account its type and the specified primitiveType and originalType.
  
  Parameters:
  
  value - value to handle
  
  primitiveType - primitive type of the column whose value should be handled
  
  originalType - original type of the column whose value should be handled
  
  Returns:
  
  handled value
- getFileFields
  
  public static Map<SchemaPath,TypeProtos.MajorType> getFileFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ParquetFileMetadata file)
  
  Returns a map of column names to their Drill types for the specified file.
  
  Parameters:
  
  parquetTableMetadata - the source of primitive and original column types
  
  file - file whose columns should be discovered
  
  Returns:
  
  map of column names to their Drill types
- getRowGroupFields
  
  public static Map<SchemaPath,TypeProtos.MajorType> getRowGroupFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
  
  Returns a map of column names to their Drill types for the specified rowGroup.
  
  Parameters:
  
  parquetTableMetadata - the source of primitive and original column types
  
  rowGroup - row group whose columns should be discovered
  
  Returns:
  
  map of column names to their Drill types
- getIntermediateFields
  
  public static Map<SchemaPath,TypeProtos.MajorType> getIntermediateFields(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.RowGroupMetadata rowGroup)
  
  Returns a map of column names to their Drill types for every NameSegment of each SchemaPath in the specified rowGroup. The type for a SchemaPath can be null when it cannot be determined. Currently this hierarchy matters only because TypeProtos.MinorType.DICT must be accounted for, so that filters on DICT values (lookups by key) are not pruned before the actual filtering happens.
  
  Parameters:
  
  parquetTableMetadata - the source of column types
  
  rowGroup - row group whose columns should be discovered
  
  Returns:
  
  map of column names to their Drill types
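To make "every NameSegment in SchemaPath" concrete, the sketch below expands each leaf column path into all of its prefixes and records null for prefixes whose type is not known. It is only an approximation of the idea: dotted strings stand in for SchemaPath, type names are plain strings, and the class and the intermediateFields helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IntermediateFieldsSketch {

  // Input: leaf column path (dot-separated) -> type name.
  // Output: every path prefix -> type name, or null where the type is unknown.
  static Map<String, String> intermediateFields(Map<String, String> leafTypes) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> leaf : leafTypes.entrySet()) {
      String[] segments = leaf.getKey().split("\\.");
      StringBuilder prefix = new StringBuilder();
      for (int i = 0; i < segments.length; i++) {
        if (i > 0) {
          prefix.append('.');
        }
        prefix.append(segments[i]);
        String path = prefix.toString();
        if (i == segments.length - 1) {
          // The full path is the leaf column, whose type is known.
          result.put(path, leaf.getValue());
        } else if (!result.containsKey(path)) {
          // Intermediate segment: no type information in this sketch.
          result.put(path, null);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> leaves = new LinkedHashMap<>();
    leaves.put("order.customer.id", "INT");
    leaves.put("order.total", "FLOAT8");
    System.out.println(intermediateFields(leaves));
  }
}
```

A map implementation that permits null values (here LinkedHashMap) is needed, since the description allows the type of a path to be null.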
- getOriginalType
  
  public static org.apache.parquet.schema.OriginalType getOriginalType(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
  
  Returns the OriginalType for the specified column.
  
  Parameters:
  
  parquetTableMetadata - the source of column type
  
  column - column whose OriginalType should be returned
  
  Returns:
  
  the OriginalType for the specified column
- getPrimitiveTypeName
  
  public static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName getPrimitiveTypeName(MetadataBase.ParquetTableMetadataBase parquetTableMetadata, MetadataBase.ColumnMetadata column)
  
  Returns the PrimitiveType.PrimitiveTypeName for the specified column.
  
  Parameters:
  
  parquetTableMetadata - the source of column type
  
  column - column whose PrimitiveType.PrimitiveTypeName should be returned
  
  Returns:
  
  the PrimitiveType.PrimitiveTypeName for the specified column
- getColumnStatistics
  
  public static Map<SchemaPath,ColumnStatistics<?>> getColumnStatistics(TupleMetadata schema, DrillStatsTable statistics)
  
  Returns a map of schema paths to ColumnStatistics obtained from the specified DrillStatsTable for all columns from the specified schema.
  
  Parameters:
  
  schema - source of column names
  
  statistics - source of column statistics
  
  Returns:
  
  map with schema path and ColumnStatistics
