org.apache.drill.exec.store.parquet.metadata.Metadata

public class Metadata extends Object

This is a utility class that holds Parquet table metadata and a ParquetReaderConfig. Creation of the Parquet metadata cache through the create APIs is forced to run as the process user, since only that user has write permission for the cache file.

Field Summary

Fields

Modifier and Type

Field

Description

static final String[]

CURRENT_METADATA_FILENAMES

static final Long

DEFAULT_NULL_COUNT

static final String

METADATA_DIRECTORIES_FILENAME

static final String

METADATA_FILENAME

static final String

METADATA_SUMMARY_FILENAME

static final Long

NULL_COUNT_NOT_EXISTS

static final String

OLD_METADATA_FILENAME

static final String[]

OLD_METADATA_FILENAMES
Method Summary

Modifier and Type

Method

Description

static void

createMeta(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig, boolean allColumnsInteresting, Set<SchemaPath> columnSet)

Create the parquet metadata file for the directory at the given path, and for any subdirectories.

static org.apache.hadoop.fs.Path

getDirFileName(org.apache.hadoop.fs.Path metadataParentDir)

static Metadata_V4.ParquetFileAndRowCountMetadata

getParquetFileMetadata_v4(Metadata_V4.ParquetTableMetadata_v4 parquetTableMetadata, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, org.apache.hadoop.fs.FileStatus file, org.apache.hadoop.fs.FileSystem fs, boolean allColumnsInteresting, boolean skipNonInteresting, Set<SchemaPath> columnSet, ParquetReaderConfig readerConfig)

Get the file metadata for a single file

static Metadata_V4.ParquetTableMetadata_v4

getParquetTableMetadata(Map<org.apache.hadoop.fs.FileStatus,org.apache.hadoop.fs.FileSystem> fileStatusMap, ParquetReaderConfig readerConfig)

Get the parquet metadata for a list of parquet files.

static Metadata_V4.ParquetTableMetadata_v4

getParquetTableMetadata(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig)

Get the parquet metadata for the parquet files in the given directory, including those in subdirectories.

static Metadata_V4.MetadataSummary

getSummary(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path metadataParentDir, boolean autoRefreshTriggered, ParquetReaderConfig readerConfig)

Reads the summary from the metadata cache file. If the cache file is stale, recreates the metadata.

static org.apache.hadoop.fs.Path

getSummaryFileName(org.apache.hadoop.fs.Path metadataParentDir)

static MetadataBase.ParquetTableMetadataBase

readBlockMeta(org.apache.hadoop.fs.FileSystem fs, List<org.apache.hadoop.fs.Path> paths, MetadataContext metaContext, ParquetReaderConfig readerConfig)

Get the parquet metadata for the table by reading the metadata file

static ParquetTableMetadataDirs

readMetadataDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, MetadataContext metaContext, ParquetReaderConfig readerConfig)

Get the parquet metadata for all subdirectories by reading the metadata file

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- OLD_METADATA_FILENAMES
  
  public static final String[] OLD_METADATA_FILENAMES
- OLD_METADATA_FILENAME
  
  public static final String OLD_METADATA_FILENAME
  See Also:
  
  Constant Field Values
- METADATA_DIRECTORIES_FILENAME
  
  public static final String METADATA_DIRECTORIES_FILENAME
  See Also:
  
  Constant Field Values
- METADATA_FILENAME
  
  public static final String METADATA_FILENAME
  See Also:
  
  Constant Field Values
- METADATA_SUMMARY_FILENAME
  
  public static final String METADATA_SUMMARY_FILENAME
  See Also:
  
  Constant Field Values
- CURRENT_METADATA_FILENAMES
  
  public static final String[] CURRENT_METADATA_FILENAMES
- DEFAULT_NULL_COUNT
  
  public static final Long DEFAULT_NULL_COUNT
- NULL_COUNT_NOT_EXISTS
  
  public static final Long NULL_COUNT_NOT_EXISTS
Method Details
- createMeta
  
  public static void createMeta(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig, boolean allColumnsInteresting, Set<SchemaPath> columnSet) throws IOException
  
  Create the parquet metadata file for the directory at the given path, and for any subdirectories.
  
  Parameters:
  
  fs - file system
  
  path - path
  
  readerConfig - parquet reader configuration
  
  allColumnsInteresting - if set, store column metadata for all the columns
  
  columnSet - Set of columns for which column metadata has to be stored
  
  Throws:
  
  IOException
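The following is a minimal sketch of how the `allColumnsInteresting` flag and `columnSet` might combine when deciding whether column metadata is stored for a given column. It is an illustration only, not Drill's implementation: the class `InterestingColumnsSketch`, the method `storesMetadataFor`, the null handling, and the use of plain `String` column names in place of `SchemaPath` are all assumptions.

```java
import java.util.Set;

final class InterestingColumnsSketch {
  /**
   * Hypothetical rule: metadata is stored for every column when the flag is set,
   * otherwise only for columns listed in columnSet (a null set selects nothing).
   */
  static boolean storesMetadataFor(String column,
                                   boolean allColumnsInteresting,
                                   Set<String> columnSet) {
    if (allColumnsInteresting) {
      return true; // the flag overrides the explicit column list
    }
    return columnSet != null && columnSet.contains(column);
  }
}
```

Under this reading, passing `allColumnsInteresting = true` makes the contents of `columnSet` irrelevant.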
- getParquetTableMetadata
  
  public static Metadata_V4.ParquetTableMetadata_v4 getParquetTableMetadata(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig) throws IOException
  
  Get the parquet metadata for the parquet files in the given directory, including those in subdirectories.
  
  Parameters:
  
  fs - file system
  
  path - path
  
  readerConfig - parquet reader configuration
  
  Returns:
  
  parquet table metadata
  
  Throws:
  
  IOException
- getParquetTableMetadata
  
  public static Metadata_V4.ParquetTableMetadata_v4 getParquetTableMetadata(Map<org.apache.hadoop.fs.FileStatus,org.apache.hadoop.fs.FileSystem> fileStatusMap, ParquetReaderConfig readerConfig) throws IOException
  
  Get the parquet metadata for a list of parquet files.
  
  Parameters:
  
  fileStatusMap - file statuses and corresponding file systems
  
  readerConfig - parquet reader configuration
  
  Returns:
  
  parquet table metadata
  
  Throws:
  
  IOException
- readBlockMeta
  
  public static MetadataBase.ParquetTableMetadataBase readBlockMeta(org.apache.hadoop.fs.FileSystem fs, List<org.apache.hadoop.fs.Path> paths, MetadataContext metaContext, ParquetReaderConfig readerConfig)
  
  Get the parquet metadata for the table by reading the metadata file
  
  Parameters:
  
  fs - current file system
  
  paths - The paths to the metadata files, located in the directory that contains the parquet files
  
  metaContext - metadata context
  
  readerConfig - parquet reader configuration
  
  Returns:
  
  parquet table metadata. Null if metadata cache is missing, unsupported or corrupted
- readMetadataDirs
  
  public static ParquetTableMetadataDirs readMetadataDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, MetadataContext metaContext, ParquetReaderConfig readerConfig)
  
  Get the parquet metadata for all subdirectories by reading the metadata file
  
  Parameters:
  
  fs - current file system
  
  path - The path to the metadata file, located in the directory that contains the parquet files
  
  metaContext - metadata context
  
  readerConfig - parquet reader configuration
  
  Returns:
  
  parquet metadata for a directory. Null if metadata cache is missing, unsupported or corrupted
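Both `readBlockMeta` and `readMetadataDirs` return null when the metadata cache is missing, unsupported or corrupted, so a caller needs a fallback path. The sketch below shows one way to structure that, assuming the caller can rebuild the metadata some other way (for example by reading the parquet files directly). The class `CacheFallbackSketch`, the method `readOrRebuild`, and the two suppliers are hypothetical stand-ins, not part of the Drill API.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class CacheFallbackSketch {
  /**
   * Returns the cached value if the read produced one, otherwise the rebuilt value.
   * readCache stands in for a call that may return null, such as a cache read;
   * rebuild stands in for whatever slower path recomputes the metadata.
   */
  static <T> T readOrRebuild(Supplier<T> readCache, Supplier<T> rebuild) {
    // orElseGet only invokes rebuild when the cache read returned null
    return Optional.ofNullable(readCache.get()).orElseGet(rebuild);
  }
}
```

The rebuild supplier runs only on a null result, so the slower path is skipped whenever the cache is usable.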
- getParquetFileMetadata_v4
  
  public static Metadata_V4.ParquetFileAndRowCountMetadata getParquetFileMetadata_v4(Metadata_V4.ParquetTableMetadata_v4 parquetTableMetadata, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, org.apache.hadoop.fs.FileStatus file, org.apache.hadoop.fs.FileSystem fs, boolean allColumnsInteresting, boolean skipNonInteresting, Set<SchemaPath> columnSet, ParquetReaderConfig readerConfig) throws IOException, InterruptedException
  
  Get the file metadata for a single file
  
  Parameters:
  
  parquetTableMetadata - The table metadata to be updated with all the columns' info
  
  footer - If non-null, use this footer instead of reading it from the file
  
  file - The file
  
  allColumnsInteresting - If true, read the min/max metadata for all the columns
  
  skipNonInteresting - If true, collect info only for the interesting columns
  
  columnSet - The specific columns for which min/max metadata is collected
  
  readerConfig - parquet reader configuration
  
  Returns:
  
  the file metadata
  
  Throws:
  
  IOException
  
  InterruptedException
- getSummaryFileName
  
  public static org.apache.hadoop.fs.Path getSummaryFileName(org.apache.hadoop.fs.Path metadataParentDir)
- getDirFileName
  
  public static org.apache.hadoop.fs.Path getDirFileName(org.apache.hadoop.fs.Path metadataParentDir)
- getSummary
  
  public static Metadata_V4.MetadataSummary getSummary(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path metadataParentDir, boolean autoRefreshTriggered, ParquetReaderConfig readerConfig)
  
  Reads the summary from the metadata cache file. If the cache file is stale, recreates the metadata.
  
  Parameters:
  
  fs - file system
  
  metadataParentDir - parent directory that holds metadata files
  
  autoRefreshTriggered - true if the auto-refresh is already triggered
  
  readerConfig - Parquet reader config
  
  Returns:
  
  metadata summary
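The read-or-recreate behaviour described for `getSummary` can be pictured with a small in-memory sketch. It assumes a simple staleness rule (the cached value is stale when the directory was modified after the value was cached), which may differ from the check Drill actually performs. It does not model `autoRefreshTriggered`. The class `SummaryCacheSketch` and its members are hypothetical.

```java
import java.util.function.Supplier;

final class SummaryCacheSketch<T> {
  private T cached;
  private long cachedAtMillis = Long.MIN_VALUE;

  /**
   * Returns the cached summary, rebuilding it first when nothing is cached yet
   * or when the directory modification time is newer than the cached entry
   * (assumed staleness rule).
   */
  T get(long dirModifiedMillis, Supplier<T> rebuild) {
    if (cached == null || dirModifiedMillis > cachedAtMillis) {
      cached = rebuild.get();             // recreate the summary
      cachedAtMillis = dirModifiedMillis; // remember which directory state it reflects
    }
    return cached;
  }
}
```

Repeated calls with an unchanged modification time reuse the cached value. A later modification time triggers exactly one rebuild.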
