java.lang.Object
org.apache.drill.exec.store.parquet.metadata.Metadata

public class Metadata extends Object
This is a utility class that holds Parquet table metadata and a ParquetReaderConfig. All Parquet metadata cache files created through the create APIs are written as the process user, because only that user has write permission for the cache files.
  • Field Details

    • OLD_METADATA_FILENAMES

      public static final String[] OLD_METADATA_FILENAMES
    • OLD_METADATA_FILENAME

      public static final String OLD_METADATA_FILENAME
    • METADATA_DIRECTORIES_FILENAME

      public static final String METADATA_DIRECTORIES_FILENAME
    • METADATA_FILENAME

      public static final String METADATA_FILENAME
    • METADATA_SUMMARY_FILENAME

      public static final String METADATA_SUMMARY_FILENAME
    • CURRENT_METADATA_FILENAMES

      public static final String[] CURRENT_METADATA_FILENAMES
    • DEFAULT_NULL_COUNT

      public static final Long DEFAULT_NULL_COUNT
    • NULL_COUNT_NOT_EXISTS

      public static final Long NULL_COUNT_NOT_EXISTS
  • Method Details

    • createMeta

      public static void createMeta(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig, boolean allColumnsInteresting, Set<SchemaPath> columnSet) throws IOException
      Create the parquet metadata file for the directory at the given path, and for any subdirectories.
      Parameters:
      fs - file system
      path - directory for which the metadata cache file is created
      readerConfig - parquet reader configuration
      allColumnsInteresting - if set, store column metadata for all the columns
      columnSet - Set of columns for which column metadata has to be stored
      Throws:
      IOException
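      The description above says only that a cache file is created for the directory at the given path and for every subdirectory. The sketch below illustrates one way to do that recursion with java.nio.file in place of Hadoop's FileSystem: subdirectories are processed first and their results reused by the parent, and each cache lists the data files of its own subtree relative to its directory. The class name, the cache file name `.cache_sketch`, the plain-text format and the hidden-file rule are all hypothetical; Drill's real cache files are JSON with far richer content, and the sketch ignores the process-user requirement from the class description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of recursive metadata cache creation; not Drill's implementation.
final class CreateMetaSketch {
    // Placeholder cache file name (assumption; not the value of METADATA_FILENAME).
    static final String CACHE_NAME = ".cache_sketch";

    /**
     * Writes a cache file into dir and into every subdirectory below it.
     * Each cache lists the data files of its own subtree, relative to that directory.
     * Returns the paths of all data files found under dir.
     */
    static List<Path> createMeta(Path dir) throws IOException {
        List<Path> children;
        try (Stream<Path> listing = Files.list(dir)) {
            children = listing.sorted().collect(Collectors.toList());
        }
        List<Path> files = new ArrayList<>();
        for (Path child : children) {
            String name = child.getFileName().toString();
            if (name.startsWith(".") || name.startsWith("_")) {
                continue; // skip hidden entries such as _SUCCESS and existing cache files
            }
            if (Files.isDirectory(child)) {
                files.addAll(createMeta(child)); // subdirectories first; reuse their results
            } else {
                files.add(child);
            }
        }
        Collections.sort(files);
        List<String> lines = new ArrayList<>();
        for (Path file : files) {
            lines.add(dir.relativize(file).toString());
        }
        Files.write(dir.resolve(CACHE_NAME), lines);
        return files;
    }
}
```

      Processing children before the parent means each file is discovered once, even though it appears in the cache of every ancestor directory.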
    • getParquetTableMetadata

      public static Metadata_V4.ParquetTableMetadata_v4 getParquetTableMetadata(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, ParquetReaderConfig readerConfig) throws IOException
      Get the parquet metadata for the parquet files in the given directory, including those in subdirectories.
      Parameters:
      fs - file system
      path - directory that contains the parquet files
      readerConfig - parquet reader configuration
      Returns:
      parquet table metadata
      Throws:
      IOException
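      Before any footer can be read, a method like this has to find the data files under the directory tree. The sketch below shows that step with java.nio.file, assuming the common Hadoop convention that names starting with `.` or `_` (for example `_SUCCESS` markers or cache files) are hidden and skipped anywhere in the relative path. The class and method names are hypothetical, and the sketch does not read any Parquet metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: collect the data files of a directory tree, including subdirectories.
final class ListDataFilesSketch {
    /** Returns the regular files under dir (recursively), skipping hidden paths, sorted. */
    static List<Path> listDataFiles(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> !isHidden(dir.relativize(p)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** True if any name element starts with "." or "_" (assumed hidden-file convention). */
    static boolean isHidden(Path relative) {
        for (Path part : relative) {
            String name = part.toString();
            if (name.startsWith(".") || name.startsWith("_")) {
                return true;
            }
        }
        return false;
    }
}
```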
    • getParquetTableMetadata

      public static Metadata_V4.ParquetTableMetadata_v4 getParquetTableMetadata(Map<org.apache.hadoop.fs.FileStatus,org.apache.hadoop.fs.FileSystem> fileStatusMap, ParquetReaderConfig readerConfig) throws IOException
      Get the parquet metadata for a list of parquet files.
      Parameters:
      fileStatusMap - file statuses and corresponding file systems
      readerConfig - parquet reader configuration
      Returns:
      parquet table metadata
      Throws:
      IOException
    • readBlockMeta

      public static MetadataBase.ParquetTableMetadataBase readBlockMeta(org.apache.hadoop.fs.FileSystem fs, List<org.apache.hadoop.fs.Path> paths, MetadataContext metaContext, ParquetReaderConfig readerConfig)
      Get the parquet metadata for the table by reading the metadata files.
      Parameters:
      fs - current file system
      paths - Paths to the metadata files, each located in a directory that contains parquet files
      metaContext - metadata context
      readerConfig - parquet reader configuration
      Returns:
      parquet table metadata, or null if the metadata cache is missing, unsupported, or corrupted
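      The return contract above folds three failure cases into a single null result, so a caller needs only one null check to detect that no usable cache exists. The sketch below illustrates that contract on a hypothetical plain-text cache whose first line is a `version=...` header; the format, the class name and the supported version strings are invented for illustration and do not reflect Drill's JSON cache layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "null if missing, unsupported or corrupted" contract.
final class ReadCacheSketch {
    static final String HEADER_PREFIX = "version=";
    // Placeholder version strings (assumption, not Drill's metadata versions).
    static final Set<String> SUPPORTED_VERSIONS = Set.of("v3", "v4");

    /** Returns the cached entries, or null when no usable cache exists. */
    static List<String> readCache(Path cacheFile) {
        List<String> lines;
        try {
            lines = Files.readAllLines(cacheFile);
        } catch (NoSuchFileException e) {
            return null; // missing cache file
        } catch (IOException e) {
            return null; // unreadable (e.g. invalid encoding): treat as corrupted
        }
        if (lines.isEmpty() || !lines.get(0).startsWith(HEADER_PREFIX)) {
            return null; // corrupted: no version header
        }
        String version = lines.get(0).substring(HEADER_PREFIX.length());
        if (!SUPPORTED_VERSIONS.contains(version)) {
            return null; // unsupported cache version
        }
        return lines.subList(1, lines.size());
    }
}
```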
    • readMetadataDirs

      public static ParquetTableMetadataDirs readMetadataDirs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, MetadataContext metaContext, ParquetReaderConfig readerConfig)
      Get the parquet metadata for all subdirectories by reading the metadata file.
      Parameters:
      fs - current file system
      path - The path to the metadata file, located in the directory that contains the parquet files
      metaContext - metadata context
      readerConfig - parquet reader configuration
      Returns:
      parquet metadata for a directory, or null if the metadata cache is missing, unsupported, or corrupted
    • getParquetFileMetadata_v4

      public static Metadata_V4.ParquetFileAndRowCountMetadata getParquetFileMetadata_v4(Metadata_V4.ParquetTableMetadata_v4 parquetTableMetadata, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, org.apache.hadoop.fs.FileStatus file, org.apache.hadoop.fs.FileSystem fs, boolean allColumnsInteresting, boolean skipNonInteresting, Set<SchemaPath> columnSet, ParquetReaderConfig readerConfig) throws IOException, InterruptedException
      Get the file metadata for a single file.
      Parameters:
      parquetTableMetadata - The table metadata to be updated with all the columns' info
      footer - If non-null, use this footer instead of reading it from the file
      file - The file
      fs - file system
      allColumnsInteresting - If true, read the min/max metadata for all the columns
      skipNonInteresting - If true, collect info only for the interesting columns
      columnSet - Columns for which min/max metadata is collected
      readerConfig - parquet reader configuration
      Returns:
      the file metadata
      Throws:
      IOException
      InterruptedException
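      The three column flags interact, and the parameter descriptions leave the combinations implicit. The sketch below encodes one reading of them: a column is interesting if allColumnsInteresting is true or it appears in columnSet; interesting columns get min/max statistics; other columns are skipped when skipNonInteresting is true and otherwise recorded without min/max. It also shows the footer rule (use a supplied non-null footer, otherwise read one). The enum, the class and the method names are hypothetical, and treating a null columnSet as empty is an assumption.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of the flag logic described for getParquetFileMetadata_v4.
final class ColumnSelectionSketch {
    enum Action { COLLECT_MIN_MAX, COLLECT_WITHOUT_MIN_MAX, SKIP }

    /** Decides what to record for one column; a null columnSet is treated as empty. */
    static Action actionFor(String column, boolean allColumnsInteresting,
                            boolean skipNonInteresting, Set<String> columnSet) {
        boolean interesting = allColumnsInteresting
                || (columnSet != null && columnSet.contains(column));
        if (interesting) {
            return Action.COLLECT_MIN_MAX;
        }
        return skipNonInteresting ? Action.SKIP : Action.COLLECT_WITHOUT_MIN_MAX;
    }

    /** Uses the supplied footer if non-null; otherwise calls the reader to load it. */
    static <F> F footerOrRead(F footer, Supplier<F> reader) {
        return footer != null ? footer : reader.get();
    }
}
```

      Passing the reader as a Supplier means the file is opened only when no footer was supplied.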
    • getSummaryFileName

      public static org.apache.hadoop.fs.Path getSummaryFileName(org.apache.hadoop.fs.Path metadataParentDir)
    • getDirFileName

      public static org.apache.hadoop.fs.Path getDirFileName(org.apache.hadoop.fs.Path metadataParentDir)
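      These two helpers carry no description. Judging from their names and from the METADATA_SUMMARY_FILENAME and METADATA_DIRECTORIES_FILENAME fields, they most likely resolve those file names against metadataParentDir. The sketch below assumes exactly that, using java.nio.file.Path instead of Hadoop's Path; the class name and the file name strings are placeholders, not the values of Drill's constants.

```java
import java.nio.file.Path;

// Hypothetical sketch; the file names are placeholders, not Drill's constant values.
final class CacheFileNames {
    static final String SUMMARY_FILENAME = "summary_sketch";
    static final String DIRECTORIES_FILENAME = "directories_sketch";

    /** Path of the summary cache file inside metadataParentDir (assumed behavior). */
    static Path getSummaryFileName(Path metadataParentDir) {
        return metadataParentDir.resolve(SUMMARY_FILENAME);
    }

    /** Path of the directories cache file inside metadataParentDir (assumed behavior). */
    static Path getDirFileName(Path metadataParentDir) {
        return metadataParentDir.resolve(DIRECTORIES_FILENAME);
    }
}
```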
    • getSummary

      public static Metadata_V4.MetadataSummary getSummary(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path metadataParentDir, boolean autoRefreshTriggered, ParquetReaderConfig readerConfig)
      Reads the summary from the metadata cache file; if the cache file is stale, recreates the metadata.
      Parameters:
      fs - file system
      metadataParentDir - parent directory that holds metadata files
      autoRefreshTriggered - true if the auto-refresh is already triggered
      readerConfig - Parquet reader config
      Returns:
      metadata summary
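      The read-or-refresh behavior of getSummary can be sketched as follows. The sketch assumes the cache counts as stale when any listed directory was modified after the summary file, and it uses autoRefreshTriggered to skip a second refresh once one has already run. The Refresher interface, the plain-text summary and returning null for a missing summary are hypothetical choices made for illustration, not Drill's implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

// Hypothetical sketch of "read the summary, recreate it if stale".
final class SummarySketch {
    /** Hypothetical hook that regenerates the cache files under a directory. */
    interface Refresher {
        void recreate(Path metadataParentDir) throws IOException;
    }

    /** Assumed staleness rule: some directory was modified after the summary file. */
    static boolean isStale(Path summaryFile, List<Path> directories) throws IOException {
        FileTime cacheTime = Files.getLastModifiedTime(summaryFile);
        for (Path dir : directories) {
            if (Files.getLastModifiedTime(dir).compareTo(cacheTime) > 0) {
                return true;
            }
        }
        return false;
    }

    /** Reads the summary, first recreating it if it is stale and no refresh has run yet. */
    static List<String> getSummary(Path summaryFile, List<Path> directories,
                                   boolean autoRefreshTriggered, Refresher refresher) throws IOException {
        if (!Files.exists(summaryFile)) {
            return null; // sketch choice: no summary means no usable cache
        }
        if (!autoRefreshTriggered && isStale(summaryFile, directories)) {
            refresher.recreate(summaryFile.getParent()); // skipped when a refresh already ran
        }
        return Files.readAllLines(summaryFile);
    }
}
```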