Class BasicTablesRequests


public class BasicTablesRequests extends Object
Provides convenience methods for retrieving Metastore Tables data for analysis. Covers the most frequent requests to the Metastore Tables, so callers do not need to write filters or transformers from the TableMetadataUnit class themselves.
  • Constructor Details

    • BasicTablesRequests

      public BasicTablesRequests(Tables tables)
  • Method Details

    • metastoreTableInfo

      public MetastoreTableInfo metastoreTableInfo(TableInfo tableInfo)
      Returns metastore table information, including metastore version and table last modified time. Schematic SQL request:
         select lastModifiedTime from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'GENERAL_INFO'
         and metadataType = 'TABLE'
      tableInfo - table information
      MetastoreTableInfo instance
    • hasMetastoreTableInfoChanged

      public boolean hasMetastoreTableInfoChanged(MetastoreTableInfo metastoreTableInfo)
      Checks whether the given metastore table information differs from the current one. If the Metastore supports versioning, the Metastore versions are compared first: if the version has not changed, the table metadata is assumed to be unchanged as well. If the Metastore version has changed, or the Metastore does not support versioning, the current metastore table information is retrieved and compared with the given one.
      metastoreTableInfo - metastore table information
      true if metastore table information has changed, false otherwise
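
      The check order described above can be sketched as follows. This is a minimal illustration, not the class's implementation: TableSnapshot is a hypothetical stand-in for MetastoreTableInfo, the supportsVersioning flag, currentVersion value and currentInfo supplier are assumed inputs, and comparing only the last modified time is an assumption about what "checks against given one" involves.

```java
import java.util.function.Supplier;

final class ChangeCheckSketch {

    // Hypothetical stand-in for MetastoreTableInfo, reduced to the two values
    // the description mentions: metastore version and last modified time.
    static final class TableSnapshot {
        final long metastoreVersion;
        final long lastModifiedTime;

        TableSnapshot(long metastoreVersion, long lastModifiedTime) {
            this.metastoreVersion = metastoreVersion;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    // supportsVersioning, currentVersion and currentInfo are assumptions standing in
    // for whatever the Metastore exposes; currentInfo is invoked only when needed.
    static boolean hasChanged(TableSnapshot given, boolean supportsVersioning,
                              long currentVersion, Supplier<TableSnapshot> currentInfo) {
        // Cheap path: an unchanged version is taken to mean unchanged table metadata.
        if (supportsVersioning && given.metastoreVersion == currentVersion) {
            return false;
        }
        // Otherwise retrieve the current info and compare it with the given one.
        TableSnapshot current = currentInfo.get();
        return current.lastModifiedTime != given.lastModifiedTime;
    }

    public static void main(String[] args) {
        TableSnapshot given = new TableSnapshot(5L, 100L);
        // Same version: reported as unchanged without comparing timestamps.
        System.out.println(hasChanged(given, true, 5L, () -> new TableSnapshot(5L, 100L)));  // false
        // Version moved on and the table was modified.
        System.out.println(hasChanged(given, true, 6L, () -> new TableSnapshot(6L, 200L)));  // true
        // No versioning: always falls back to comparing the table info.
        System.out.println(hasChanged(given, false, 0L, () -> new TableSnapshot(0L, 100L))); // false
    }
}
```

      Passing the current info as a supplier keeps the version comparison free of any read request, which is the point of checking versions first.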
    • tablesMetadata

      public List<BaseTableMetadata> tablesMetadata(FilterExpression filter)
      Returns general information metadata for the tables that match the given filter. For example, it can return the tables that belong to a particular storage plugin, or to a particular storage plugin and workspace combination. Schematic SQL request:
         select [$TABLE_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp'
         and metadataKey = 'GENERAL_INFO'
         and metadataType = 'TABLE'
      filter - filter expression
      list of table metadata
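
      To make the schematic request concrete, the sketch below evaluates the same conditions over an in-memory list. Everything in it is hypothetical: rows are modelled as plain column-to-value maps, the equal helper stands in for a filter expression, and it assumes the metadataKey and metadataType conditions are applied in addition to the caller's filter, as the schematic request suggests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class TableFilterSketch {

    // Hypothetical row model: one Metastore record as a column name -> value map.
    static Map<String, String> row(String storage, String workspace, String tableName,
                                   String metadataKey, String metadataType) {
        Map<String, String> r = new HashMap<>();
        r.put("storage", storage);
        r.put("workspace", workspace);
        r.put("tableName", tableName);
        r.put("metadataKey", metadataKey);
        r.put("metadataType", metadataType);
        return r;
    }

    // Equality condition on a single column, e.g. storage = 'dfs'.
    static Predicate<Map<String, String>> equal(String column, String value) {
        return r -> value.equals(r.get(column));
    }

    // Applies the caller's filter plus the two fixed conditions from the schematic request.
    static List<Map<String, String>> tablesMetadata(List<Map<String, String>> rows,
                                                    Predicate<Map<String, String>> filter) {
        return rows.stream()
            .filter(filter)
            .filter(equal("metadataKey", "GENERAL_INFO"))
            .filter(equal("metadataType", "TABLE"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = Arrays.asList(
            row("dfs", "tmp", "nation", "GENERAL_INFO", "TABLE"),
            row("dfs", "tmp", "nation", "part_int=3", "SEGMENT"),
            row("dfs", "root", "region", "GENERAL_INFO", "TABLE"),
            row("s3", "tmp", "orders", "GENERAL_INFO", "TABLE"));

        // Storage plugin and workspace combination.
        System.out.println(tablesMetadata(rows, equal("storage", "dfs").and(equal("workspace", "tmp"))).size()); // 1
        // Storage plugin only.
        System.out.println(tablesMetadata(rows, equal("storage", "dfs")).size()); // 2
    }
}
```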
    • tableMetadata

      public BaseTableMetadata tableMetadata(TableInfo tableInfo)
      Returns general information metadata for the table identified by the given table information. Expects at most one qualifying result and fails if more than one is found. Returns null if no data is found. Schematic SQL request:
         select [$TABLE_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'GENERAL_INFO'
         and metadataType = 'TABLE'
      tableInfo - table information
      table metadata
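
      The "one result, otherwise fail, null when empty" contract can be sketched with a small helper. The singleOrNull name is hypothetical, and throwing IllegalStateException is an assumption: the documentation only says the call will fail.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SingleResultSketch {

    // Hypothetical helper: returns the only element, null for an empty list,
    // and fails when more than one element qualifies.
    static <T> T singleOrNull(List<T> results) {
        if (results.isEmpty()) {
            return null;
        }
        if (results.size() > 1) {
            // Exception type is an assumption made for this sketch.
            throw new IllegalStateException("Expected one result but got " + results.size());
        }
        return results.get(0);
    }

    public static void main(String[] args) {
        System.out.println(singleOrNull(Collections.<String>emptyList())); // null
        System.out.println(singleOrNull(Arrays.asList("nation")));         // nation
        try {
            singleOrNull(Arrays.asList("nation", "region"));
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage()); // failed: Expected one result but got 2
        }
    }
}
```

      Callers of methods with this contract need a null check before using the result.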
    • segmentsMetadataByMetadataKey

      public List<SegmentMetadata> segmentsMetadataByMetadataKey(TableInfo tableInfo, List<String> locations, String metadataKey)
      Returns segments metadata based on given table information, locations and metadata key. Schematic SQL request:
         select [$SEGMENT_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and location in ('.../part_int=3/d3', '.../part_int=3/d4')
         and metadataKey = 'part_int=3'
         and metadataType = 'SEGMENT'
      tableInfo - table information
      locations - segments locations
      metadataKey - metadata key
      list of segment metadata
    • segmentsMetadataByColumn

      public List<SegmentMetadata> segmentsMetadataByColumn(TableInfo tableInfo, List<String> locations, String column)
      Returns segments metadata based on given table information, locations and column name. Schematic SQL request:
         select [$SEGMENT_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and location in ('.../dir0', '.../dir1')
         and column = 'n_nation'
         and metadataType = 'SEGMENT'
      tableInfo - table information
      locations - segments locations
      column - column name
      list of segment metadata
    • segmentsMetadata

      public List<SegmentMetadata> segmentsMetadata(TableInfo tableInfo, List<MetadataInfo> metadataInfos)
      Returns segments metadata based on the given table information and metadata identifiers. Schematic SQL request:
         select [$SEGMENT_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and identifier in ('part_int=3', …)
         and metadataType = 'SEGMENT'
      tableInfo - table information
      metadataInfos - list of MetadataInfo for required segments to obtain
      list of segment metadata
    • metadata

      public List<TableMetadataUnit> metadata(TableInfo tableInfo, Collection<MetadataInfo> metadataInfos)
      Returns list of TableMetadataUnit metadata based on the given table information, and metadata identifiers. Schematic SQL request:
         select * from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and identifier in ('part_int=3', …)
         and metadataType in ('SEGMENT', …)
      tableInfo - table information
      metadataInfos - list of MetadataInfo for required metadata to obtain
      list of metadata
    • partitionsMetadata

      public List<PartitionMetadata> partitionsMetadata(TableInfo tableInfo, List<String> metadataKeys, String column)
      Returns partitions metadata based on given table information, metadata keys and column name. Schematic SQL request:
         select [$PARTITION_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey in ('part_int=3', 'part_int=4')
         and column = 'n_nation'
         and metadataType = 'PARTITION'
      tableInfo - table information
      metadataKeys - list of metadata keys
      column - partition column
      list of partition metadata
    • filesMetadata

      public List<FileMetadata> filesMetadata(TableInfo tableInfo, String metadataKey, List<String> paths)
      Returns files metadata based on given table information, metadata key and files paths. Schematic SQL request:
         select [$FILE_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'part_int=3'
         and path in ('/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet', …)
         and metadataType = 'FILE'
      tableInfo - table information
      metadataKey - metadata key
      paths - list of full file paths
      list of files metadata
    • filesMetadata

      public List<FileMetadata> filesMetadata(TableInfo tableInfo, List<MetadataInfo> metadataInfos)
      Returns files metadata based on the given table information and metadata identifiers. Schematic SQL request:
         select [$FILE_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and identifier in ('part_int=3', …)
         and metadataType = 'FILE'
      tableInfo - table information
      metadataInfos - list of MetadataInfo for required files to obtain
      list of files metadata
    • fileMetadata

      public FileMetadata fileMetadata(TableInfo tableInfo, String metadataKey, String path)
      Returns file metadata based on given table information, metadata key and full path. Expects at most one qualifying result and fails if more than one is found. Returns null if no data is found. Schematic SQL request:
         select [$FILE_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'part_int=3'
         and path = '/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet'
         and metadataType = 'FILE'
      tableInfo - table information
      metadataKey - metadata key
      path - full file path
      file metadata
    • rowGroupsMetadata

      public List<RowGroupMetadata> rowGroupsMetadata(TableInfo tableInfo, String metadataKey, String path)
      Returns row groups metadata based on given table information, metadata key and file path. Schematic SQL request:
         select [$ROW_GROUP_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'part_int=3'
         and path = '/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet'
         and metadataType = 'ROW_GROUP'
      tableInfo - table information
      metadataKey - metadata key
      path - full path to the file of the row group
      list of row group metadata
    • rowGroupsMetadata

      public List<RowGroupMetadata> rowGroupsMetadata(TableInfo tableInfo, List<String> metadataKeys, List<String> paths)
      Returns row groups metadata based on the given table information, metadata keys and file paths. Schematic SQL request:
         select [$ROW_GROUP_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey in ('part_int=3', …)
         and path in ('/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet', …)
         and metadataType = 'ROW_GROUP'
      tableInfo - table information
      metadataKeys - list of metadata keys
      paths - list of full paths to the files of the row groups
      list of row group metadata
    • rowGroupsMetadata

      public List<RowGroupMetadata> rowGroupsMetadata(TableInfo tableInfo, List<MetadataInfo> metadataInfos)
      Returns row groups metadata based on given table information and metadata identifiers. Schematic SQL request:
         select [$ROW_GROUP_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and identifier in ('part_int=3', …)
         and metadataType = 'ROW_GROUP'
      tableInfo - table information
      metadataInfos - list of MetadataInfo for required row groups to obtain
      list of row group metadata
    • fullSegmentsMetadataWithoutPartitions

      public BasicTablesTransformer.MetadataHolder fullSegmentsMetadataWithoutPartitions(TableInfo tableInfo, List<String> metadataKeys, List<String> locations)
      Returns metadata for segments, files and row groups based on given metadata keys and locations. Schematic SQL request:
         select [$SEGMENT_METADATA$] / [$FILE_METADATA$] / [$ROW_GROUP_METADATA$] from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey in ('part_int=1', 'part_int=2', 'part_int=5')
         and location in ('.../dir0/d3', '.../dir0/d4', '.../part_int=3/d3', '.../part_int=4/d4')
         and metadataType in ('SEGMENT', 'FILE', 'ROW_GROUP')
      tableInfo - table information
      metadataKeys - metadata keys
      locations - locations
      list of segments / files / row groups metadata in BasicTablesTransformer.MetadataHolder instance
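
      Because a single request returns three metadata types at once, the result has to be separated by type before use. The sketch below shows one way to do that split. Unit and Holder are hypothetical stand-ins for TableMetadataUnit and BasicTablesTransformer.MetadataHolder, and grouping by the metadataType value is an assumption about how such a holder could be populated.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MetadataHolderSketch {

    // Hypothetical stand-in for a metadata unit, reduced to type and metadata key.
    static final class Unit {
        final String metadataType;
        final String metadataKey;

        Unit(String metadataType, String metadataKey) {
            this.metadataType = metadataType;
            this.metadataKey = metadataKey;
        }
    }

    // Hypothetical stand-in for the holder: one list per metadata type.
    static final class Holder {
        final List<Unit> segments;
        final List<Unit> files;
        final List<Unit> rowGroups;

        Holder(List<Unit> segments, List<Unit> files, List<Unit> rowGroups) {
            this.segments = segments;
            this.files = files;
            this.rowGroups = rowGroups;
        }
    }

    // Splits a mixed result list into per-type lists; absent types become empty lists.
    static Holder split(List<Unit> units) {
        Map<String, List<Unit>> byType = units.stream()
            .collect(Collectors.groupingBy(u -> u.metadataType));
        return new Holder(
            byType.getOrDefault("SEGMENT", Collections.<Unit>emptyList()),
            byType.getOrDefault("FILE", Collections.<Unit>emptyList()),
            byType.getOrDefault("ROW_GROUP", Collections.<Unit>emptyList()));
    }

    public static void main(String[] args) {
        Holder holder = split(Arrays.asList(
            new Unit("SEGMENT", "part_int=1"),
            new Unit("FILE", "part_int=1"),
            new Unit("FILE", "part_int=2")));
        System.out.println(holder.segments.size());  // 1
        System.out.println(holder.files.size());     // 2
        System.out.println(holder.rowGroups.size()); // 0
    }
}
```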
    • filesLastModifiedTime

      public Map<String,Long> filesLastModifiedTime(TableInfo tableInfo, String metadataKey, List<String> locations)
      Returns map of file full paths and their last modified time. Schematic SQL request:
         select path, lastModifiedTime from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'part_int=3'
         and location in ('/tmp/nation/part_int=3/part_varchar=g', ...)
         and metadataType = 'FILE'
      tableInfo - table information
      metadataKey - metadata key
      locations - files locations
      result map where key is file full path and value is file last modification time
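
      The shape of the returned map can be illustrated over in-memory data. FileRow is a hypothetical record type carrying only the columns the schematic request touches, and the sample path and timestamp values are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LastModifiedSketch {

    // Hypothetical row type with the columns used by the schematic request.
    static final class FileRow {
        final String metadataKey;
        final String location;
        final String path;
        final long lastModifiedTime;

        FileRow(String metadataKey, String location, String path, long lastModifiedTime) {
            this.metadataKey = metadataKey;
            this.location = location;
            this.path = path;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    // Keeps rows matching the metadata key and one of the locations,
    // then maps each full file path to its last modified time.
    static Map<String, Long> filesLastModifiedTime(List<FileRow> rows, String metadataKey,
                                                   List<String> locations) {
        return rows.stream()
            .filter(r -> metadataKey.equals(r.metadataKey))
            .filter(r -> locations.contains(r.location))
            .collect(Collectors.toMap(r -> r.path, r -> r.lastModifiedTime));
    }

    public static void main(String[] args) {
        List<FileRow> rows = Arrays.asList(
            new FileRow("part_int=3", "/tmp/nation/part_int=3/part_varchar=g",
                "/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet", 1000L),
            new FileRow("part_int=4", "/tmp/nation/part_int=4/part_varchar=g",
                "/tmp/nation/part_int=4/part_varchar=g/0_0_0.parquet", 2000L));

        Map<String, Long> result = filesLastModifiedTime(rows, "part_int=3",
            Arrays.asList("/tmp/nation/part_int=3/part_varchar=g"));
        System.out.println(result.size()); // 1
        System.out.println(result.get("/tmp/nation/part_int=3/part_varchar=g/0_0_0.parquet")); // 1000
    }
}
```

      A caller could compare these values against file system timestamps to decide which files need their metadata refreshed.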
    • segmentsLastModifiedTime

      public Map<String,Long> segmentsLastModifiedTime(TableInfo tableInfo, List<String> locations)
      Returns map of segments metadata keys and their last modified time. Schematic SQL request:
         select metadataKey, lastModifiedTime from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and location in ('.../dir0', '.../dir1')
         and metadataType = 'SEGMENT'
      tableInfo - table information
      locations - segments locations
      result map where key is metadata key and value is its last modification time
    • interestingColumnsAndPartitionKeys

      public TableMetadataUnit interestingColumnsAndPartitionKeys(TableInfo tableInfo)
      Returns the table's interesting columns and partition keys based on given table information. Expects at most one qualifying result and fails if more than one is found. Returns null if no data is found. Schematic SQL request:
         select interestingColumns, partitionKeys from METASTORE
         where storage = 'dfs' and workspace = 'tmp' and tableName = 'nation'
         and metadataKey = 'GENERAL_INFO'
         and metadataType = 'TABLE'
      tableInfo - table information
      TableMetadataUnit instance with interesting columns and partition keys set, if present
    • request

      public List<TableMetadataUnit> request(BasicTablesRequests.RequestMetadata requestMetadata)
      Executes Metastore Tables read request based on given information in BasicTablesRequests.RequestMetadata.
      requestMetadata - request metadata
      list of metadata units