Class AbstractGroupScanWithMetadata<P extends TableMetadataProvider>

All Implemented Interfaces:
Iterable<PhysicalOperator>, GraphValue<PhysicalOperator>, FileGroupScan, FragmentLeaf, GroupScan, HasAffinity, Leaf, PhysicalOperator, Scan
Direct Known Subclasses:
AbstractParquetGroupScan, EasyGroupScan

public abstract class AbstractGroupScanWithMetadata<P extends TableMetadataProvider> extends AbstractFileGroupScan
Represents a table group scan that uses metadata.
  • Method Details

    • getColumns

      public List<SchemaPath> getColumns()
      Description copied from interface: GroupScan
      Returns a list of columns scanned by this group scan.
      Specified by:
      getColumns in interface GroupScan
      Overrides:
      getColumns in class AbstractGroupScan
    • getFiles

      public Collection<org.apache.hadoop.fs.Path> getFiles()
      Description copied from interface: GroupScan
      Returns a collection of file names associated with this GroupScan. This should be called after checking hasFiles(). If this GroupScan cannot provide file names, it returns null.
      Specified by:
      getFiles in interface GroupScan
      Overrides:
      getFiles in class AbstractGroupScan
      Returns:
      collection of file paths
    • hasFiles

      public boolean hasFiles()
      Description copied from interface: GroupScan
      Returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
      Specified by:
      hasFiles in interface GroupScan
      Overrides:
      hasFiles in class AbstractGroupScan
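      The contract between hasFiles() and getFiles() described above can be illustrated with a small caller-side sketch. The FileSource interface and the filesOrEmpty helper are hypothetical stand-ins written for this example and are not part of Drill; file paths are modelled as plain strings.

```java
import java.util.Collection;
import java.util.Collections;

final class HasFilesSketch {

  /** Hypothetical stand-in exposing only the two methods discussed above. */
  interface FileSource {
    boolean hasFiles();

    Collection<String> getFiles(); // may be null when file names are unavailable
  }

  /** Returns the file names, or an empty collection when none can be provided. */
  static Collection<String> filesOrEmpty(FileSource scan) {
    // Check hasFiles() first, as the getFiles() description recommends.
    if (!scan.hasFiles()) {
      return Collections.<String>emptyList();
    }
    Collection<String> files = scan.getFiles();
    // Guard against null as well, since getFiles() is documented to allow it.
    return files != null ? files : Collections.<String>emptyList();
  }
}
```

      Returning an empty collection instead of null is a choice made for this sketch only; it keeps callers free of null checks.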
    • getLimit

      public int getLimit()
    • isMatchAllMetadata

      public boolean isMatchAllMetadata()
    • getColumnValueCount

      public long getColumnValueCount(SchemaPath column)
      Returns the value count for the specified column, or 0 if this group scan does not contain such a column. Used when applying the convert-to-direct-scan rule.
      Specified by:
      getColumnValueCount in interface GroupScan
      Overrides:
      getColumnValueCount in class AbstractGroupScan
      Parameters:
      column - column schema path
      Returns:
      column value count
    • getDigest

      public String getDigest()
      Description copied from interface: GroupScan
      Returns a signature of the GroupScan which should usually be composed of all its attributes which could describe it uniquely.
    • getScanStats

      public ScanStats getScanStats()
      Overrides:
      getScanStats in class AbstractGroupScan
    • getFilter

      public LogicalExpression getFilter()
      Specified by:
      getFilter in interface GroupScan
      Overrides:
      getFilter in class AbstractGroupScan
    • getMetadataProvider

      public P getMetadataProvider()
      Description copied from interface: GroupScan
      Returns TableMetadataProvider instance which is used for providing metadata for current GroupScan.
      Specified by:
      getMetadataProvider in interface GroupScan
      Overrides:
      getMetadataProvider in class AbstractGroupScan
      Returns:
      TableMetadataProvider instance, the source of metadata
    • setFilter

      public void setFilter(LogicalExpression filter)
    • setFilterForRuntime

      public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesContext optimizerContext)
      Sets the filter, thus enabling runtime row group pruning. The runtime pruning can be disabled with an option.
      Parameters:
      filterExpr - The filter to be used at runtime to match with rowgroups' footers
      optimizerContext - The context for the options
    • applyFilter

      public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
      Applies the specified filter filterExpr to the current group scan and produces filtering at the following levels:
      • table level:
        • if the filter matches all the data or prunes all the data, sets the corresponding value for isMatchAllMetadata() and returns null
      • segment level:
        • if the filter matches all the data or prunes all the data, sets the corresponding value for isMatchAllMetadata() and returns null
        • if segment metadata was pruned, prunes the underlying metadata
      • partition level:
        • if the filter matches all the data or prunes all the data, sets the corresponding value for isMatchAllMetadata() and returns null
        • if partition metadata was pruned, prunes the underlying metadata
      • file level:
        • if the filter matches all the data or prunes all the data, sets the corresponding value for isMatchAllMetadata() and returns null
      Specified by:
      applyFilter in interface GroupScan
      Overrides:
      applyFilter in class AbstractGroupScan
      Parameters:
      filterExpr - filter expression to build
      udfUtilities - udf utilities
      functionImplementationRegistry - context to find drill function holder
      optionManager - option manager
      Returns:
      group scan with applied filter expression
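      The level-by-level behaviour described for applyFilter can be modelled with a simplified sketch. This is not Drill's implementation: the Match enum, the Outcome class, and the apply method are hypothetical names introduced here, and the sketch assumes that the flag behind isMatchAllMetadata() is true when everything matches and false when everything is pruned.

```java
import java.util.List;

final class ApplyFilterLevelsSketch {

  /** Hypothetical outcome of evaluating the filter against one metadata level. */
  enum Match { ALL, SOME, NONE }

  /** Hypothetical result: whether null is returned, and the match-all flag. */
  static final class Outcome {
    final boolean returnsNull;
    final boolean matchAllMetadata;

    Outcome(boolean returnsNull, boolean matchAllMetadata) {
      this.returnsNull = returnsNull;
      this.matchAllMetadata = matchAllMetadata;
    }
  }

  /**
   * Walks the levels in order (table, segment, partition, file). The first
   * level at which the filter matches or prunes everything ends the walk
   * with a null result; otherwise a filtered scan would be produced.
   */
  static Outcome apply(List<Match> levelsInOrder) {
    for (Match match : levelsInOrder) {
      if (match == Match.ALL) {
        return new Outcome(true, true);   // all data matches (assumed flag value)
      }
      if (match == Match.NONE) {
        return new Outcome(true, false);  // all data pruned (assumed flag value)
      }
    }
    return new Outcome(false, false);     // partial match at every level
  }
}
```

      A caller of the real method would therefore need to handle a null result and consult isMatchAllMetadata() to tell the two cases apart.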
    • isAllDataPruned

      protected boolean isAllDataPruned(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)
    • isGroupScanFullyMatchesFilter

      protected boolean isGroupScanFullyMatchesFilter(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)
    • getNextOrEmpty

      protected <T> List<T> getNextOrEmpty(Collection<T> inputList)
      Returns a list containing the first element of the input list, or an empty list if the input is empty.
      Type Parameters:
      T - type of values in the list
      Parameters:
      inputList - the source of the first element
      Returns:
      list with the first element of input list
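      A minimal sketch of the behaviour described for getNextOrEmpty, written as a standalone static helper. The class name NextOrEmptySketch is an assumption for this example, and the body is one possible way to satisfy the description rather than the method's actual source.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class NextOrEmptySketch {

  /** Returns a one-element list with the first input element, or an empty list. */
  static <T> List<T> nextOrEmpty(Collection<T> inputList) {
    return inputList.isEmpty()
        ? Collections.<T>emptyList()
        : Collections.singletonList(inputList.iterator().next());
  }
}
```

      For an ordered input such as a List, "first" means the element at index 0; for an unordered collection it is whichever element the iterator yields first.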
    • getFilterer

      Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata.
    • getFilterPredicate

      public FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs)
    • getFilterPredicate

      public static FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs, boolean supportsFileImplicitColumns, TupleMetadata schema)
      Returns a Parquet filter predicate built from the specified filterExpr.
      Parameters:
      filterExpr - filter expression to build
      udfUtilities - udf utilities
      functionImplementationRegistry - context to find drill function holder
      optionManager - option manager
      omitUnsupportedExprs - whether expressions which cannot be converted may be omitted from the resulting expression
      supportsFileImplicitColumns - whether implicit columns are supported
      schema - schema
      Returns:
      parquet filter predicate
    • getSchema

      public TupleMetadata getSchema()
    • supportsLimitPushdown

      public boolean supportsLimitPushdown()
      Description copied from class: AbstractGroupScan
      Default is not to support limit pushdown.
      Specified by:
      supportsLimitPushdown in interface GroupScan
      Overrides:
      supportsLimitPushdown in class AbstractGroupScan
    • applyLimit

      public GroupScan applyLimit(int maxRecords)
      Description copied from class: AbstractGroupScan
      By default, returns null to indicate that row-count-based pruning is not supported. Each group scan subclass should override this method if it supports row-count-based pruning.
      Specified by:
      applyLimit in interface GroupScan
      Overrides:
      applyLimit in class AbstractGroupScan
      Parameters:
      maxRecords - the number of rows requested from the group scan
      Returns:
      a new instance of group scan if the prune is successful; null if row-based pruning is not supported or the prune is not successful.
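      Because applyLimit may return null, callers need a fallback. The following caller-side sketch shows one way to combine supportsLimitPushdown() and applyLimit(int); the LimitableScan interface and the pushLimit helper are hypothetical and exist only for this illustration.

```java
final class ApplyLimitSketch {

  /** Hypothetical stand-in exposing only the two limit-related methods. */
  interface LimitableScan {
    boolean supportsLimitPushdown();

    LimitableScan applyLimit(int maxRecords); // null when pruning is unsupported or unsuccessful
  }

  /** Returns the pruned scan when available, otherwise the original scan. */
  static LimitableScan pushLimit(LimitableScan scan, int maxRecords) {
    if (!scan.supportsLimitPushdown()) {
      return scan;
    }
    LimitableScan pruned = scan.applyLimit(maxRecords);
    // A null result means nothing was pruned, so keep using the original scan.
    return pruned != null ? pruned : scan;
  }
}
```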
    • pruneForPartitions

      protected static <T extends BaseMetadata & LocationProvider> Map<org.apache.hadoop.fs.Path,T> pruneForPartitions(Map<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
      Removes metadata which does not belong to any of the partitions in the specified partition metadata list.
      Type Parameters:
      T - type of metadata to filter
      Parameters:
      metadataToPrune - map of metadata, keyed by path, which should be pruned
      filteredPartitionMetadata - list of partition metadata which was pruned
      Returns:
      map with metadata which belongs to the pruned partitions
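      The pruning described for pruneForPartitions can be sketched without Drill types. In this sketch paths are plain strings and each partition is represented only by the set of locations it covers; both simplifications, and the class name PruneForPartitionsSketch, are assumptions made for the example.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class PruneForPartitionsSketch {

  /**
   * Keeps only the entries whose path is listed among the locations of at
   * least one of the remaining partitions.
   */
  static <T> Map<String, T> pruneForPartitions(
      Map<String, T> metadataToPrune, Collection<Set<String>> partitionLocations) {
    Map<String, T> kept = new LinkedHashMap<>();
    for (Map.Entry<String, T> entry : metadataToPrune.entrySet()) {
      for (Set<String> locations : partitionLocations) {
        if (locations.contains(entry.getKey())) {
          kept.put(entry.getKey(), entry.getValue());
          break; // one matching partition is enough
        }
      }
    }
    return kept;
  }
}
```

      A LinkedHashMap is used here so that the surviving entries keep the iteration order of the input map.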
    • limitMetadata

      protected <T extends BaseMetadata> List<T> limitMetadata(Collection<T> metadataList, int maxRecords)
      Prunes the specified metadata list, keeping the minimum number of metadata instances whose total row count is not less than the specified maxRecords.
      Type Parameters:
      T - type of metadata to prune
      Parameters:
      metadataList - list of metadata to prune
      maxRecords - rows number to leave
      Returns:
      pruned metadata list
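      One way to read the limitMetadata description is as a running-total cutoff: keep taking metadata instances in order until their combined row count reaches maxRecords. The sketch below follows that reading. The rowCount function parameter is a hypothetical substitute for however row counts are obtained from metadata, and the class name is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.ToLongFunction;

final class LimitMetadataSketch {

  /** Keeps leading items until their total row count is at least maxRecords. */
  static <T> List<T> limit(Collection<T> metadataList, ToLongFunction<T> rowCount, int maxRecords) {
    List<T> kept = new ArrayList<>();
    long rows = 0;
    for (T metadata : metadataList) {
      if (rows >= maxRecords) {
        break; // enough rows are already covered
      }
      kept.add(metadata);
      rows += rowCount.applyAsLong(metadata);
    }
    return kept;
  }
}
```

      If the total row count of the input is below maxRecords, the sketch returns every item, since no smaller subset can satisfy the limit.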
    • getPartitionColumns

      public List<SchemaPath> getPartitionColumns()
      Description copied from interface: GroupScan
      Returns a list of columns that can be used for partition pruning.
      Specified by:
      getPartitionColumns in interface GroupScan
      Overrides:
      getPartitionColumns in class AbstractGroupScan
    • getTypeForColumn

      public TypeProtos.MajorType getTypeForColumn(SchemaPath schemaPath)
    • getPartitionValue

      public <T> T getPartitionValue(org.apache.hadoop.fs.Path path, SchemaPath column, Class<T> clazz)
    • getFileSet

      public Set<org.apache.hadoop.fs.Path> getFileSet()
    • modifyFileSelection

      public void modifyFileSelection(FileSelection selection)
      Specified by:
      modifyFileSelection in interface FileGroupScan
      Overrides:
      modifyFileSelection in class AbstractFileGroupScan
    • init

      protected void init() throws IOException
      Throws:
      IOException
    • getFilterString

      protected String getFilterString()
    • supportsFileImplicitColumns

      protected abstract boolean supportsFileImplicitColumns()
    • getPartitionValues

      protected abstract List<String> getPartitionValues(LocationProvider locationProvider)
    • isImplicitOrPartCol

      public static boolean isImplicitOrPartCol(SchemaPath schemaPath, OptionManager optionManager)
    • getFilesMetadata

      public Map<org.apache.hadoop.fs.Path,FileMetadata> getFilesMetadata()
    • getTableMetadata

      public TableMetadata getTableMetadata()
      Specified by:
      getTableMetadata in interface GroupScan
      Overrides:
      getTableMetadata in class AbstractGroupScan
    • getPartitionsMetadata

      public List<PartitionMetadata> getPartitionsMetadata()
    • getSegmentsMetadata

      public Map<org.apache.hadoop.fs.Path,SegmentMetadata> getSegmentsMetadata()
    • usedMetastore

      public boolean usedMetastore()
      Description copied from interface: GroupScan
      Returns true if current group scan uses metadata obtained from the Metastore.
      Specified by:
      usedMetastore in interface GroupScan
      Overrides:
      usedMetastore in class AbstractGroupScan
      Returns:
      true if current group scan uses metadata obtained from the Metastore, false otherwise.
    • getNonInterestingColumnsMetadata

      public NonInterestingColumnsMetadata getNonInterestingColumnsMetadata()
    • tableMetadataProviderBuilder

      protected abstract TableMetadataProviderBuilder tableMetadataProviderBuilder(MetadataProviderManager source)
      Returns TableMetadataProviderBuilder instance based on specified MetadataProviderManager source.
      Parameters:
      source - metadata provider manager
      Returns:
      TableMetadataProviderBuilder instance
    • defaultTableMetadataProviderBuilder

      protected abstract TableMetadataProviderBuilder defaultTableMetadataProviderBuilder(MetadataProviderManager source)
      Returns TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore.
      Parameters:
      source - metadata provider manager
      Returns:
      TableMetadataProviderBuilder instance
    • checkMetadataConsistency

      protected void checkMetadataConsistency(FileSelection selection, org.apache.hadoop.conf.Configuration fsConf) throws IOException
      Compares the last modified time of the files obtained from the specified selection with the Metastore last modified time to determine whether the Metastore metadata is up to date. If the metadata is outdated, a MetadataException is thrown.
      Parameters:
      selection - the source of files to check
      Throws:
      MetadataException - if metadata is outdated
      IOException
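      The comparison described for checkMetadataConsistency can be sketched in isolation. This sketch assumes that metadata counts as outdated when any file was modified after the Metastore's last modified time; the timestamps are plain long values, and OutdatedMetadataException is a hypothetical local exception standing in for MetadataException.

```java
import java.util.Collection;

final class MetadataConsistencySketch {

  /** Hypothetical exception used in place of Drill's MetadataException. */
  static final class OutdatedMetadataException extends RuntimeException {
    OutdatedMetadataException(String message) {
      super(message);
    }
  }

  /** Throws if any file is newer than the recorded Metastore timestamp. */
  static void check(Collection<Long> fileModificationTimes, long metastoreLastModified) {
    for (long modified : fileModificationTimes) {
      if (modified > metastoreLastModified) {
        throw new OutdatedMetadataException(
            "File modified at " + modified + " is newer than Metastore metadata at "
                + metastoreLastModified);
      }
    }
  }
}
```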