Class AbstractGroupScanWithMetadata<P extends TableMetadataProvider>
java.lang.Object
org.apache.drill.exec.physical.base.AbstractBase
org.apache.drill.exec.physical.base.AbstractGroupScan
org.apache.drill.exec.physical.base.AbstractFileGroupScan
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata<P>
- All Implemented Interfaces: Iterable<PhysicalOperator>, GraphValue<PhysicalOperator>, FileGroupScan, FragmentLeaf, GroupScan, HasAffinity, Leaf, PhysicalOperator, Scan
- Direct Known Subclasses: AbstractParquetGroupScan, EasyGroupScan
public abstract class AbstractGroupScanWithMetadata<P extends TableMetadataProvider>
extends AbstractFileGroupScan
Represents a table group scan that uses metadata.
-
Nested Class Summary
Modifier and Type | Class | Description
protected static class | AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B extends AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B>> | This class is responsible for filtering different metadata levels.
-
Field Summary
Modifier and Type | Field
protected List<SchemaPath> | columns
protected Map<org.apache.hadoop.fs.Path, FileMetadata> | files
protected Set<org.apache.hadoop.fs.Path> | fileSet
protected LogicalExpression | filter
protected int | limit
protected boolean | matchAllMetadata
protected P | metadataProvider
protected NonInterestingColumnsMetadata | nonInterestingColumnsMetadata
protected List<SchemaPath> | partitionColumns
protected List<PartitionMetadata> | partitions
protected Map<org.apache.hadoop.fs.Path, SegmentMetadata> | segments
protected TableMetadata | tableMetadata
protected boolean | usedMetastore
Fields inherited from class org.apache.drill.exec.physical.base.AbstractBase
INIT_ALLOCATION, initialAllocation, MAX_ALLOCATION, maxAllocation, userName
Fields inherited from interface org.apache.drill.exec.physical.base.GroupScan
ALL_COLUMNS
-
Constructor Summary
Modifier | Constructor
protected | AbstractGroupScanWithMetadata(String userName, List<SchemaPath> columns, LogicalExpression filter)
protected | AbstractGroupScanWithMetadata
-
Method Summary
Modifier and Type | Method | Description
AbstractGroupScanWithMetadata<?> | applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) | Applies the specified filter filterExpr to the current group scan and produces filtering at: table level: if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null; segment level: if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null, and if segment metadata was pruned, prunes underlying metadata; partition level: if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null, and if partition metadata was pruned, prunes underlying metadata; file level: if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null.
 | applyLimit(int maxRecords) | By default, return null to indicate row count based prune is not supported.
protected void | checkMetadataConsistency(FileSelection selection, org.apache.hadoop.conf.Configuration fsConf) | Compares the last modified time of files obtained from specified selection with the Metastore last modified time to determine whether Metastore metadata is up-to-date.
protected abstract TableMetadataProviderBuilder | defaultTableMetadataProviderBuilder(MetadataProviderManager source) | Returns TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore.
 | getColumns() | Returns a list of columns scanned by this group scan.
long | getColumnValueCount(SchemaPath column) | Return column value count for the specified column.
 | getDigest() | Returns a signature of the GroupScan which should usually be composed of all its attributes which could describe it uniquely.
Collection<org.apache.hadoop.fs.Path> | getFiles() | Returns a collection of file names associated with this GroupScan.
Set<org.apache.hadoop.fs.Path> | getFileSet() |
Map<org.apache.hadoop.fs.Path, FileMetadata> | getFilesMetadata() |
protected abstract AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> | getFilterer() | Returns holder for metadata values which provides API to filter metadata and build new group scan instance using filtered metadata.
FilterPredicate<?> | getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs) |
static FilterPredicate<?> | getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs, boolean supportsFileImplicitColumns, TupleMetadata schema) | Returns parquet filter predicate built from specified filterExpr.
protected String | getFilterString() |
int | getLimit() |
 | getMetadataProvider() | Returns TableMetadataProvider instance which is used for providing metadata for current GroupScan.
protected <T> List<T> | getNextOrEmpty(Collection<T> inputList) | Returns list with the first element of input list or empty list if input one was empty.
 | getPartitionColumns() | Returns a list of columns that can be used for partition pruning.
<T> T | getPartitionValue(org.apache.hadoop.fs.Path path, SchemaPath column, Class<T> clazz) |
 | getPartitionValues(LocationProvider locationProvider) |
Map<org.apache.hadoop.fs.Path, SegmentMetadata> | getSegmentsMetadata() |
 | getTypeForColumn(SchemaPath schemaPath) |
boolean | hasFiles() | Return true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
protected void | init() |
protected boolean | isAllDataPruned(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata) |
protected boolean | isGroupScanFullyMatchesFilter(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata) |
static boolean | isImplicitOrPartCol(SchemaPath schemaPath, OptionManager optionManager) |
boolean | isMatchAllMetadata() |
protected <T extends BaseMetadata> List<T> | limitMetadata(Collection<T> metadataList, int maxRecords) | Prunes specified metadata list and leaves minimum metadata instances count with general rows number which is not less than specified maxRecords.
void | modifyFileSelection(FileSelection selection) |
protected static <T extends BaseMetadata & LocationProvider> Map<org.apache.hadoop.fs.Path,T> | pruneForPartitions(Map<org.apache.hadoop.fs.Path, T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata) | Removes metadata which does not belong to any of partitions in metadata list.
void | setFilter(LogicalExpression filter) |
void | setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesContext optimizerContext) | Sets the filter, thus enabling runtime rowgroup pruning. The runtime pruning can be disabled with an option.
protected abstract boolean | supportsFileImplicitColumns() |
boolean | supportsLimitPushdown() | Default is not to support limit pushdown.
protected abstract TableMetadataProviderBuilder | tableMetadataProviderBuilder(MetadataProviderManager source) | Returns TableMetadataProviderBuilder instance based on specified MetadataProviderManager source.
boolean | usedMetastore() | Returns true if current group scan uses metadata obtained from the Metastore.
Methods inherited from class org.apache.drill.exec.physical.base.AbstractFileGroupScan
clone, supportsPartitionFilterPushdown
Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScan
accept, canPushdownProjects, clone, enforceWidth, getAnalyzeInfoProvider, getDistributionAffinity, getInitialAllocation, getMaxAllocation, getMinParallelizationWidth, getOperatorAffinity, getOperatorType, getScanStats, getScanStats, getSelectionRoot, isDistributed, isExecutable, iterator, supportsFilterPushDown
Methods inherited from class org.apache.drill.exec.physical.base.AbstractBase
accept, getCost, getOperatorId, getSVMode, getUserName, isBufferedOperator, setCost, setMaxAllocation, setOperatorId
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept
Methods inherited from interface org.apache.drill.exec.physical.base.GroupScan
applyAssignments, canPushdownProjects, clone, enforceWidth, getAnalyzeInfoProvider, getMaxParallelizationWidth, getMinParallelizationWidth, getScanStats, getScanStats, getSelectionRoot, getSpecificScan, isDistributed, supportsFilterPushDown
Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity, getOperatorAffinity
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
-
Field Details
-
metadataProvider
-
tableMetadata
-
partitions
-
segments
-
nonInterestingColumnsMetadata
-
partitionColumns
-
filter
-
columns
-
files
-
fileSet
-
matchAllMetadata
protected boolean matchAllMetadata
-
usedMetastore
protected boolean usedMetastore
-
limit
protected int limit
-
-
Constructor Details
-
AbstractGroupScanWithMetadata
protected AbstractGroupScanWithMetadata(String userName, List<SchemaPath> columns, LogicalExpression filter)
-
AbstractGroupScanWithMetadata
-
-
Method Details
-
getColumns
Description copied from interface: GroupScan
Returns a list of columns scanned by this group scan.
- Specified by: getColumns in interface GroupScan
- Overrides: getColumns in class AbstractGroupScan
-
getFiles
Description copied from interface: GroupScan
Returns a collection of file names associated with this GroupScan. This should be called after checking hasFiles(). If this GroupScan cannot provide file names, it returns null.
- Specified by: getFiles in interface GroupScan
- Overrides: getFiles in class AbstractGroupScan
- Returns: collection of file paths
-
hasFiles
public boolean hasFiles()
Description copied from interface: GroupScan
Returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
- Specified by: hasFiles in interface GroupScan
- Overrides: hasFiles in class AbstractGroupScan
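The hasFiles()/getFiles() contract described above can be illustrated with a minimal sketch. The interface FileListingScan and the helper filesOrEmpty are hypothetical stand-ins written for this example, with file paths modeled as plain strings; they are not part of the Drill API.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-in for the two GroupScan methods described above.
interface FileListingScan {
  boolean hasFiles();
  Collection<String> getFiles(); // may return null when file names cannot be provided
}

class FileListingSketch {
  // Checks hasFiles() before calling getFiles(), and treats a null result as "no files".
  static Collection<String> filesOrEmpty(FileListingScan scan) {
    if (!scan.hasFiles()) {
      return Collections.emptyList();
    }
    Collection<String> files = scan.getFiles();
    return files == null ? Collections.<String>emptyList() : files;
  }
}
```

Guarding the call this way keeps callers from dereferencing the null that getFiles() is documented to return when file names are unavailable.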
-
getLimit
public int getLimit()
-
isMatchAllMetadata
public boolean isMatchAllMetadata()
-
getColumnValueCount
Returns the value count for the specified column. If no such column is found, returns 0. Used when applying the convert-to-direct-scan rule.
- Specified by: getColumnValueCount in interface GroupScan
- Overrides: getColumnValueCount in class AbstractGroupScan
- Parameters:
column - column schema path
- Returns: column value count
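As a rough illustration of the "return 0 for an unknown column" behavior described above, the sketch below keeps per-column counts in a map. The class ColumnValueCountSketch and its string-keyed storage are assumptions made for this example, not how the Drill class stores statistics.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnValueCountSketch {
  // Hypothetical per-column value counts keyed by column name.
  private final Map<String, Long> valueCounts = new HashMap<>();

  void put(String column, long count) {
    valueCounts.put(column, count);
  }

  // Mirrors the documented contract: 0 for a column that is not present.
  long getColumnValueCount(String column) {
    Long count = valueCounts.get(column);
    return count == null ? 0L : count;
  }
}
```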
-
getDigest
Description copied from interface: GroupScan
Returns a signature of the GroupScan, which should usually be composed of all the attributes that describe it uniquely.
-
getScanStats
- Overrides: getScanStats in class AbstractGroupScan
-
getFilter
- Specified by: getFilter in interface GroupScan
- Overrides: getFilter in class AbstractGroupScan
-
getMetadataProvider
Description copied from interface: GroupScan
Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
- Specified by: getMetadataProvider in interface GroupScan
- Overrides: getMetadataProvider in class AbstractGroupScan
- Returns: TableMetadataProvider instance, the source of metadata
-
setFilter
-
setFilterForRuntime
public void setFilterForRuntime(LogicalExpression filterExpr, OptimizerRulesContext optimizerContext)
Sets the filter, thus enabling runtime rowgroup pruning. The runtime pruning can be disabled with an option.
- Parameters:
filterExpr - the filter to be used at runtime to match with rowgroups' footers
optimizerContext - the context for the options
-
applyFilter
public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
Applies the specified filter filterExpr to the current group scan and produces filtering at:
- table level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null
- segment level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null
  - if segment metadata was pruned, prunes underlying metadata
- partition level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null
  - if partition metadata was pruned, prunes underlying metadata
- file level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to isMatchAllMetadata() and returns null
- Specified by: applyFilter in interface GroupScan
- Overrides: applyFilter in class AbstractGroupScan
- Parameters:
filterExpr - filter expression to build
udfUtilities - udf utilities
functionImplementationRegistry - context to find drill function holder
optionManager - option manager
- Returns: group scan with applied filter expression
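The per-level decision described above (matches all, prunes all, or partially matches) can be sketched for a single level as follows. This is a simplified model under stated assumptions: the Match enum, the Stats class, and the hard-coded predicate "column > threshold" are all hypothetical and only stand in for the real filter predicate and metadata classes.

```java
import java.util.ArrayList;
import java.util.List;

class LevelFilterSketch {
  // Hypothetical outcome of evaluating a filter against one metadata unit.
  enum Match { ALL, SOME, NONE }

  // Hypothetical metadata unit: min/max statistics of a single numeric column.
  static final class Stats {
    final String name;
    final long min;
    final long max;

    Stats(String name, long min, long max) {
      this.name = name;
      this.min = min;
      this.max = max;
    }
  }

  // Evaluates the predicate "column > threshold" against min/max statistics.
  static Match evaluate(Stats stats, long threshold) {
    if (stats.min > threshold) {
      return Match.ALL;   // every row satisfies the predicate
    }
    if (stats.max <= threshold) {
      return Match.NONE;  // no row can satisfy the predicate
    }
    return Match.SOME;
  }

  // Keeps only the units that may contain matching rows (ALL or SOME).
  static List<Stats> prune(List<Stats> units, long threshold) {
    List<Stats> kept = new ArrayList<>();
    for (Stats unit : units) {
      if (evaluate(unit, threshold) != Match.NONE) {
        kept.add(unit);
      }
    }
    return kept;
  }
}
```

In this model the same check would be repeated for table, segment, partition, and file units in turn, with units pruned at a higher level removing their underlying metadata from the lower levels.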
-
isAllDataPruned
protected boolean isAllDataPruned(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)
-
isGroupScanFullyMatchesFilter
protected boolean isGroupScanFullyMatchesFilter(AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<?> filteredMetadata)
-
getNextOrEmpty
Returns a list containing the first element of the input list, or an empty list if the input list is empty.
- Type Parameters:
T - type of values in the list
- Parameters:
inputList - the source of the first element
- Returns: list with the first element of the input list
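A minimal sketch of the behavior described above, written as a standalone helper. The class name FirstOrEmptySketch is hypothetical, and treating a null input as empty is an assumption of this example rather than documented behavior.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class FirstOrEmptySketch {
  // Returns a single-element list holding the first element, or an empty list.
  // Assumption: a null input is treated the same as an empty one.
  static <T> List<T> firstOrEmpty(Collection<T> input) {
    if (input == null || input.isEmpty()) {
      return Collections.emptyList();
    }
    return Collections.singletonList(input.iterator().next());
  }
}
```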
-
getFilterer
Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata.
-
getFilterPredicate
public FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs)
-
getFilterPredicate
public static FilterPredicate<?> getFilterPredicate(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionLookupContext functionImplementationRegistry, OptionManager optionManager, boolean omitUnsupportedExprs, boolean supportsFileImplicitColumns, TupleMetadata schema)
Returns the parquet filter predicate built from the specified filterExpr.
- Parameters:
filterExpr - filter expression to build
udfUtilities - udf utilities
functionImplementationRegistry - context to find drill function holder
optionManager - option manager
omitUnsupportedExprs - whether expressions which cannot be converted may be omitted from the resulting expression
supportsFileImplicitColumns - whether implicit columns are supported
schema - schema
- Returns: parquet filter predicate
-
getSchema
-
supportsLimitPushdown
public boolean supportsLimitPushdown()
Description copied from class: AbstractGroupScan
Default is not to support limit pushdown.
- Specified by: supportsLimitPushdown in interface GroupScan
- Overrides: supportsLimitPushdown in class AbstractGroupScan
-
applyLimit
Description copied from class: AbstractGroupScan
By default, returns null to indicate that row count based prune is not supported. Each group scan subclass should override this method if it supports row count based prune.
- Specified by: applyLimit in interface GroupScan
- Overrides: applyLimit in class AbstractGroupScan
- Parameters:
maxRecords - the number of rows requested from the group scan
- Returns: a new instance of group scan if the prune is successful; null if row-based prune is not supported or if the prune is not successful
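Because applyLimit may return null, a caller typically needs a fallback to the original scan. The sketch below shows one way to handle that contract; the LimitableScan interface and the pushLimit helper are hypothetical and exist only for this illustration.

```java
class LimitPushdownSketch {
  // Hypothetical minimal scan type exposing the two documented methods.
  interface LimitableScan {
    boolean supportsLimitPushdown();
    LimitableScan applyLimit(int maxRecords); // null: not supported, or prune not successful
  }

  // Returns the pruned scan when one is produced, otherwise the original scan.
  static LimitableScan pushLimit(LimitableScan scan, int maxRecords) {
    if (!scan.supportsLimitPushdown()) {
      return scan;
    }
    LimitableScan pruned = scan.applyLimit(maxRecords);
    return pruned != null ? pruned : scan;
  }
}
```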
-
pruneForPartitions
protected static <T extends BaseMetadata & LocationProvider> Map<org.apache.hadoop.fs.Path,T> pruneForPartitions(Map<org.apache.hadoop.fs.Path, T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
Removes metadata which does not belong to any of the partitions in the metadata list.
- Type Parameters:
T - type of metadata to filter
- Parameters:
metadataToPrune - list of metadata which should be pruned
filteredPartitionMetadata - list of partition metadata which was pruned
- Returns: list with metadata which belongs to pruned partitions
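The pruning described above can be sketched with plain collections. In this example each surviving partition is modeled as a hypothetical set of file locations and paths are plain strings; the real method works with PartitionMetadata and org.apache.hadoop.fs.Path, and its membership test may differ from the one assumed here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PartitionPruneSketch {
  // Keeps only the entries whose path belongs to at least one surviving partition.
  // Assumption: a partition is represented by the set of file locations it covers.
  static <T> Map<String, T> pruneForPartitions(Map<String, T> metadataToPrune,
                                               List<Set<String>> partitionLocations) {
    Map<String, T> kept = new LinkedHashMap<>();
    for (Map.Entry<String, T> entry : metadataToPrune.entrySet()) {
      for (Set<String> locations : partitionLocations) {
        if (locations.contains(entry.getKey())) {
          kept.put(entry.getKey(), entry.getValue());
          break;
        }
      }
    }
    return kept;
  }
}
```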
-
limitMetadata
protected <T extends BaseMetadata> List<T> limitMetadata(Collection<T> metadataList, int maxRecords)
Prunes the specified metadata list, leaving the minimum number of metadata instances whose total row count is not less than the specified maxRecords.
- Type Parameters:
T - type of metadata to prune
- Parameters:
metadataList - list of metadata to prune
maxRecords - rows number to leave
- Returns: pruned metadata list
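One way to read the description above is as a prefix selection: keep taking metadata entries until their combined row count reaches maxRecords. The sketch below follows that reading. FileRows is a hypothetical stand-in for a metadata instance with a known row count, and taking entries in encounter order (rather than, say, largest first) is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class LimitMetadataSketch {
  // Hypothetical metadata instance: a file path and its row count.
  static final class FileRows {
    final String path;
    final long rowCount;

    FileRows(String path, long rowCount) {
      this.path = path;
      this.rowCount = rowCount;
    }
  }

  // Keeps entries in encounter order until the accumulated row count reaches maxRecords.
  static List<FileRows> limitMetadata(Collection<FileRows> metadataList, int maxRecords) {
    List<FileRows> kept = new ArrayList<>();
    long rows = 0;
    for (FileRows metadata : metadataList) {
      if (rows >= maxRecords) {
        break;
      }
      kept.add(metadata);
      rows += metadata.rowCount;
    }
    return kept;
  }
}
```

If the list holds fewer rows in total than maxRecords, every entry is kept.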
-
getPartitionColumns
Description copied from interface: GroupScan
Returns a list of columns that can be used for partition pruning.
- Specified by: getPartitionColumns in interface GroupScan
- Overrides: getPartitionColumns in class AbstractGroupScan
-
getTypeForColumn
-
getPartitionValue
-
getFileSet
-
modifyFileSelection
- Specified by: modifyFileSelection in interface FileGroupScan
- Overrides: modifyFileSelection in class AbstractFileGroupScan
-
init
- Throws: IOException
-
getFilterString
-
supportsFileImplicitColumns
protected abstract boolean supportsFileImplicitColumns()
-
getPartitionValues
-
isImplicitOrPartCol
-
getFilesMetadata
-
getTableMetadata
- Specified by: getTableMetadata in interface GroupScan
- Overrides: getTableMetadata in class AbstractGroupScan
-
getPartitionsMetadata
-
getSegmentsMetadata
-
usedMetastore
public boolean usedMetastore()
Description copied from interface: GroupScan
Returns true if the current group scan uses metadata obtained from the Metastore.
- Specified by: usedMetastore in interface GroupScan
- Overrides: usedMetastore in class AbstractGroupScan
- Returns: true if the current group scan uses metadata obtained from the Metastore, false otherwise
-
getNonInterestingColumnsMetadata
-
tableMetadataProviderBuilder
protected abstract TableMetadataProviderBuilder tableMetadataProviderBuilder(MetadataProviderManager source)
Returns a TableMetadataProviderBuilder instance based on the specified MetadataProviderManager source.
- Parameters:
source - metadata provider manager
- Returns: TableMetadataProviderBuilder instance
-
defaultTableMetadataProviderBuilder
protected abstract TableMetadataProviderBuilder defaultTableMetadataProviderBuilder(MetadataProviderManager source)
Returns a TableMetadataProviderBuilder instance which may provide metadata without using the Drill Metastore.
- Parameters:
source - metadata provider manager
- Returns: TableMetadataProviderBuilder instance
-
checkMetadataConsistency
protected void checkMetadataConsistency(FileSelection selection, org.apache.hadoop.conf.Configuration fsConf) throws IOException
Compares the last modified time of files obtained from the specified selection with the Metastore last modified time to determine whether the Metastore metadata is up-to-date. If the metadata is outdated, MetadataException will be thrown.
- Parameters:
selection - the source of files to check
- Throws:
MetadataException - if metadata is outdated
IOException
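The freshness check described above can be sketched as a comparison of timestamps. Everything in this example is hypothetical: file modification times are passed in as a map instead of being read from a file system, OutdatedMetadataException stands in for MetadataException, and the rule "any file modified after the Metastore timestamp means outdated" is an assumption about the comparison.

```java
import java.util.Map;

class MetadataFreshnessSketch {
  // Hypothetical stand-in for the MetadataException named above.
  static class OutdatedMetadataException extends RuntimeException {
    OutdatedMetadataException(String message) {
      super(message);
    }
  }

  // Assumption: metadata is outdated when any file was modified after the
  // last modified time recorded in the Metastore.
  static void checkConsistency(Map<String, Long> fileModificationTimes, long metastoreLastModified) {
    for (Map.Entry<String, Long> entry : fileModificationTimes.entrySet()) {
      if (entry.getValue() > metastoreLastModified) {
        throw new OutdatedMetadataException("Metadata is outdated for file: " + entry.getKey());
      }
    }
  }
}
```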
-