Class AbstractParquetGroupScan
java.lang.Object
org.apache.drill.exec.physical.base.AbstractBase
org.apache.drill.exec.physical.base.AbstractGroupScan
org.apache.drill.exec.physical.base.AbstractFileGroupScan
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata<ParquetMetadataProvider>
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan
- All Implemented Interfaces:
Iterable<PhysicalOperator>, GraphValue<PhysicalOperator>, FileGroupScan, FragmentLeaf, GroupScan, HasAffinity, Leaf, PhysicalOperator, Scan
- Direct Known Subclasses:
DeltaGroupScan, HiveDrillNativeParquetScan, ParquetGroupScan
public abstract class AbstractParquetGroupScan
extends AbstractGroupScanWithMetadata<ParquetMetadataProvider>
-
Nested Class Summary
protected static class AbstractParquetGroupScan.RowGroupScanFilterer<B extends AbstractParquetGroupScan.RowGroupScanFilterer<B>>
  This class is responsible for filtering different metadata levels, including the row group level.
Nested classes/interfaces inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B extends AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B>>
-
Field Summary
protected List<ReadEntryWithPath> entries
protected org.apache.drill.shaded.guava.com.google.common.collect.ListMultimap<Integer,RowGroupInfo> mappings
protected ParquetReaderConfig readerConfig
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> rowGroups
Fields inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
columns, files, fileSet, filter, limit, matchAllMetadata, metadataProvider, nonInterestingColumnsMetadata, partitionColumns, partitions, segments, tableMetadata, usedMetastore
Fields inherited from class org.apache.drill.exec.physical.base.AbstractBase
INIT_ALLOCATION, initialAllocation, MAX_ALLOCATION, maxAllocation, userName
Fields inherited from interface org.apache.drill.exec.physical.base.GroupScan
ALL_COLUMNS
-
Constructor Summary
protected AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter)
protected AbstractParquetGroupScan
-
Method Summary
void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> incomingEndpoints)
AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
  Applies the specified filter filterExpr to the current group scan and produces filtering at the table, segment, partition, and file levels. At each level, if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null; if segment or partition metadata was pruned, prunes the underlying metadata.
applyLimit(int maxRecords)
  By default, returns null to indicate that row-count-based pruning is not supported.
boolean canPushdownProjects(List<SchemaPath> columns)
  GroupScan should check the list of columns and see whether it can support all the columns in the list.
protected abstract AbstractParquetGroupScan cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths)
protected abstract ParquetMetadataProviderBuilder<?> defaultTableMetadataProviderBuilder(MetadataProviderManager source)
  Returns a TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore.
protected abstract Collection<CoordinationProtos.DrillbitEndpoint> getDrillbits
Collection<org.apache.hadoop.fs.Path> getFiles()
  This method is excluded from serialization in this group scan, since the actual list of files to scan in this class is handled by the entries field.
protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>> getFilterer()
  Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata.
int getMaxParallelizationWidth()
getOperatorAffinity
  Calculates the affinity each endpoint has for this scan by adding up the affinity each endpoint has for each row group.
protected List<RowGroupReadEntry> getReadEntries(int minorFragmentId)
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata()
void modifyFileSelection(FileSelection selection)
protected static <T extends BaseMetadata & LocationProvider> org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> pruneForPartitions(org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
  Removes metadata which does not belong to any of the partitions in the metadata list.
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path, FileMetadata> filteredFileMetadata)
boolean supportsFilterPushDown()
  Checks whether this group scan supports filter push down.
Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
checkMetadataConsistency, getColumns, getColumnValueCount, getDigest, getFileSet, getFilesMetadata, getFilter, getFilterPredicate, getFilterPredicate, getFilterString, getLimit, getMetadataProvider, getNextOrEmpty, getNonInterestingColumnsMetadata, getPartitionColumns, getPartitionsMetadata, getPartitionValue, getPartitionValues, getScanStats, getSchema, getSegmentsMetadata, getTableMetadata, getTypeForColumn, hasFiles, init, isAllDataPruned, isGroupScanFullyMatchesFilter, isImplicitOrPartCol, isMatchAllMetadata, limitMetadata, pruneForPartitions, setFilter, setFilterForRuntime, supportsFileImplicitColumns, supportsLimitPushdown, tableMetadataProviderBuilder, usedMetastore
Methods inherited from class org.apache.drill.exec.physical.base.AbstractFileGroupScan
clone, supportsPartitionFilterPushdown
Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScan
accept, clone, enforceWidth, getAnalyzeInfoProvider, getDistributionAffinity, getInitialAllocation, getMaxAllocation, getMinParallelizationWidth, getOperatorType, getScanStats, getScanStats, getSelectionRoot, isDistributed, isExecutable, iterator
Methods inherited from class org.apache.drill.exec.physical.base.AbstractBase
accept, getCost, getOperatorId, getSVMode, getUserName, isBufferedOperator, setCost, setMaxAllocation, setOperatorId
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept
Methods inherited from interface org.apache.drill.exec.physical.base.GroupScan
clone, enforceWidth, getAnalyzeInfoProvider, getMinParallelizationWidth, getScanStats, getScanStats, getSelectionRoot, getSpecificScan, isDistributed
Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
-
Field Details
-
entries
-
rowGroups
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> rowGroups
-
mappings
protected org.apache.drill.shaded.guava.com.google.common.collect.ListMultimap<Integer,RowGroupInfo> mappings
-
readerConfig
-
-
Constructor Details
-
AbstractParquetGroupScan
protected AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter)
-
AbstractParquetGroupScan
-
-
Method Details
-
getEntries
-
getReaderConfigForSerialization
-
getReaderConfig
-
getFiles
This method is excluded from serialization in this group scan, since the actual list of files to scan in this class is handled by the entries field.
- Specified by:
getFiles in interface GroupScan
- Overrides:
getFiles in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Returns:
collection of file paths
-
canPushdownProjects
Description copied from interface: GroupScan
GroupScan should check the list of columns and see whether it can support all the columns in the list.
- Specified by:
canPushdownProjects in interface GroupScan
- Overrides:
canPushdownProjects in class AbstractGroupScan
-
supportsFilterPushDown
public boolean supportsFilterPushDown()
Description copied from interface: GroupScan
Checks whether this group scan supports filter push down.
- Specified by:
supportsFilterPushDown in interface GroupScan
- Overrides:
supportsFilterPushDown in class AbstractGroupScan
- Returns:
true if this group scan supports filter push down, false otherwise
-
getOperatorAffinity
Calculates the affinity each endpoint has for this scan by adding up the affinity each endpoint has for each row group.
- Specified by:
getOperatorAffinity in interface HasAffinity
- Overrides:
getOperatorAffinity in class AbstractGroupScan
- Returns:
a list of EndpointAffinity objects
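The summation described above can be illustrated with a minimal sketch. It is not Drill's implementation: endpoints are modelled as plain strings, each row group as a map from endpoint name to an affinity weight, and the names AffinitySketch and totalAffinity are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: sums per-row-group affinities into one total per endpoint. */
final class AffinitySketch {

  /**
   * Each element of rowGroupAffinities maps an endpoint name to the affinity
   * that endpoint has for one row group (an assumed, simplified model).
   */
  static Map<String, Double> totalAffinity(List<Map<String, Double>> rowGroupAffinities) {
    Map<String, Double> totals = new LinkedHashMap<>();
    for (Map<String, Double> perRowGroup : rowGroupAffinities) {
      for (Map.Entry<String, Double> entry : perRowGroup.entrySet()) {
        // Add this row group's contribution to the endpoint's running total.
        totals.merge(entry.getKey(), entry.getValue(), Double::sum);
      }
    }
    return totals;
  }

  public static void main(String[] args) {
    List<Map<String, Double>> rowGroups = List.of(
        Map.of("node-a", 0.5, "node-b", 0.5),
        Map.of("node-a", 1.0));
    System.out.println(totalAffinity(rowGroups));
  }
}
```

In this model an endpoint that holds data for many row groups accumulates a larger total, which is the property a scheduler would use when placing scan fragments.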
-
applyAssignments
-
getMaxParallelizationWidth
public int getMaxParallelizationWidth()
-
getReadEntries
-
applyFilter
public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
Applies the specified filter filterExpr to the current group scan and produces filtering at:
- table level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
- segment level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if segment metadata was pruned, prunes underlying metadata
- partition level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if partition metadata was pruned, prunes underlying metadata
- file level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if file metadata was pruned, prunes underlying metadata
- row group level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
- Specified by:
applyFilter in interface GroupScan
- Overrides:
applyFilter in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
filterExpr - filter expression to build
udfUtilities - udf utilities
functionImplementationRegistry - context to find drill function holder
optionManager - option manager
- Returns:
group scan with applied filter expression
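The level-by-level behaviour described for applyFilter can be pictured with a small, self-contained sketch. It is an illustration under simplifying assumptions, not Drill's code: the filter is fixed to "value > threshold", each metadata unit carries only min/max statistics for one column, and every name (LevelFilterSketch, Unit, LevelResult, filterLevel, pruneUnderlying) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of level-by-level metadata filtering; not Drill's implementation. */
final class LevelFilterSketch {

  enum Match { ALL, SOME, NONE }

  /** Hypothetical metadata unit: the file it belongs to plus min/max of one column. */
  static final class Unit {
    final String file;
    final int min;
    final int max;

    Unit(String file, int min, int max) {
      this.file = file;
      this.min = min;
      this.max = max;
    }
  }

  /** Outcome of filtering one metadata level. */
  static final class LevelResult {
    final List<Unit> kept;
    final boolean matchAll; // true when every surviving unit fully matches the filter

    LevelResult(List<Unit> kept, boolean matchAll) {
      this.kept = kept;
      this.matchAll = matchAll;
    }
  }

  /** Classifies a unit against the predicate "value > threshold" from statistics alone. */
  static Match classify(Unit unit, int threshold) {
    if (unit.min > threshold) {
      return Match.ALL;
    }
    if (unit.max <= threshold) {
      return Match.NONE;
    }
    return Match.SOME;
  }

  /** Drops units the filter excludes and records whether the survivors match entirely. */
  static LevelResult filterLevel(List<Unit> units, int threshold) {
    List<Unit> kept = new ArrayList<>();
    boolean matchAll = true;
    for (Unit unit : units) {
      Match match = classify(unit, threshold);
      if (match == Match.NONE) {
        continue; // pruned at this level
      }
      matchAll &= match == Match.ALL;
      kept.add(unit);
    }
    return new LevelResult(kept, matchAll);
  }

  /** Keeps only lower-level units whose file survived at the level above. */
  static List<Unit> pruneUnderlying(List<Unit> children, List<Unit> keptParents) {
    Set<String> survivingFiles = new HashSet<>();
    for (Unit parent : keptParents) {
      survivingFiles.add(parent.file);
    }
    List<Unit> result = new ArrayList<>();
    for (Unit child : children) {
      if (survivingFiles.contains(child.file)) {
        result.add(child);
      }
    }
    return result;
  }
}
```

A caller would run filterLevel on file units, pass the survivors to pruneUnderlying to discard row groups of pruned files, and then run filterLevel again on the remaining row groups. An empty kept list corresponds to the "prunes all the data" case and a true matchAll flag to the "matches all the data" case.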
-
pruneRowGroupsForFiles
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path, FileMetadata> filteredFileMetadata)
-
applyLimit
Description copied from class: AbstractGroupScan
By default, returns null to indicate that row-count-based pruning is not supported. Each group scan subclass should override this method if it supports row-count-based pruning.
- Specified by:
applyLimit in interface GroupScan
- Overrides:
applyLimit in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
maxRecords - the number of rows requested from the group scan
- Returns:
a new instance of the group scan if the prune is successful; null if row-count-based pruning is not supported or if the prune is not successful
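One way to picture the contract above (a new result when pruning succeeds, null otherwise) is the following sketch. It assumes that pruning means keeping the shortest prefix of row groups whose row counts cover the requested limit, and that at least one row is always requested. Both are assumptions for illustration, and LimitPruneSketch and pruneByRowCount are hypothetical names, not part of Drill.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of row-count-based pruning; not Drill's implementation. */
final class LimitPruneSketch {

  /**
   * Returns the shortest prefix of row counts whose sum covers maxRecords,
   * or null when every row group is still needed (nothing was pruned).
   */
  static List<Long> pruneByRowCount(List<Long> rowCounts, long maxRecords) {
    long requested = Math.max(maxRecords, 1); // assumed: always request at least one row
    List<Long> kept = new ArrayList<>();
    long total = 0;
    for (long count : rowCounts) {
      if (total >= requested) {
        break; // enough rows are already covered
      }
      kept.add(count);
      total += count;
    }
    return kept.size() == rowCounts.size() ? null : kept;
  }

  public static void main(String[] args) {
    System.out.println(pruneByRowCount(List.of(100L, 200L, 300L), 150));
  }
}
```

Returning null rather than an unchanged copy mirrors the documented convention that null signals an unsuccessful or unsupported prune.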
-
modifyFileSelection
- Specified by:
modifyFileSelection in interface FileGroupScan
- Overrides:
modifyFileSelection in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
-
getRowGroupsMetadata
protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata()
-
pruneForPartitions
protected static <T extends BaseMetadata & LocationProvider> org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> pruneForPartitions(org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
Removes metadata which does not belong to any of the partitions in the metadata list.
- Type Parameters:
T - type of metadata to filter
- Parameters:
metadataToPrune - list of metadata which should be pruned
filteredPartitionMetadata - list of partition metadata which was pruned
- Returns:
list with metadata which belongs to pruned partitions
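The pruning rule can be sketched with standard collections. This is a simplified illustration, not Drill's code: a Map of lists stands in for the Guava Multimap, paths are plain strings, each partition is reduced to the set of locations it covers, and the class name PartitionPruneSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of partition-based metadata pruning; not Drill's implementation. */
final class PartitionPruneSketch {

  /**
   * Keeps only the metadata whose path appears among the locations of at least
   * one surviving partition. Each partition is modelled as a set of paths.
   */
  static <T> Map<String, List<T>> pruneForPartitions(
      Map<String, List<T>> metadataToPrune, List<Set<String>> partitionLocations) {
    Map<String, List<T>> kept = new LinkedHashMap<>();
    for (Map.Entry<String, List<T>> entry : metadataToPrune.entrySet()) {
      for (Set<String> locations : partitionLocations) {
        if (locations.contains(entry.getKey())) {
          kept.put(entry.getKey(), new ArrayList<>(entry.getValue()));
          break; // one matching partition is enough
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, List<String>> metadata = new LinkedHashMap<>();
    metadata.put("/t/p1/a.parquet", List.of("rg0", "rg1"));
    metadata.put("/t/p2/b.parquet", List.of("rg0"));
    System.out.println(pruneForPartitions(metadata, List.of(Set.of("/t/p1/a.parquet"))));
  }
}
```

The input map is left untouched and a new map is returned, so callers can keep the unfiltered metadata for other uses.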
-
getDrillbits
-
cloneWithFileSelection
protected abstract AbstractParquetGroupScan cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths) throws IOException
- Throws:
IOException
-
defaultTableMetadataProviderBuilder
protected abstract ParquetMetadataProviderBuilder<?> defaultTableMetadataProviderBuilder(MetadataProviderManager source)
Description copied from class: AbstractGroupScanWithMetadata
Returns a TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore.
- Specified by:
defaultTableMetadataProviderBuilder in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
source - metadata provider manager
- Returns:
TableMetadataProviderBuilder instance
-
getFilterer
protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>> getFilterer()
Description copied from class: AbstractGroupScanWithMetadata
Returns a holder for metadata values which provides an API to filter metadata and build a new group scan instance using the filtered metadata.
- Specified by:
getFilterer in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
-