Class AbstractParquetGroupScan
java.lang.Object
org.apache.drill.exec.physical.base.AbstractBase
org.apache.drill.exec.physical.base.AbstractGroupScan
org.apache.drill.exec.physical.base.AbstractFileGroupScan
org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata<ParquetMetadataProvider>
org.apache.drill.exec.store.parquet.AbstractParquetGroupScan
- All Implemented Interfaces:
Iterable<PhysicalOperator>, GraphValue<PhysicalOperator>, FileGroupScan, FragmentLeaf, GroupScan, HasAffinity, Leaf, PhysicalOperator, Scan
- Direct Known Subclasses:
DeltaGroupScan, HiveDrillNativeParquetScan, ParquetGroupScan
public abstract class AbstractParquetGroupScan
extends AbstractGroupScanWithMetadata<ParquetMetadataProvider>
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected static class | AbstractParquetGroupScan.RowGroupScanFilterer<B extends AbstractParquetGroupScan.RowGroupScanFilterer<B>> | This class is responsible for filtering different metadata levels including row group level. |
Nested classes/interfaces inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B extends AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B>>
-
Field Summary
Fields
| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected List<ReadEntryWithPath> | entries | |
| protected com.google.common.collect.ListMultimap<Integer, RowGroupInfo> | mappings | |
| protected ParquetReaderConfig | readerConfig | |
| protected com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata> | rowGroups | |
Fields inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
columns, files, fileSet, filter, limit, matchAllMetadata, metadataProvider, nonInterestingColumnsMetadata, partitionColumns, partitions, segments, tableMetadata, usedMetastore
Fields inherited from class org.apache.drill.exec.physical.base.AbstractBase
INIT_ALLOCATION, initialAllocation, MAX_ALLOCATION, maxAllocation, userName
Fields inherited from interface org.apache.drill.exec.physical.base.GroupScan
ALL_COLUMNS
-
Constructor Summary
Constructors
| Modifier | Constructor | Description |
| --- | --- | --- |
| protected | AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter) | |
| protected | AbstractParquetGroupScan | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | applyAssignments(List<CoordinationProtos.DrillbitEndpoint> incomingEndpoints) | |
| AbstractGroupScanWithMetadata<?> | applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) | Applies the specified filter filterExpr to the current group scan and produces filtering at the table, segment, partition, and file levels. At each level, if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null; if segment or partition metadata was pruned, prunes the underlying metadata. |
| | applyLimit(int maxRecords) | By default, return null to indicate row count based prune is not supported. |
| boolean | canPushdownProjects(List<SchemaPath> columns) | GroupScan should check the list of columns, and see if it could support all the columns in the list. |
| protected abstract AbstractParquetGroupScan | cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths) | |
| protected abstract ParquetMetadataProviderBuilder<?> | defaultTableMetadataProviderBuilder(MetadataProviderManager source) | Returns TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore. |
| protected abstract Collection<CoordinationProtos.DrillbitEndpoint> | getDrillbits | |
| | getEntries | |
| Collection<org.apache.hadoop.fs.Path> | getFiles() | This method is excluded from serialization in this group scan since the actual files list to scan in this class is handled by the entries field. |
| protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>> | getFilterer() | Returns holder for metadata values which provides API to filter metadata and build new group scan instance using filtered metadata. |
| int | getMaxParallelizationWidth() | |
| | getOperatorAffinity | Calculates the affinity each endpoint has for this scan, by adding up the affinity each endpoint has for each rowGroup. |
| protected List<RowGroupReadEntry> | getReadEntries(int minorFragmentId) | |
| | getReaderConfig | |
| | getReaderConfigForSerialization | |
| protected com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata> | getRowGroupsMetadata() | |
| void | modifyFileSelection(FileSelection selection) | |
| protected static <T extends BaseMetadata & LocationProvider> com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, T> | pruneForPartitions(com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata) | Removes metadata which does not belong to any of the partitions in the metadata list. |
| protected com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, RowGroupMetadata> | pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path, FileMetadata> filteredFileMetadata) | |
| boolean | supportsFilterPushDown() | Checks whether this group scan supports filter push down. |
Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
checkMetadataConsistency, getColumns, getColumnValueCount, getDigest, getFileSet, getFilesMetadata, getFilter, getFilterPredicate, getFilterPredicate, getFilterString, getLimit, getMetadataProvider, getNextOrEmpty, getNonInterestingColumnsMetadata, getPartitionColumns, getPartitionsMetadata, getPartitionValue, getPartitionValues, getScanStats, getSchema, getSegmentsMetadata, getTableMetadata, getTypeForColumn, hasFiles, init, isAllDataPruned, isGroupScanFullyMatchesFilter, isImplicitOrPartCol, isMatchAllMetadata, limitMetadata, pruneForPartitions, setFilter, setFilterForRuntime, supportsFileImplicitColumns, supportsLimitPushdown, tableMetadataProviderBuilder, usedMetastore
Methods inherited from class org.apache.drill.exec.physical.base.AbstractFileGroupScan
clone, supportsPartitionFilterPushdown
Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScan
accept, clone, enforceWidth, getAnalyzeInfoProvider, getDistributionAffinity, getInitialAllocation, getMaxAllocation, getMinParallelizationWidth, getOperatorType, getScanStats, getScanStats, getSelectionRoot, isDistributed, isExecutable, iterator
Methods inherited from class org.apache.drill.exec.physical.base.AbstractBase
accept, getCost, getOperatorId, getSVMode, getUserName, isBufferedOperator, setCost, setMaxAllocation, setOperatorId
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept
Methods inherited from interface org.apache.drill.exec.physical.base.GroupScan
clone, enforceWidth, getAnalyzeInfoProvider, getMinParallelizationWidth, getScanStats, getScanStats, getSelectionRoot, getSpecificScan, isDistributed
Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
-
Field Details
-
entries
-
rowGroups
-
mappings
-
readerConfig
-
-
Constructor Details
-
AbstractParquetGroupScan
protected AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter)
-
AbstractParquetGroupScan
-
-
Method Details
-
getEntries
-
getReaderConfigForSerialization
-
getReaderConfig
-
getFiles
This method is excluded from serialization in this group scan since the actual list of files to scan in this class is handled by the entries field.
- Specified by:
getFiles in interface GroupScan
- Overrides:
getFiles in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Returns:
collection of file paths
-
canPushdownProjects
Description copied from interface: GroupScan
GroupScan should check the list of columns, and see if it could support all the columns in the list.
- Specified by:
canPushdownProjects in interface GroupScan
- Overrides:
canPushdownProjects in class AbstractGroupScan
-
supportsFilterPushDown
public boolean supportsFilterPushDown()
Description copied from interface: GroupScan
Checks whether this group scan supports filter push down.
- Specified by:
supportsFilterPushDown in interface GroupScan
- Overrides:
supportsFilterPushDown in class AbstractGroupScan
- Returns:
true if this group scan supports filter push down, false otherwise
-
getOperatorAffinity
Calculates the affinity each endpoint has for this scan, by adding up the affinity each endpoint has for each rowGroup.
- Specified by:
getOperatorAffinity in interface HasAffinity
- Overrides:
getOperatorAffinity in class AbstractGroupScan
- Returns:
a list of EndpointAffinity objects
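The "adding up" described for getOperatorAffinity can be illustrated with a minimal, self-contained sketch. It does not use Drill classes: RowGroupSketch, AffinitySketch, and the idea of weighting each row group by its byte length are assumptions made for illustration only, not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a row group: its size and the share of its bytes held by each host. */
final class RowGroupSketch {
    final long lengthBytes;
    final Map<String, Double> hostShare; // host name -> fraction of this row group stored on that host

    RowGroupSketch(long lengthBytes, Map<String, Double> hostShare) {
        this.lengthBytes = lengthBytes;
        this.hostShare = hostShare;
    }
}

final class AffinitySketch {
    /** Adds up, per host, the size-weighted share of every row group (weighting is an assumption). */
    static Map<String, Double> operatorAffinity(List<RowGroupSketch> rowGroups) {
        long totalBytes = 0;
        for (RowGroupSketch rg : rowGroups) {
            totalBytes += rg.lengthBytes;
        }
        Map<String, Double> affinity = new LinkedHashMap<>();
        if (totalBytes == 0) {
            return affinity; // nothing to scan, so no host has any affinity
        }
        for (RowGroupSketch rg : rowGroups) {
            double weight = (double) rg.lengthBytes / totalBytes;
            for (Map.Entry<String, Double> e : rg.hostShare.entrySet()) {
                // Accumulate this row group's contribution into the host's running total.
                affinity.merge(e.getKey(), weight * e.getValue(), Double::sum);
            }
        }
        return affinity;
    }
}
```

In this sketch, a 100-byte row group stored entirely on hostA plus a 300-byte row group split evenly between hostA and hostB gives hostA an affinity of 0.625 and hostB 0.375.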
-
applyAssignments
-
getMaxParallelizationWidth
public int getMaxParallelizationWidth()
-
getReadEntries
-
applyFilter
public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
Applies the specified filter filterExpr to the current group scan and produces filtering at:
- table level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
- segment level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if segment metadata was pruned, prunes underlying metadata
- partition level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if partition metadata was pruned, prunes underlying metadata
- file level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  - if file metadata was pruned, prunes underlying metadata
- row group level:
  - if the filter matches all the data or prunes all the data, sets the corresponding value to AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
- Specified by:
applyFilter in interface GroupScan
- Overrides:
applyFilter in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
filterExpr - filter expression to build
udfUtilities - udf utilities
functionImplementationRegistry - context to find drill function holder
optionManager - option manager
- Returns:
group scan with applied filter expression
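The level-by-level behaviour listed for applyFilter can be sketched in a simplified form. The example below is a hypothetical model, not Drill code: it uses only two levels (file and row group), a single predicate of the form "column > threshold", and invented types (MatchSketch, UnitStats, FilterOutcome, FilterSketch). It only illustrates how pruning a file also drops its row groups, and how the "matches all" and "prunes all" cases end with a flag and a null result.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical three-way answer that a min/max check can give for one metadata unit. */
enum MatchSketch { ALL, NONE, SOME }

/** Hypothetical min/max statistics of one column for a file or a row group. */
final class UnitStats {
    final String file; // the file this unit is, or belongs to
    final long min;
    final long max;

    UnitStats(String file, long min, long max) {
        this.file = file;
        this.min = min;
        this.max = max;
    }
}

/** Result of the sketch: the match-all flag plus surviving row groups (null when no new scan is built). */
final class FilterOutcome {
    final boolean matchAll;
    final List<UnitStats> keptRowGroups;

    FilterOutcome(boolean matchAll, List<UnitStats> keptRowGroups) {
        this.matchAll = matchAll;
        this.keptRowGroups = keptRowGroups;
    }
}

final class FilterSketch {
    /** Classifies a unit against the predicate "column > threshold" using only min/max. */
    static MatchSketch classify(UnitStats s, long threshold) {
        if (s.min > threshold) return MatchSketch.ALL;
        if (s.max <= threshold) return MatchSketch.NONE;
        return MatchSketch.SOME;
    }

    static FilterOutcome applyFilter(List<UnitStats> files, List<UnitStats> rowGroups, long threshold) {
        // File level: remember the files that may still contain matching rows.
        Set<String> keptFiles = new HashSet<>();
        for (UnitStats f : files) {
            if (classify(f, threshold) != MatchSketch.NONE) keptFiles.add(f.file);
        }
        // Row group level: row groups of pruned files are dropped without being evaluated.
        List<UnitStats> kept = new ArrayList<>();
        boolean matchAll = !rowGroups.isEmpty();
        for (UnitStats rg : rowGroups) {
            MatchSketch m = keptFiles.contains(rg.file) ? classify(rg, threshold) : MatchSketch.NONE;
            if (m != MatchSketch.ALL) matchAll = false;
            if (m != MatchSketch.NONE) kept.add(rg);
        }
        // Matches everything or prunes everything: record the flag and return no new row group list.
        if (matchAll || kept.isEmpty()) return new FilterOutcome(matchAll, null);
        return new FilterOutcome(false, kept);
    }
}
```

Checking coarse metadata first is what makes the cascade cheap in this model: a file rejected by its own statistics never has its row groups inspected.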
-
pruneRowGroupsForFiles
protected com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path, FileMetadata> filteredFileMetadata)
-
applyLimit
Description copied from class: AbstractGroupScan
By default, returns null to indicate that row-count-based pruning is not supported. Each group scan subclass should override this method if it supports row-count-based pruning.
- Specified by:
applyLimit in interface GroupScan
- Overrides:
applyLimit in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
maxRecords - the number of rows requested from group scan.
- Returns:
a new instance of group scan if the prune is successful; null if row-based prune is not supported or the prune is not successful.
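The contract above (a new scan when pruning succeeds, null otherwise) can be sketched as follows. This is a hypothetical model rather than Drill's implementation: LimitSketch is an invented name, row groups are reduced to their row counts, and both the "keep the shortest prefix" strategy and treating a limit below 1 as 1 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class LimitSketch {
    /**
     * Returns the shortest prefix of row-group row counts that covers maxRecords,
     * or null when no row group can be dropped (the prune is not successful).
     */
    static List<Long> applyLimit(List<Long> rowCounts, int maxRecords) {
        long target = Math.max(maxRecords, 1); // assumption: always keep at least one row group
        List<Long> kept = new ArrayList<>();
        long total = 0;
        for (long count : rowCounts) {
            if (total >= target) {
                break; // enough rows are already covered; remaining row groups are pruned
            }
            kept.add(count);
            total += count;
        }
        // If every row group is still needed, nothing was pruned: signal that with null.
        return kept.size() == rowCounts.size() ? null : kept;
    }
}
```

With row counts [100, 200, 300] and maxRecords = 150, the sketch keeps the first two row groups; with maxRecords = 1000 it returns null because nothing can be dropped.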
-
modifyFileSelection
- Specified by:
modifyFileSelection in interface FileGroupScan
- Overrides:
modifyFileSelection in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
-
getRowGroupsMetadata
protected com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata()
-
pruneForPartitions
protected static <T extends BaseMetadata & LocationProvider> com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> pruneForPartitions(com.google.common.collect.Multimap<org.apache.hadoop.fs.Path, T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
Removes metadata which does not belong to any of the partitions in the metadata list.
- Type Parameters:
T - type of metadata to filter
- Parameters:
metadataToPrune - list of metadata which should be pruned
filteredPartitionMetadata - list of partition metadata which was pruned
- Returns:
list with metadata which belongs to pruned partitions
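The pruning described for pruneForPartitions can be sketched without Drill or Guava types. In the example below, a plain Map<String, List<T>> stands in for the Multimap keyed by path, and PartitionSketch (with a set of file locations) stands in for PartitionMetadata; both substitutions, and the rule that an entry survives when its path appears among a retained partition's locations, are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for partition metadata: the file paths that belong to the partition. */
final class PartitionSketch {
    final Set<String> locations;

    PartitionSketch(Set<String> locations) {
        this.locations = locations;
    }
}

final class PruneSketch {
    /** Keeps only the entries whose path belongs to at least one of the retained partitions. */
    static <T> Map<String, List<T>> pruneForPartitions(
            Map<String, List<T>> metadataToPrune, List<PartitionSketch> filteredPartitions) {
        Map<String, List<T>> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, List<T>> entry : metadataToPrune.entrySet()) {
            for (PartitionSketch partition : filteredPartitions) {
                if (partition.locations.contains(entry.getKey())) {
                    pruned.put(entry.getKey(), entry.getValue());
                    break; // one matching partition is enough to keep the entry
                }
            }
        }
        return pruned;
    }
}
```

The input map is left untouched and a new map is returned, so the caller can keep the unpruned metadata if it still needs it.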
-
getDrillbits
-
cloneWithFileSelection
protected abstract AbstractParquetGroupScan cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths) throws IOException
- Throws:
IOException
-
defaultTableMetadataProviderBuilder
protected abstract ParquetMetadataProviderBuilder<?> defaultTableMetadataProviderBuilder(MetadataProviderManager source)
Description copied from class: AbstractGroupScanWithMetadata
Returns TableMetadataProviderBuilder instance which may provide metadata without using Drill Metastore.
- Specified by:
defaultTableMetadataProviderBuilder in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- Parameters:
source - metadata provider manager
- Returns:
TableMetadataProviderBuilder instance
-
getFilterer
protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>> getFilterer()
Description copied from class: AbstractGroupScanWithMetadata
Returns holder for metadata values which provides API to filter metadata and build new group scan instance using filtered metadata.
- Specified by:
getFilterer in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
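The filterer types on this page are declared with a self-referencing bound (B extends ...Filterer<B>). A minimal sketch of why such a bound is useful for a "holder that builds a new scan" follows. FiltererSketch, RowGroupFiltererSketch, their fields, and the build() method returning a String are all hypothetical; only the shape of the generic bound is taken from the declarations above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base holder; B is the concrete subclass, so setters can return it for chaining. */
abstract class FiltererSketch<B extends FiltererSketch<B>> {
    protected List<String> files = new ArrayList<>();
    protected boolean matchAll;

    /** Returns this instance typed as the concrete subclass. */
    protected abstract B self();

    B files(List<String> files) {
        this.files = new ArrayList<>(files);
        return self();
    }

    B matchAll(boolean matchAll) {
        this.matchAll = matchAll;
        return self();
    }
}

/** Hypothetical subclass that adds a row group level on top of the inherited levels. */
final class RowGroupFiltererSketch extends FiltererSketch<RowGroupFiltererSketch> {
    private List<Integer> rowGroups = new ArrayList<>();

    @Override
    protected RowGroupFiltererSketch self() {
        return this;
    }

    RowGroupFiltererSketch rowGroups(List<Integer> rowGroups) {
        this.rowGroups = new ArrayList<>(rowGroups);
        return this;
    }

    /** Stands in for building a new group scan from the filtered values. */
    String build() {
        return "scan[files=" + files + ", rowGroups=" + rowGroups + ", matchAll=" + matchAll + "]";
    }
}
```

Because the inherited setters return B rather than the base type, a chain such as new RowGroupFiltererSketch().files(...).matchAll(true).rowGroups(...).build() compiles without casts even though rowGroups exists only on the subclass.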
-