org.apache.drill.exec.store.parquet.AbstractParquetGroupScan

All Implemented Interfaces: Iterable<PhysicalOperator>, GraphValue<PhysicalOperator>, FileGroupScan, FragmentLeaf, GroupScan, HasAffinity, Leaf, PhysicalOperator, Scan

Direct Known Subclasses: DeltaGroupScan, HiveDrillNativeParquetScan, ParquetGroupScan

public abstract class AbstractParquetGroupScan extends AbstractGroupScanWithMetadata<ParquetMetadataProvider>

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

protected static class

AbstractParquetGroupScan.RowGroupScanFilterer<B extends AbstractParquetGroupScan.RowGroupScanFilterer<B>>

This class is responsible for filtering metadata at different levels, including the row group level.

Nested classes/interfaces inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B extends AbstractGroupScanWithMetadata.GroupScanWithMetadataFilterer<B>>

Field Summary

Fields

Modifier and Type

Field

Description

protected List<ReadEntryWithPath>

entries

protected org.apache.drill.shaded.guava.com.google.common.collect.ListMultimap<Integer,RowGroupInfo>

mappings

protected ParquetReaderConfig

readerConfig

protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata>

rowGroups

Fields inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
columns, files, fileSet, filter, limit, matchAllMetadata, metadataProvider, nonInterestingColumnsMetadata, partitionColumns, partitions, segments, tableMetadata, usedMetastore

Fields inherited from class org.apache.drill.exec.physical.base.AbstractBase
INIT_ALLOCATION, initialAllocation, MAX_ALLOCATION, maxAllocation, userName

Fields inherited from interface org.apache.drill.exec.physical.base.GroupScan
ALL_COLUMNS

Constructor Summary

Constructors

Modifier

Constructor

Description

protected

AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter)

protected

AbstractParquetGroupScan(AbstractParquetGroupScan that)

Method Summary

Modifier and Type

Method

Description

void

applyAssignments(List<CoordinationProtos.DrillbitEndpoint> incomingEndpoints)

AbstractGroupScanWithMetadata<?>

applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)

Applies the specified filter filterExpr to the current group scan and filters at the table, segment, partition, file, and row group levels. At each level, if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null; if segment, partition, or file metadata was pruned, also prunes the underlying metadata.

GroupScan

applyLimit(int maxRecords)

By default, returns null to indicate that row-count-based pruning is not supported.

boolean

canPushdownProjects(List<SchemaPath> columns)

Checks the list of columns and determines whether this GroupScan can support all of them.

protected abstract AbstractParquetGroupScan

cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths)

protected abstract ParquetMetadataProviderBuilder<?>

defaultTableMetadataProviderBuilder(MetadataProviderManager source)

Returns a TableMetadataProviderBuilder instance that may provide metadata without using the Drill Metastore.

protected abstract Collection<CoordinationProtos.DrillbitEndpoint>

getDrillbits()

List<ReadEntryWithPath>

getEntries()

Collection<org.apache.hadoop.fs.Path>

getFiles()

This method is excluded from serialization in this group scan because the actual list of files to scan is held in the entries field.

protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>>

getFilterer()

Returns a holder for metadata values that provides an API to filter metadata and to build a new group scan instance from the filtered metadata.

int

getMaxParallelizationWidth()

List<EndpointAffinity>

getOperatorAffinity()

Calculates the affinity each endpoint has for this scan by adding up the affinity that endpoint has for each row group.

protected List<RowGroupReadEntry>

getReadEntries(int minorFragmentId)

ParquetReaderConfig

getReaderConfig()

ParquetReaderConfig

getReaderConfigForSerialization()

protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata>

getRowGroupsMetadata()

void

modifyFileSelection(FileSelection selection)

protected static <T extends BaseMetadata & LocationProvider> org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T>

pruneForPartitions(org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)

Removes metadata that does not belong to any of the partitions in the metadata list.

protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata>

pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path,FileMetadata> filteredFileMetadata)

boolean

supportsFilterPushDown()

Checks whether this group scan supports filter push down.

Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScanWithMetadata
checkMetadataConsistency, getColumns, getColumnValueCount, getDigest, getFileSet, getFilesMetadata, getFilter, getFilterPredicate, getFilterPredicate, getFilterString, getLimit, getMetadataProvider, getNextOrEmpty, getNonInterestingColumnsMetadata, getPartitionColumns, getPartitionsMetadata, getPartitionValue, getPartitionValues, getScanStats, getSchema, getSegmentsMetadata, getTableMetadata, getTypeForColumn, hasFiles, init, isAllDataPruned, isGroupScanFullyMatchesFilter, isImplicitOrPartCol, isMatchAllMetadata, limitMetadata, pruneForPartitions, setFilter, setFilterForRuntime, supportsFileImplicitColumns, supportsLimitPushdown, tableMetadataProviderBuilder, usedMetastore

Methods inherited from class org.apache.drill.exec.physical.base.AbstractFileGroupScan
clone, supportsPartitionFilterPushdown

Methods inherited from class org.apache.drill.exec.physical.base.AbstractGroupScan
accept, clone, enforceWidth, getAnalyzeInfoProvider, getDistributionAffinity, getInitialAllocation, getMaxAllocation, getMinParallelizationWidth, getOperatorType, getScanStats, getScanStats, getSelectionRoot, isDistributed, isExecutable, iterator

Methods inherited from class org.apache.drill.exec.physical.base.AbstractBase
accept, getCost, getOperatorId, getSVMode, getUserName, isBufferedOperator, setCost, setMaxAllocation, setOperatorId

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept

Methods inherited from interface org.apache.drill.exec.physical.base.GroupScan
clone, enforceWidth, getAnalyzeInfoProvider, getMinParallelizationWidth, getScanStats, getScanStats, getSelectionRoot, getSpecificScan, isDistributed

Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity

Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator

Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId

Field Details
- entries
  
  protected List<ReadEntryWithPath> entries
- rowGroups
  
  protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> rowGroups
- mappings
  
  protected org.apache.drill.shaded.guava.com.google.common.collect.ListMultimap<Integer,RowGroupInfo> mappings
- readerConfig
  
  protected ParquetReaderConfig readerConfig

Constructor Details
- AbstractParquetGroupScan
  
  protected AbstractParquetGroupScan(String userName, List<SchemaPath> columns, List<ReadEntryWithPath> entries, ParquetReaderConfig readerConfig, LogicalExpression filter)
- AbstractParquetGroupScan
  
  protected AbstractParquetGroupScan(AbstractParquetGroupScan that)

Method Details
- getEntries
  
  public List<ReadEntryWithPath> getEntries()
- getReaderConfigForSerialization
  
  public ParquetReaderConfig getReaderConfigForSerialization()
- getReaderConfig
  
  public ParquetReaderConfig getReaderConfig()
- getFiles
  
  public Collection<org.apache.hadoop.fs.Path> getFiles()
  
  This method is excluded from serialization in this group scan because the actual list of files to scan is held in the entries field.
  
  Specified by:
  
  getFiles in interface GroupScan
  
  Overrides:
  
  getFiles in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
  
  Returns:
  
  collection of file paths
- canPushdownProjects
  
  public boolean canPushdownProjects(List<SchemaPath> columns)
  
  Description copied from interface: GroupScan
  
  Checks the list of columns and determines whether this GroupScan can support all of them.
  
  Specified by:
  
  canPushdownProjects in interface GroupScan
  
  Overrides:
  
  canPushdownProjects in class AbstractGroupScan
- supportsFilterPushDown
  
  public boolean supportsFilterPushDown()
  
  Description copied from interface: GroupScan
  
  Checks whether this group scan supports filter push down.
  
  Specified by:
  
  supportsFilterPushDown in interface GroupScan
  
  Overrides:
  
  supportsFilterPushDown in class AbstractGroupScan
  
  Returns:
  
  true if this group scan supports filter push down, false otherwise
- getOperatorAffinity
  
  public List<EndpointAffinity> getOperatorAffinity()
  
  Calculates the affinity each endpoint has for this scan by adding up the affinity that endpoint has for each row group.
  
  Specified by:
  
  getOperatorAffinity in interface HasAffinity
  
  Overrides:
  
  getOperatorAffinity in class AbstractGroupScan
  
  Returns:
  
  a list of EndpointAffinity objects
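  
  The summation described above can be sketched in plain Java. This is a minimal sketch under stated assumptions: endpoints are identified by host-name strings, each row group is reduced to a map from host name to that host's affinity for it, and the class and method names (RowGroupAffinitySketch, affinityPerEndpoint) are hypothetical rather than part of Drill's API.
  
  ```java
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  // Hypothetical model: each row group is represented only by a map from
  // host name to that host's affinity for the row group (not Drill's types).
  final class RowGroupAffinitySketch {

    private RowGroupAffinitySketch() {}

    // Sums, for every endpoint, the affinity it has for each row group.
    static Map<String, Double> affinityPerEndpoint(List<Map<String, Double>> rowGroupAffinities) {
      Map<String, Double> totals = new LinkedHashMap<>();
      for (Map<String, Double> rowGroup : rowGroupAffinities) {
        for (Map.Entry<String, Double> entry : rowGroup.entrySet()) {
          totals.merge(entry.getKey(), entry.getValue(), Double::sum);
        }
      }
      return totals;
    }
  }
  ```
  
  For two row groups with affinities {node1=1.0} and {node1=0.5, node2=0.5}, this sketch gives node1 a total of 1.5 and node2 a total of 0.5.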
- applyAssignments
  
  public void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> incomingEndpoints)
- getMaxParallelizationWidth
  
  public int getMaxParallelizationWidth()
- getReadEntries
  
  protected List<RowGroupReadEntry> getReadEntries(int minorFragmentId)
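  
  applyAssignments and getReadEntries carry no description here. Judging only from the field types, the mappings field (ListMultimap<Integer,RowGroupInfo>) appears to map a minor fragment id to the row groups that fragment reads, and getReadEntries(minorFragmentId) appears to look those up. The sketch below illustrates that relationship under those assumptions only. It is hypothetical: FragmentAssignmentSketch is an illustrative name, row groups are plain strings, a fragment count replaces the list of DrillbitEndpoints, and round-robin placement is a simplification that ignores the endpoint affinity computed by getOperatorAffinity().
  
  ```java
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  // Hypothetical stand-in for the mappings field and the two methods that use it.
  // Guava's ListMultimap is replaced by a Map of lists.
  final class FragmentAssignmentSketch {

    // minor fragment id -> row groups that fragment reads
    private final Map<Integer, List<String>> mappings = new HashMap<>();

    // Simplified applyAssignments: round-robin instead of affinity-aware placement.
    void applyAssignments(List<String> rowGroups, int fragmentCount) {
      if (fragmentCount <= 0) {
        throw new IllegalArgumentException("fragmentCount must be positive");
      }
      mappings.clear();
      for (int i = 0; i < rowGroups.size(); i++) {
        mappings.computeIfAbsent(i % fragmentCount, id -> new ArrayList<>()).add(rowGroups.get(i));
      }
    }

    // Simplified getReadEntries: the row groups assigned to one minor fragment.
    List<String> getReadEntries(int minorFragmentId) {
      return Collections.unmodifiableList(
          mappings.getOrDefault(minorFragmentId, Collections.emptyList()));
    }
  }
  ```
  
  With row groups rg0, rg1, rg2 and two fragments, fragment 0 reads rg0 and rg2, fragment 1 reads rg1, and any other fragment id gets an empty list.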
- applyFilter
  
  public AbstractGroupScanWithMetadata<?> applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
  
  Applies the specified filter filterExpr to the current group scan and filters at the following levels:
  
  table level:
  if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  
  segment level:
  if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  
  if segment metadata was pruned, prunes the underlying metadata
  
  partition level:
  if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  
  if partition metadata was pruned, prunes the underlying metadata
  
  file level:
  if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  
  if file metadata was pruned, prunes the underlying metadata
  
  row group level:
  if the filter matches all the data or prunes all the data, sets the corresponding value reported by AbstractGroupScanWithMetadata.isMatchAllMetadata() and returns null
  
  Specified by:
  
  applyFilter in interface GroupScan
  
  Overrides:
  
  applyFilter in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
  
  Parameters:
  
  filterExpr - filter expression to build
  
  udfUtilities - UDF utilities
  
  functionImplementationRegistry - context used to find Drill function holders
  
  optionManager - option manager
  
  Returns:
  
  group scan with the filter expression applied
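  
  Each level above makes the same three-way decision: the filter matches all of the data, prunes all of it, or keeps some of it. The sketch below models a single level, assuming each unit of metadata carries min/max statistics for one long column and the filter is the hypothetical predicate col > threshold. MinMaxPruningSketch and its members are illustrative names, not Drill classes, and Drill's applyFilter takes a general LogicalExpression rather than a single comparison.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical single-level pruning step for the predicate `col > threshold`,
  // using per-unit min/max statistics. Not Drill's implementation.
  final class MinMaxPruningSketch {

    enum Match { ALL, NONE, SOME }

    // A unit of metadata at any level (segment, partition, file, or row group).
    static final class Unit {
      final String name;
      final long min;
      final long max;

      Unit(String name, long min, long max) {
        this.name = name;
        this.min = min;
        this.max = max;
      }
    }

    static Match classify(Unit unit, long threshold) {
      if (unit.min > threshold) {
        return Match.ALL;  // every value in the unit satisfies col > threshold
      }
      if (unit.max <= threshold) {
        return Match.NONE; // no value in the unit can satisfy it
      }
      return Match.SOME;
    }

    // Keeps units that may contain matching rows; an empty result means all data was pruned.
    static List<Unit> prune(List<Unit> units, long threshold) {
      List<Unit> kept = new ArrayList<>();
      for (Unit unit : units) {
        if (classify(unit, threshold) != Match.NONE) {
          kept.add(unit);
        }
      }
      return kept;
    }

    // True when every remaining unit fully matches, i.e. the filter matches all the data.
    static boolean matchesAll(List<Unit> units, long threshold) {
      if (units.isEmpty()) {
        return false;
      }
      for (Unit unit : units) {
        if (classify(unit, threshold) != Match.ALL) {
          return false;
        }
      }
      return true;
    }
  }
  ```
  
  In this model, an empty result from prune corresponds to the filter pruning all the data, matchesAll returning true corresponds to the filter matching all of it, and otherwise the retained units would be handed to the next, finer level.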
- pruneRowGroupsForFiles
  
  protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> pruneRowGroupsForFiles(Map<org.apache.hadoop.fs.Path,FileMetadata> filteredFileMetadata)
- applyLimit
  
  public GroupScan applyLimit(int maxRecords)
  
  Description copied from class: AbstractGroupScan
  
  By default, returns null to indicate that row-count-based pruning is not supported. Each group scan subclass that supports row-count-based pruning should override this method.
  
  Specified by:
  
  applyLimit in interface GroupScan
  
  Overrides:
  
  applyLimit in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
  
  Parameters:
  
  maxRecords - the number of rows requested from the group scan.
  
  Returns:
  
  a new group scan instance if pruning succeeds; null if row-count-based pruning is not supported or if pruning does not succeed.
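  
  A minimal sketch of the idea, assuming the only metadata available is each row group's row count: keep the shortest prefix of row groups that can still satisfy the limit, and return null, as the contract above describes, when no row group can be dropped. LimitPruningSketch is a hypothetical name and this is not Drill's implementation.
  
  ```java
  import java.util.List;

  // Hypothetical sketch of row-count-based pruning for a LIMIT; not Drill's code.
  final class LimitPruningSketch {

    private LimitPruningSketch() {}

    // Returns the shortest prefix of row groups whose row counts add up to at
    // least maxRecords, or null when no row group can be dropped.
    static List<Long> applyLimit(List<Long> rowGroupRowCounts, int maxRecords) {
      long total = 0;
      for (int i = 0; i < rowGroupRowCounts.size(); i++) {
        total += rowGroupRowCounts.get(i);
        if (total >= maxRecords) {
          boolean pruned = i + 1 < rowGroupRowCounts.size();
          return pruned ? List.copyOf(rowGroupRowCounts.subList(0, i + 1)) : null;
        }
      }
      return null; // the limit is not reached, so every row group is still needed
    }
  }
  ```
  
  With three row groups of 100 rows each, a limit of 150 keeps the first two, while a limit of 300 or more returns null.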
- modifyFileSelection
  
  public void modifyFileSelection(FileSelection selection)
  
  Specified by:
  
  modifyFileSelection in interface FileGroupScan
  
  Overrides:
  
  modifyFileSelection in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
- getRowGroupsMetadata
  
  protected org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,RowGroupMetadata> getRowGroupsMetadata()
- pruneForPartitions
  
  protected static <T extends BaseMetadata & LocationProvider> org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> pruneForPartitions(org.apache.drill.shaded.guava.com.google.common.collect.Multimap<org.apache.hadoop.fs.Path,T> metadataToPrune, List<PartitionMetadata> filteredPartitionMetadata)
  
  Removes metadata that does not belong to any of the partitions in the metadata list.
  
  Type Parameters:
  
  T - type of metadata to filter
  
  Parameters:
  
  metadataToPrune - metadata to prune
  
  filteredPartitionMetadata - list of partition metadata that remains after pruning
  
  Returns:
  
  metadata that belongs to the partitions in filteredPartitionMetadata
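  
  The pruning step can be sketched as a set-membership filter. This is a hypothetical model: paths are plain strings instead of org.apache.hadoop.fs.Path, a Map of lists replaces Guava's Multimap, and each partition in the filtered list is assumed to be described by the set of file locations it covers. PartitionPruningSketch is an illustrative name, not Drill code.
  
  ```java
  import java.util.HashSet;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  // Hypothetical sketch: string paths stand in for Path, a Map of lists stands in
  // for Multimap, and each retained partition is reduced to its file locations.
  final class PartitionPruningSketch {

    private PartitionPruningSketch() {}

    static <T> Map<String, List<T>> pruneForPartitions(
        Map<String, List<T>> metadataToPrune, List<Set<String>> retainedPartitionLocations) {
      // Union of all locations that still belong to some partition.
      Set<String> allowed = new HashSet<>();
      for (Set<String> locations : retainedPartitionLocations) {
        allowed.addAll(locations);
      }
      // Keep only metadata whose location falls in that union, preserving input order.
      Map<String, List<T>> result = new LinkedHashMap<>();
      for (Map.Entry<String, List<T>> entry : metadataToPrune.entrySet()) {
        if (allowed.contains(entry.getKey())) {
          result.put(entry.getKey(), entry.getValue());
        }
      }
      return result;
    }
  }
  ```
  
  Using a LinkedHashMap keeps the surviving entries in their original order, so the result is deterministic for a given input.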
- getDrillbits
  
  protected abstract Collection<CoordinationProtos.DrillbitEndpoint> getDrillbits()
- cloneWithFileSelection
  
  protected abstract AbstractParquetGroupScan cloneWithFileSelection(Collection<org.apache.hadoop.fs.Path> filePaths) throws IOException
  
  Throws:
  
  IOException
- defaultTableMetadataProviderBuilder
  
  protected abstract ParquetMetadataProviderBuilder<?> defaultTableMetadataProviderBuilder(MetadataProviderManager source)
  
  Description copied from class: AbstractGroupScanWithMetadata
  
  Returns a TableMetadataProviderBuilder instance that may provide metadata without using the Drill Metastore.
  
  Specified by:
  
  defaultTableMetadataProviderBuilder in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
  
  Parameters:
  
  source - metadata provider manager
  
  Returns:
  
  TableMetadataProviderBuilder instance
- getFilterer
  
  protected abstract AbstractParquetGroupScan.RowGroupScanFilterer<? extends AbstractParquetGroupScan.RowGroupScanFilterer<?>> getFilterer()
  
  Description copied from class: AbstractGroupScanWithMetadata
  
  Returns a holder for metadata values that provides an API to filter metadata and to build a new group scan instance from the filtered metadata.
  
  Specified by:
  
  getFilterer in class AbstractGroupScanWithMetadata<ParquetMetadataProvider>
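  
  The type parameter on RowGroupScanFilterer<B extends RowGroupScanFilterer<B>> follows the self-bounded generic pattern, which lets chained setters declared in a base filterer return the concrete subclass type. The sketch below illustrates the pattern with hypothetical classes and methods (MetadataFiltererSketch, RowGroupFiltererSketch, self(), withFiles, withRowGroups, build); it is not Drill's filterer API.
  
  ```java
  import java.util.ArrayList;
  import java.util.List;

  // Hypothetical classes showing the self-bounded generic pattern used by
  // RowGroupScanFilterer<B extends RowGroupScanFilterer<B>>; not Drill's code.
  abstract class MetadataFiltererSketch<B extends MetadataFiltererSketch<B>> {

    protected final List<String> files = new ArrayList<>();

    // Subclasses return `this`, typed as themselves.
    protected abstract B self();

    // Inherited setter that still returns the concrete subclass type.
    B withFiles(List<String> paths) {
      files.addAll(paths);
      return self();
    }
  }

  final class RowGroupFiltererSketch extends MetadataFiltererSketch<RowGroupFiltererSketch> {

    private final List<String> rowGroups = new ArrayList<>();

    @Override
    protected RowGroupFiltererSketch self() {
      return this;
    }

    RowGroupFiltererSketch withRowGroups(List<String> ids) {
      rowGroups.addAll(ids);
      return this;
    }

    // Stands in for building a new group scan from the filtered metadata.
    String build() {
      return files.size() + " file(s), " + rowGroups.size() + " row group(s)";
    }
  }
  ```
  
  Because withFiles returns RowGroupFiltererSketch rather than the base type, new RowGroupFiltererSketch().withFiles(List.of("a.parquet")).withRowGroups(List.of("0", "1")).build() compiles and returns "1 file(s), 2 row group(s)".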
