Interface GroupScan
- All Superinterfaces:
FragmentLeaf, GraphValue&lt;PhysicalOperator&gt;, HasAffinity, Iterable&lt;PhysicalOperator&gt;, Leaf, PhysicalOperator, Scan
- All Known Subinterfaces:
DbGroupScan, FileGroupScan, IndexGroupScan
- All Known Implementing Classes:
AbstractDbGroupScan, AbstractFileGroupScan, AbstractGroupScan, AbstractGroupScanWithMetadata, AbstractParquetGroupScan, DeltaGroupScan, DirectGroupScan, DrillGroupScan, DruidGroupScan, EasyGroupScan, EnumerableGroupScan, GoogleSheetsGroupScan, HBaseGroupScan, HiveDrillNativeParquetScan, HiveScan, HttpGroupScan, IcebergGroupScan, InfoSchemaGroupScan, JdbcGroupScan, KafkaGroupScan, KuduGroupScan, MetadataDirectGroupScan, MockGroupScanPOP, MongoGroupScan, OpenTSDBGroupScan, ParquetGroupScan, PhoenixGroupScan, SchemalessScan, SplunkGroupScan, SystemTableScan
A GroupScan operator represents all data which will be scanned by a given physical
plan. It is the superset of all SubScans for the plan.
-
Field Summary
- static final List&lt;SchemaPath&gt; ALL_COLUMNS
Columns list in GroupScan: 1) an empty column list is for a skip-all query; 2) NULL is interpreted as ALL_COLUMNS.
-
Method Summary
- void applyAssignments(List&lt;CoordinationProtos.DrillbitEndpoint&gt; endpoints)
- GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
- GroupScan applyLimit(int maxRecords): applies a rowcount-based prune for a "LIMIT n" query.
- boolean canPushdownProjects(List&lt;SchemaPath&gt; columns): checks whether this GroupScan can support all the columns in the given list.
- GroupScan clone(List&lt;SchemaPath&gt; columns): returns a clone of this GroupScan, except that the new GroupScan uses the provided list of columns.
- boolean enforceWidth(): Deprecated.
- AnalyzeInfoProvider getAnalyzeInfoProvider(): returns the AnalyzeInfoProvider instance which will be used when running an ANALYZE statement.
- List&lt;SchemaPath&gt; getColumns(): returns the list of columns scanned by this group scan.
- long getColumnValueCount(SchemaPath column): returns the number of non-null values in the specified column.
- String getDigest(): returns a signature of the GroupScan, usually composed of all the attributes that describe it uniquely.
- Collection&lt;org.apache.hadoop.fs.Path&gt; getFiles(): returns the collection of file paths associated with this GroupScan.
- LogicalExpression getFilter()
- int getMaxParallelizationWidth()
- TableMetadataProvider getMetadataProvider(): returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
- int getMinParallelizationWidth(): the minimum number of fragments the GroupScan requires to run.
- List&lt;SchemaPath&gt; getPartitionColumns(): returns the list of columns that can be used for partition pruning.
- ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)
- ScanStats getScanStats(PlannerSettings settings)
- org.apache.hadoop.fs.Path getSelectionRoot(): returns the path to the selection root.
- SubScan getSpecificScan(int minorFragmentId)
- TableMetadata getTableMetadata()
- boolean hasFiles(): returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
- boolean isDistributed()
- boolean supportsFilterPushDown(): checks whether this group scan supports filter push down.
- boolean supportsLimitPushdown(): whether this GroupScan supports limit pushdown.
- boolean supportsPartitionFilterPushdown(): whether this GroupScan supports pushdown of partition filters (directories for filesystems).
- boolean usedMetastore(): returns true if the current group scan uses metadata obtained from the Metastore.

Methods inherited from interface org.apache.drill.common.graph.GraphValue
accept
Methods inherited from interface org.apache.drill.exec.physical.base.HasAffinity
getDistributionAffinity, getOperatorAffinity
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
Methods inherited from interface org.apache.drill.exec.physical.base.PhysicalOperator
accept, getCost, getInitialAllocation, getMaxAllocation, getNewWithChildren, getOperatorId, getOperatorType, getSVMode, getUserName, isBufferedOperator, isExecutable, setCost, setMaxAllocation, setOperatorId
-
Field Details
-
ALL_COLUMNS
Columns list in GroupScan: 1) empty_column is for a skip-all query; 2) NULL is interpreted as ALL_COLUMNS. How to handle a skip-all query is up to each storage plugin, with different policies in the corresponding RecordReader.
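As a sketch of the policy choice described above, here is a minimal, hypothetical reader-side interpretation of the projected-column list. This is not Drill code: plain String column names stand in for SchemaPath, and ColumnListDemo is an invented class.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch (not Drill's implementation) of how a storage plugin's
// reader might interpret the projected-column list.
public class ColumnListDemo {
    // null -> read all columns; empty list -> skip-all query (the reader only
    // needs to produce a row count, no column data); otherwise read the
    // projected columns.
    static String interpret(List<String> columns, List<String> tableColumns) {
        if (columns == null) {
            return "read all columns: " + tableColumns;
        }
        if (columns.isEmpty()) {
            return "skip-all: emit row count only";
        }
        return "read projected columns: " + columns;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("a", "b", "c");
        System.out.println(interpret(null, table));
        System.out.println(interpret(List.of(), table));
        System.out.println(interpret(List.of("b"), table));
    }
}
```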
-
-
Method Details
-
applyAssignments
void applyAssignments(List&lt;CoordinationProtos.DrillbitEndpoint&gt; endpoints) throws PhysicalOperatorSetupException
- Throws:
PhysicalOperatorSetupException
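One common shape for an applyAssignments implementation is a round-robin mapping of minor fragments onto the available endpoints. The sketch below is hypothetical (not Drill's implementation): String node names stand in for CoordinationProtos.DrillbitEndpoint, and a plain IllegalStateException stands in for PhysicalOperatorSetupException.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative round-robin endpoint assignment, one entry per minor fragment.
public class AssignmentDemo {
    // Returns, for each of `width` minor fragments, the endpoint it runs on.
    static List<String> roundRobin(List<String> endpoints, int width) {
        if (endpoints.isEmpty()) {
            // In Drill this situation would surface as a PhysicalOperatorSetupException.
            throw new IllegalStateException("no endpoints available");
        }
        List<String> assignment = new ArrayList<>();
        for (int minorFragmentId = 0; minorFragmentId < width; minorFragmentId++) {
            assignment.add(endpoints.get(minorFragmentId % endpoints.size()));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 5 fragments over 2 nodes: node1, node2, node1, node2, node1
        System.out.println(roundRobin(List.of("node1", "node2"), 5));
    }
}
```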
-
getSpecificScan
SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
getMaxParallelizationWidth
int getMaxParallelizationWidth() -
isDistributed
boolean isDistributed() -
getMinParallelizationWidth
int getMinParallelizationWidth()
At minimum, the GroupScan requires this many fragments to run. Currently, this is used in SimpleParallelizer.
- Returns:
- the minimum number of fragments that should run
-
enforceWidth
boolean enforceWidth()
Deprecated. Use getMinParallelizationWidth() to determine whether this GroupScan spans more than one fragment.
Checks whether this GroupScan enforces width to be the maximum parallelization width. Currently, this is used in ExcessiveExchangeIdentifier.
- Returns:
- if maximum width should be enforced
-
getDigest
String getDigest()
Returns a signature of the GroupScan, which should usually be composed of all the attributes that could describe it uniquely.
-
getScanStats
ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)
-
getScanStats
ScanStats getScanStats(PlannerSettings settings)
-
clone
GroupScan clone(List&lt;SchemaPath&gt; columns)
Returns a clone of this GroupScan instance, except that the new GroupScan will use the provided list of columns.
-
canPushdownProjects
boolean canPushdownProjects(List&lt;SchemaPath&gt; columns)
The GroupScan should check the list of columns and see whether it can support all the columns in the list.
-
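Together with clone(List&lt;SchemaPath&gt;), this is the handshake behind project pushdown: the planner only swaps in a column-pruned clone when the scan reports it can serve every requested column. The sketch below is hypothetical planner-side logic, not Drill code; ToyScan and plain String column names stand in for a real GroupScan and SchemaPath.

```java
import java.util.List;

// Illustrative project-pushdown handshake between planner and scan.
public class ProjectPushdownDemo {
    // Hypothetical minimal scan; `projected == null` means "all columns".
    record ToyScan(List<String> tableColumns, List<String> projected) {
        boolean canPushdownProjects(List<String> columns) {
            return tableColumns.containsAll(columns);
        }
        ToyScan cloneWith(List<String> columns) {
            // Analogous to clone(List<SchemaPath>): same scan, narrower columns.
            return new ToyScan(tableColumns, columns);
        }
    }

    static ToyScan pushProject(ToyScan scan, List<String> requested) {
        // If the scan cannot serve all requested columns itself, keep the
        // original scan (a Project operator stays on top of it).
        return scan.canPushdownProjects(requested) ? scan.cloneWith(requested) : scan;
    }

    public static void main(String[] args) {
        ToyScan scan = new ToyScan(List.of("a", "b", "c"), null);
        System.out.println(pushProject(scan, List.of("b")).projected());      // pruned
        System.out.println(pushProject(scan, List.of("b", "z")).projected()); // unchanged
    }
}
```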
getColumnValueCount
long getColumnValueCount(SchemaPath column)
Returns the number of non-null values in the specified column. Raises an exception if the group scan does not have an exact column row count.
-
supportsPartitionFilterPushdown
boolean supportsPartitionFilterPushdown()
Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
-
getColumns
List&lt;SchemaPath&gt; getColumns()
Returns a list of columns scanned by this group scan.
-
getPartitionColumns
List&lt;SchemaPath&gt; getPartitionColumns()
Returns a list of columns that can be used for partition pruning.
-
supportsLimitPushdown
boolean supportsLimitPushdown()
Whether or not this GroupScan supports limit pushdown.
-
applyLimit
GroupScan applyLimit(int maxRecords)
Apply a rowcount-based prune for a "LIMIT n" query.
- Parameters:
maxRecords - the number of rows requested from the group scan.
- Returns:
- a new instance of the group scan if the prune is successful; null if row-based prune is not supported or the prune is not successful.
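The contract above can be sketched with a toy file-backed scan that drops files whose rows the limit never reaches, and returns null when nothing can be dropped. This is a hypothetical illustration, not Drill code; FileScan is an invented stand-in for a GroupScan over files with known per-file row counts.

```java
import java.util.List;

// Illustrative rowcount-based prune for "LIMIT n".
public class LimitPruneDemo {
    record FileScan(List<Long> rowCountPerFile) {
        // Keep only as many leading files as needed to satisfy `maxRecords`.
        FileScan applyLimit(int maxRecords) {
            long seen = 0;
            int keep = 0;
            for (long rows : rowCountPerFile) {
                keep++;
                seen += rows;
                if (seen >= maxRecords) break;
            }
            if (keep == rowCountPerFile.size()) {
                return null; // prune unsuccessful: every file is still needed
            }
            return new FileScan(rowCountPerFile.subList(0, keep));
        }
    }

    public static void main(String[] args) {
        FileScan scan = new FileScan(List.of(100L, 100L, 100L));
        System.out.println(scan.applyLimit(150).rowCountPerFile().size()); // 2: two files suffice
        System.out.println(scan.applyLimit(1000));                        // null: no prune possible
    }
}
```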
-
hasFiles
boolean hasFiles()
Returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
-
getSelectionRoot
org.apache.hadoop.fs.Path getSelectionRoot()
Returns the path to the selection root. If this GroupScan cannot provide a selection root, it returns null.
- Returns:
- path to the selection root
-
getFiles
Collection&lt;org.apache.hadoop.fs.Path&gt; getFiles()
Returns a collection of file paths associated with this GroupScan. This should be called after checking hasFiles(). If this GroupScan cannot provide file names, it returns null.
- Returns:
- collection of file paths
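The hasFiles()/getFiles() pair implies a caller-side guard, sketched below. This is hypothetical illustration, not Drill code; ToyGroupScan and String paths stand in for GroupScan and org.apache.hadoop.fs.Path.

```java
import java.util.Collection;
import java.util.List;

// Illustrative caller-side usage: only consult getFiles() once hasFiles()
// has confirmed the scan is file-backed.
public class FilesDemo {
    interface ToyGroupScan {
        boolean hasFiles();
        Collection<String> getFiles();
    }

    static String describe(ToyGroupScan scan) {
        if (scan.hasFiles()) {
            return "file-backed scan over " + scan.getFiles().size() + " file(s)";
        }
        return "non-file scan"; // e.g. a system table; getFiles() may be null here
    }

    public static void main(String[] args) {
        ToyGroupScan fileScan = new ToyGroupScan() {
            public boolean hasFiles() { return true; }
            public Collection<String> getFiles() {
                return List.of("/data/a.parquet", "/data/b.parquet");
            }
        };
        System.out.println(describe(fileScan)); // file-backed scan over 2 file(s)
    }
}
```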
-
getFilter
LogicalExpression getFilter() -
applyFilter
GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager) -
getMetadataProvider
TableMetadataProvider getMetadataProvider()
Returns the TableMetadataProvider instance which is used for providing metadata for the current GroupScan.
- Returns:
TableMetadataProvider instance, the source of metadata
-
getTableMetadata
TableMetadata getTableMetadata() -
usedMetastore
boolean usedMetastore()
Returns true if the current group scan uses metadata obtained from the Metastore.
- Returns:
true if the current group scan uses metadata obtained from the Metastore, false otherwise.
-
getAnalyzeInfoProvider
AnalyzeInfoProvider getAnalyzeInfoProvider()
Returns the AnalyzeInfoProvider instance which will be used when running an ANALYZE statement.
- Returns:
AnalyzeInfoProvider instance
-
supportsFilterPushDown
boolean supportsFilterPushDown()
Checks whether this group scan supports filter push down.
- Returns:
true if this group scan supports filter push down, false otherwise
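supportsFilterPushDown() pairs with applyFilter(): the planner first asks whether pushdown is supported at all, then offers the filter, and a null result means the scan could not absorb it. The sketch below is a hypothetical illustration, not Drill code; ToyScan with integer rows and a Predicate stand in for a real GroupScan and LogicalExpression.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative filter-pushdown handshake between planner and scan.
public class FilterPushdownDemo {
    record ToyScan(List<Integer> rows, boolean supportsFilterPushDown) {
        // Returns a new scan with the filter folded in, or null when the
        // filter cannot be pushed (the planner then keeps a Filter on top).
        ToyScan applyFilter(Predicate<Integer> filter) {
            if (!supportsFilterPushDown) {
                return null;
            }
            return new ToyScan(rows.stream().filter(filter).toList(), true);
        }
    }

    public static void main(String[] args) {
        ToyScan scan = new ToyScan(List.of(1, 5, 10), true);
        System.out.println(scan.applyFilter(r -> r > 3).rows()); // [5, 10]
    }
}
```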
-