Interface GroupScan

All Superinterfaces:
FragmentLeaf, GraphValue<PhysicalOperator>, HasAffinity, Iterable<PhysicalOperator>, Leaf, PhysicalOperator, Scan
All Known Subinterfaces:
DbGroupScan, FileGroupScan, IndexGroupScan
All Known Implementing Classes:
AbstractDbGroupScan, AbstractFileGroupScan, AbstractGroupScan, AbstractGroupScanWithMetadata, AbstractParquetGroupScan, DeltaGroupScan, DirectGroupScan, DrillGroupScan, DruidGroupScan, EasyGroupScan, EnumerableGroupScan, GoogleSheetsGroupScan, HBaseGroupScan, HiveDrillNativeParquetScan, HiveScan, HttpGroupScan, IcebergGroupScan, InfoSchemaGroupScan, JdbcGroupScan, KafkaGroupScan, KuduGroupScan, MetadataDirectGroupScan, MockGroupScanPOP, MongoGroupScan, OpenTSDBGroupScan, ParquetGroupScan, PhoenixGroupScan, SchemalessScan, SplunkGroupScan, SystemTableScan

public interface GroupScan extends Scan, HasAffinity
A GroupScan operator represents all data which will be scanned by a given physical plan. It is the superset of all SubScans for the plan.
  • Field Details

    • ALL_COLUMNS

      static final List<SchemaPath> ALL_COLUMNS
      Conventions for the columns list in a GroupScan: 1) an empty column list indicates a skip-all query; 2) NULL is interpreted as ALL_COLUMNS. How a skip-all query is handled is up to each storage plugin, with the policy implemented in the corresponding RecordReader.
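      For illustration, a minimal sketch of this convention as a hypothetical plugin-side helper (ColumnsConvention is not part of Drill; it only restates the rules above):

        import java.util.Collections;
        import java.util.List;
        import org.apache.drill.common.expression.SchemaPath;
        import org.apache.drill.exec.physical.base.GroupScan;

        // Hypothetical helper restating the columns-list convention.
        class ColumnsConvention {
          static List<SchemaPath> normalize(List<SchemaPath> columns) {
            if (columns == null) {
              // NULL is interpreted as "project all columns".
              return GroupScan.ALL_COLUMNS;
            }
            // An empty list signals a skip-all query; how it is handled
            // is decided by the plugin's RecordReader.
            return Collections.unmodifiableList(columns);
          }
        }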
  • Method Details

    • applyAssignments

      void applyAssignments(List<CoordinationProtos.DrillbitEndpoint> endpoints) throws PhysicalOperatorSetupException
      Throws:
      PhysicalOperatorSetupException
    • getSpecificScan

      SubScan getSpecificScan(int minorFragmentId) throws ExecutionSetupException
      Throws:
      ExecutionSetupException
    • getMaxParallelizationWidth

      int getMaxParallelizationWidth()
    • isDistributed

      boolean isDistributed()
    • getMinParallelizationWidth

      int getMinParallelizationWidth()
      At a minimum, the GroupScan requires this many fragments to run. Currently, this is used in SimpleParallelizer.
      Returns:
      the minimum number of fragments that should run
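      To illustrate how the methods above cooperate, here is a hedged sketch of expanding a GroupScan into per-fragment SubScans (ScanExpansion and its width calculation are simplifications; the real logic lives in SimpleParallelizer):

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.drill.exec.physical.base.GroupScan;
        import org.apache.drill.exec.physical.base.SubScan;
        import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;

        // Sketch: expand a GroupScan into one SubScan per minor fragment.
        class ScanExpansion {
          static List<SubScan> expand(GroupScan scan, List<DrillbitEndpoint> endpoints)
              throws Exception {
            scan.applyAssignments(endpoints);  // distribute work across the endpoints
            // Clamp the fragment count into the scan's advertised bounds
            // (a simplification of the parallelizer's real width logic).
            int width = Math.max(scan.getMinParallelizationWidth(),
                Math.min(endpoints.size(), scan.getMaxParallelizationWidth()));
            List<SubScan> subScans = new ArrayList<>();
            for (int minorFragmentId = 0; minorFragmentId < width; minorFragmentId++) {
              subScans.add(scan.getSpecificScan(minorFragmentId));
            }
            return subScans;
          }
        }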
    • enforceWidth

      @Deprecated boolean enforceWidth()
      Deprecated.
      Use getMinParallelizationWidth() to determine whether this GroupScan spans more than one fragment.
      Checks whether the GroupScan enforces its width to be the maximum parallelization width. Currently, this is used in ExcessiveExchangeIdentifier.
      Returns:
      true if maximum width should be enforced
    • getDigest

      String getDigest()
      Returns a signature of the GroupScan, which should usually be composed of all the attributes that uniquely describe it.
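      For illustration, a digest for a hypothetical file-based scan might be composed like this (DigestExample and its attributes are assumptions, not part of Drill):

        import java.util.List;
        import org.apache.drill.common.expression.SchemaPath;
        import org.apache.hadoop.fs.Path;

        // Sketch: compose the digest from every attribute that uniquely
        // describes the scan, so plan comparison and caching behave correctly.
        class DigestExample {
          static String digest(Path selectionRoot, List<SchemaPath> columns) {
            return "ExampleGroupScan [selectionRoot=" + selectionRoot
                + ", columns=" + columns + "]";
          }
        }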
    • getScanStats

      ScanStats getScanStats(PlannerSettings settings)
    • getScanStats

      ScanStats getScanStats(org.apache.calcite.rel.metadata.RelMetadataQuery mq)
    • clone

      GroupScan clone(List<SchemaPath> columns)
      Returns a clone of this GroupScan instance, except that the new GroupScan will use the provided list of columns.
    • canPushdownProjects

      boolean canPushdownProjects(List<SchemaPath> columns)
      The GroupScan should check the list of columns and determine whether it can support all of them.
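      The two methods above combine into the usual projection push-down pattern; a hedged sketch (ProjectPushdown is illustrative, not one of Drill's planning rules):

        import java.util.List;
        import org.apache.drill.common.expression.SchemaPath;
        import org.apache.drill.exec.physical.base.GroupScan;

        // Sketch: narrow a scan to the projected columns when supported.
        class ProjectPushdown {
          static GroupScan pushProject(GroupScan scan, List<SchemaPath> projected) {
            if (scan.canPushdownProjects(projected)) {
              return scan.clone(projected);  // new scan reading only these columns
            }
            return scan;  // push-down unsupported; keep the original scan
          }
        }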
    • getColumnValueCount

      long getColumnValueCount(SchemaPath column)
      Returns the number of non-null values in the specified column. Throws an exception if the GroupScan does not have an exact row count for the column.
    • supportsPartitionFilterPushdown

      boolean supportsPartitionFilterPushdown()
      Whether or not this GroupScan supports pushdown of partition filters (directories for filesystems).
    • getColumns

      List<SchemaPath> getColumns()
      Returns a list of columns scanned by this group scan.
    • getPartitionColumns

      List<SchemaPath> getPartitionColumns()
      Returns a list of columns that can be used for partition pruning.
    • supportsLimitPushdown

      boolean supportsLimitPushdown()
      Whether or not this GroupScan supports limit pushdown.
    • applyLimit

      GroupScan applyLimit(int maxRecords)
      Applies a row-count-based prune for a "LIMIT n" query.
      Parameters:
      maxRecords - the number of rows requested from the group scan
      Returns:
      a new instance of the group scan if the prune is successful, or null if row-count-based pruning is not supported or the prune was not successful
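      A hedged sketch of the limit push-down pattern described above (LimitPushdown is illustrative; Drill's planning rules perform the real rewrite):

        import org.apache.drill.exec.physical.base.GroupScan;

        // Sketch: apply "LIMIT n" to a scan, keeping the original on failure.
        class LimitPushdown {
          static GroupScan pushLimit(GroupScan scan, int maxRecords) {
            if (!scan.supportsLimitPushdown()) {
              return scan;
            }
            GroupScan pruned = scan.applyLimit(maxRecords);
            // null means the prune was not successful; keep the original scan.
            return pruned != null ? pruned : scan;
          }
        }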
    • hasFiles

      boolean hasFiles()
      Returns true if this GroupScan can return its selection as a list of file names (retrieved by getFiles()).
    • getSelectionRoot

      org.apache.hadoop.fs.Path getSelectionRoot()
      Returns the path to the selection root. If this GroupScan cannot provide a selection root, it returns null.
      Returns:
      path to the selection root
    • getFiles

      Collection<org.apache.hadoop.fs.Path> getFiles()
      Returns a collection of file paths associated with this GroupScan. This should be called only after checking hasFiles(). If this GroupScan cannot provide file paths, it returns null.
      Returns:
      collection of file paths
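      Taken together, a hedged usage sketch of the selection-related methods above (FileListing is illustrative only):

        import java.util.Collection;
        import org.apache.drill.exec.physical.base.GroupScan;
        import org.apache.hadoop.fs.Path;

        // Sketch: guard getFiles() with hasFiles(), as the contract requires.
        class FileListing {
          static void printSelection(GroupScan scan) {
            if (scan.hasFiles()) {
              Path root = scan.getSelectionRoot();  // may be null
              Collection<Path> files = scan.getFiles();
              System.out.println("selection root: " + root);
              files.forEach(f -> System.out.println("  " + f));
            }
          }
        }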
    • getFilter

      LogicalExpression getFilter()
    • applyFilter

      GroupScan applyFilter(LogicalExpression filterExpr, UdfUtilities udfUtilities, FunctionImplementationRegistry functionImplementationRegistry, OptionManager optionManager)
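      A hedged sketch of the filter push-down pattern, mirroring the projection and limit patterns above and guarded by supportsFilterPushDown() documented below (FilterPushdown is illustrative; treating a null result as an unsuccessful push-down is an assumption borrowed from the applyLimit contract):

        import org.apache.drill.common.expression.LogicalExpression;
        import org.apache.drill.exec.expr.fn.FunctionImplementationRegistry;
        import org.apache.drill.exec.ops.UdfUtilities;
        import org.apache.drill.exec.physical.base.GroupScan;
        import org.apache.drill.exec.server.options.OptionManager;

        // Sketch: push a filter into a scan when the scan supports it.
        class FilterPushdown {
          static GroupScan pushFilter(GroupScan scan, LogicalExpression filterExpr,
              UdfUtilities udfs, FunctionImplementationRegistry registry,
              OptionManager options) {
            if (!scan.supportsFilterPushDown()) {
              return scan;
            }
            GroupScan filtered = scan.applyFilter(filterExpr, udfs, registry, options);
            return filtered != null ? filtered : scan;  // assumed null-on-failure
          }
        }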
    • getMetadataProvider

      TableMetadataProvider getMetadataProvider()
      Returns the TableMetadataProvider instance which is used to provide metadata for the current GroupScan.
      Returns:
      the TableMetadataProvider instance that is the source of metadata
    • getTableMetadata

      TableMetadata getTableMetadata()
    • usedMetastore

      boolean usedMetastore()
      Returns true if the current group scan uses metadata obtained from the Metastore.
      Returns:
      true if the current group scan uses metadata obtained from the Metastore, false otherwise.
    • getAnalyzeInfoProvider

      AnalyzeInfoProvider getAnalyzeInfoProvider()
      Returns the AnalyzeInfoProvider instance which will be used when running an ANALYZE statement.
      Returns:
      AnalyzeInfoProvider instance
    • supportsFilterPushDown

      boolean supportsFilterPushDown()
      Checks whether this group scan supports filter push down.
      Returns:
      true if this group scan supports filter push down, false otherwise