public class ManagedScanFramework extends Object implements ScanOperatorEvents
This framework is a bridge between operator logic and the scan projection internals. It gathers scan-specific options in a builder abstraction, then passes them on to the scan orchestrator at the right time. By abstracting out this plumbing, a scan batch creator simply chooses the proper framework builder, passes config options, and implements the matching "managed reader" and factory. All details of setup, projection, and so on are handled by the framework and the components that the framework builds upon.
The framework achieves the work described below by composing a large set of detailed classes, each of which performs some specific task. This structure leaves the reader to simply infer schema and read data.
In particular, rather than do all the orchestration here (which would tie that logic to the scan operation), the detailed work is delegated to the ScanSchemaOrchestrator class, with this class acting as a "shim" between the scan events API and the schema orchestrator implementation.
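The shim arrangement described above can be illustrated with a minimal, self-contained sketch. All names here (ScanEvents, Orchestrator, ShimFramework) are hypothetical stand-ins invented for illustration; they are not the actual ScanOperatorEvents or ScanSchemaOrchestrator APIs, and the real classes do considerably more work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the scan events API (not the real ScanOperatorEvents).
interface ScanEvents {
  void bind(String context);
  void close();
}

// Hypothetical stand-in for the schema orchestrator, which owns the detailed work.
final class Orchestrator {
  final List<String> log = new ArrayList<>();

  void buildScanSchema(String context) {
    log.add("schema built for " + context);
  }

  void release() {
    log.add("released");
  }
}

// The shim: it implements the events API but only forwards each event
// to the orchestrator, so orchestration logic is not tied to the operator.
final class ShimFramework implements ScanEvents {
  private final Orchestrator orchestrator;

  ShimFramework(Orchestrator orchestrator) {
    this.orchestrator = orchestrator;
  }

  @Override
  public void bind(String context) {
    orchestrator.buildScanSchema(context);
  }

  @Override
  public void close() {
    orchestrator.release();
  }
}
```

Because the shim holds no orchestration logic of its own, the orchestrator in this sketch could be exercised directly in a test without any operator present, which is the design motivation stated above.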
It is important to note that the result set loader also defines a schema: the schema requested by the reader. If the reader wants to read three columns, a, b, and c, then that is the schema that the result set loader supports. This is true even if the query plan only wants column a, or wants columns c, a. The framework handles the projection task so the reader does not have to worry about it. Reading an unwanted column is low cost: the result set loader will have provided a "dummy" column writer that simply discards the value. This is just as fast as having the reader use if-statements or a table to determine which columns to save.
A reader may be "late schema", that is, true "schema on read." In this case, the reader simply tells the result set loader to create a new column writer on the fly. The framework will work out whether that new column is to be projected and will return either a real column writer (projected column) or a dummy column writer (unprojected column).
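The projection behavior described in the two paragraphs above can be sketched as follows. This is a simplified illustration under stated assumptions, not the result set loader's implementation: ColumnWriter, RealWriter, DummyWriter, and SketchLoader are hypothetical names, and the projection list is modeled as a plain set of column names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical writer interface; the reader writes every column it knows about.
interface ColumnWriter {
  void write(Object value);
}

// Writer for a projected column: values are kept.
final class RealWriter implements ColumnWriter {
  final List<Object> values = new ArrayList<>();

  @Override
  public void write(Object value) {
    values.add(value);
  }
}

// Writer for an unprojected column: values are silently discarded.
final class DummyWriter implements ColumnWriter {
  @Override
  public void write(Object value) {
    // Intentionally empty.
  }
}

// Hypothetical loader: decides real vs. dummy when a column is added,
// whether the column is declared up front or discovered late.
final class SketchLoader {
  private final Set<String> projected;
  private final Map<String, ColumnWriter> writers = new LinkedHashMap<>();

  SketchLoader(Set<String> projected) {
    this.projected = projected;
  }

  ColumnWriter addColumn(String name) {
    ColumnWriter existing = writers.get(name);
    if (existing != null) {
      return existing;
    }
    ColumnWriter writer = projected.contains(name) ? new RealWriter() : new DummyWriter();
    writers.put(name, writer);
    return writer;
  }

  // Returns the stored values of projected columns only.
  Map<String, List<Object>> harvest() {
    Map<String, List<Object>> out = new LinkedHashMap<>();
    for (Map.Entry<String, ColumnWriter> e : writers.entrySet()) {
      if (e.getValue() instanceof RealWriter) {
        out.put(e.getKey(), ((RealWriter) e.getValue()).values);
      }
    }
    return out;
  }
}
```

In this sketch the reader never checks the projection list itself. It calls addColumn for each column it finds, including one discovered mid-read, and writes to whatever writer it receives.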
Modifier and Type | Class and Description |
---|---|
static interface | ManagedScanFramework.ReaderFactory: Creates a batch reader on demand. |
static class | ManagedScanFramework.ScanFrameworkBuilder |
Modifier and Type | Field and Description |
---|---|
protected ManagedScanFramework.ScanFrameworkBuilder | builder |
protected OperatorContext | context |
protected ManagedScanFramework.ReaderFactory | readerFactory |
protected ScanSchemaOrchestrator | scanOrchestrator |
Constructor and Description |
---|
ManagedScanFramework(ManagedScanFramework.ScanFrameworkBuilder builder) |
Modifier and Type | Method and Description |
---|---|
void | bind(OperatorContext context): Build the scan-level schema from the physical operator select list. |
void | close(): Called when the scan operator itself is closed. |
protected void | configure() |
OperatorContext | context() |
CustomErrorContext | errorContext() |
protected SchemaNegotiatorImpl | newNegotiator() |
RowBatchReader | nextReader(): A scanner typically reads multiple data sources (such as files or file blocks). A batch reader handles each read. |
boolean | open(ShimBatchReader shimBatchReader) |
TupleMetadata | outputSchema() |
ScanSchemaOrchestrator | scanOrchestrator() |
protected final ManagedScanFramework.ScanFrameworkBuilder builder
protected final ManagedScanFramework.ReaderFactory readerFactory
protected OperatorContext context
protected ScanSchemaOrchestrator scanOrchestrator
public ManagedScanFramework(ManagedScanFramework.ScanFrameworkBuilder builder)
public void bind(OperatorContext context)
ScanOperatorEvents
After this call, the schema manager should be ready to build a reader-specific schema for each reader as it is opened.
bind
in interface ScanOperatorEvents
context - the operator context for the scan operator
public OperatorContext context()
public ScanSchemaOrchestrator scanOrchestrator()
public TupleMetadata outputSchema()
public CustomErrorContext errorContext()
protected void configure()
public RowBatchReader nextReader()
ScanOperatorEvents
The preferred implementation is to create each batch reader in this call to minimize resource usage. Production queries may read thousands of files or blocks, so incremental reader creation can be far more efficient than creating readers at the start of the scan.
nextReader
in interface ScanOperatorEvents
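The incremental reader creation recommended for nextReader can be sketched as below. This is an illustration only: SketchReader and LazyReaderFactory are hypothetical names, sources are modeled as plain strings, and returning null once the sources are exhausted is an assumption of this sketch rather than a statement about the actual ScanOperatorEvents contract.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical reader: holds only the name of the source it will read.
interface SketchReader {
  String source();
}

// Hypothetical factory that creates one reader per call instead of
// building a reader for every file or block at the start of the scan.
final class LazyReaderFactory {
  private final Iterator<String> sources;
  private int created = 0;

  LazyReaderFactory(List<String> sources) {
    this.sources = sources.iterator();
  }

  // Returns a reader for the next source, or null when none remain
  // (the null convention is an assumption of this sketch).
  SketchReader next() {
    if (!sources.hasNext()) {
      return null;
    }
    final String source = sources.next();
    created++;
    return () -> source;
  }

  // Number of readers created so far; shows that creation is on demand.
  int createdSoFar() {
    return created;
  }
}
```

With thousands of sources, only the reader currently in use needs to exist at any moment, so per-reader resources such as buffers or file handles are never allocated for sources the scan has not reached yet.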
protected SchemaNegotiatorImpl newNegotiator()
public boolean open(ShimBatchReader shimBatchReader)
public void close()
ScanOperatorEvents
close
in interface ScanOperatorEvents
Copyright © 1970 The Apache Software Foundation. All rights reserved.