public class ScanOperatorExec extends Object implements OperatorExec
This class is the successor to ScanBatch and should be used by all new scan implementations.
The basic concept is to split the scan operator into layers. One layer is the OperatorRecordBatch, which implements Drill's Volcano-like protocol.
The layered format can be confusing. However, each layer is somewhat complex, so dividing the work into layers keeps the overall complexity somewhat under control.
The scan operator itself is simply a framework for handling a set of readers. It knows only the interfaces of the components it works with, and delegates all knowledge of schemas, projection, reading and the like to implementations of those interfaces. Because that work is complex, a set of frameworks exists to handle the most common use cases, but a specialized reader can create a framework or reader from scratch.
Error handling in this class is minimal: the enclosing record batch iterator is responsible for handling exceptions. Error handling relies on the fact that the iterator will call close() regardless of which exceptions are thrown.
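The close() guarantee described above can be sketched as a small driver loop. This is only an illustration under stated assumptions, not Drill's implementation: SimpleOperator is a hypothetical stand-in for the operator methods listed on this page, FailingOperator is a made-up operator that throws on its second batch, and drive() plays the role of the enclosing record batch iterator.

```java
import java.util.ArrayList;
import java.util.List;

class CloseGuaranteeSketch {

  // Hypothetical stand-in for the operator methods listed on this page.
  interface SimpleOperator {
    boolean buildSchema();
    boolean next();
    void close();
  }

  // Made-up operator that fails on its second batch and records each call.
  static class FailingOperator implements SimpleOperator {
    final List<String> calls = new ArrayList<>();
    private int batches;

    @Override public boolean buildSchema() {
      calls.add("buildSchema");
      return true;
    }

    @Override public boolean next() {
      calls.add("next");
      if (++batches == 2) {
        throw new IllegalStateException("simulated read failure");
      }
      return true;
    }

    @Override public void close() {
      calls.add("close");
    }
  }

  // Stand-in for the enclosing iterator: it owns error handling and
  // always calls close(), so the operator itself needs no try/catch.
  static int drive(SimpleOperator op) {
    int count = 0;
    try {
      if (op.buildSchema()) {
        while (op.next()) {
          count++;
        }
      }
    } finally {
      op.close();
    }
    return count;
  }

  public static void main(String[] args) {
    FailingOperator op = new FailingOperator();
    try {
      drive(op);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
    System.out.println(op.calls);
  }
}
```

Because the finally block runs whether or not next() throws, the operator can keep its own error handling minimal and concentrate cleanup in close().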
The ScanOperatorEvents implementation provides the set of readers to use. This class can simply maintain a list, or can create the reader on demand.
More subtly, the factory also handles projection issues and manages vectors across subsequent readers. A number of factories are available for the most common cases. Extend these to implement a version specific to a data source.
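The two ways of supplying readers mentioned above (a maintained list versus on-demand creation) might look like the following sketch. All names here (Reader, ReaderSource, ListSource, OnDemandSource) are hypothetical and are not the actual ScanOperatorEvents API; returning null to signal that no readers remain is also an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class ReaderSourceSketch {

  // Hypothetical reader: just a name here; a real reader produces batches.
  static class Reader {
    final String name;
    Reader(String name) { this.name = name; }
  }

  // Hypothetical stand-in for the factory role: hands out readers one at
  // a time and returns null when none remain (an assumption of this sketch).
  interface ReaderSource {
    Reader nextReader();
  }

  // Option 1: maintain a list of readers built up front.
  static class ListSource implements ReaderSource {
    private final Deque<Reader> readers;

    ListSource(List<Reader> readers) {
      this.readers = new ArrayDeque<>(readers);
    }

    @Override public Reader nextReader() {
      return readers.poll(); // null once the queue is empty
    }
  }

  // Option 2: create each reader on demand from a list of file names.
  static class OnDemandSource implements ReaderSource {
    private final Iterator<String> files;
    int created;

    OnDemandSource(List<String> files) {
      this.files = files.iterator();
    }

    @Override public Reader nextReader() {
      if (!files.hasNext()) {
        return null;
      }
      created++;
      return new Reader(files.next());
    }
  }

  // The scan side sees only the interface, whichever option is used.
  static String drain(ReaderSource source) {
    StringBuilder names = new StringBuilder();
    for (Reader r = source.nextReader(); r != null; r = source.nextReader()) {
      if (names.length() > 0) {
        names.append(',');
      }
      names.append(r.name);
    }
    return names.toString();
  }

  public static void main(String[] args) {
    System.out.println(drain(new ListSource(
        Arrays.asList(new Reader("a.csv"), new Reader("b.csv")))));
    System.out.println(drain(new OnDemandSource(Arrays.asList("a.csv", "b.csv"))));
  }
}
```

On-demand creation defers the cost of building a reader until the previous one is exhausted, which matters when a scan covers many files.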
The RowBatchReader is a surprisingly minimal interface that nonetheless captures the essence of reading a result set as a set of batches. The factory implementations mentioned above implement this interface to provide commonly-used services, the most important of which is access to a ResultSetLoader to write values into value vectors.
Readers can discover columns as they read data, such as with any JSON-based format. In this case, the row set mutator also provides a schema version, but a fine-grained one that changes each time a column is added.
The two schema versions serve different purposes and are not interchangeable. For example, if a scan reads two files, both will build up their own schemas, each increasing its internal version number as work proceeds. At the end of each batch, however, the two schemas may (and, in fact, should) be identical, and the version of that output schema is the one downstream operators care about.
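The difference between the two version numbers can be illustrated with a small model. This is a sketch under assumptions, not Drill's code: ReaderSchema and OutputSchemaTracker are hypothetical classes, and comparing column-name lists is a simplification of real schema comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SchemaVersionSketch {

  // Fine-grained, per-reader version: bumps each time a column is added.
  static class ReaderSchema {
    final List<String> columns = new ArrayList<>();
    int version;

    void addColumn(String name) {
      if (!columns.contains(name)) {
        columns.add(name);
        version++;
      }
    }
  }

  // Coarse, per-scan version: bumps only when a finished batch's schema
  // differs from the previous batch's schema.
  static class OutputSchemaTracker {
    private List<String> last = new ArrayList<>();
    int version;

    void endBatch(ReaderSchema schema) {
      if (!schema.columns.equals(last)) {
        last = new ArrayList<>(schema.columns);
        version++;
      }
    }
  }

  // Simulates one reader discovering columns while reading a single batch.
  static ReaderSchema readFile(List<String> discovered, OutputSchemaTracker out) {
    ReaderSchema schema = new ReaderSchema();
    for (String col : discovered) {
      schema.addColumn(col);
    }
    out.endBatch(schema);
    return schema;
  }

  public static void main(String[] args) {
    OutputSchemaTracker out = new OutputSchemaTracker();
    ReaderSchema first = readFile(Arrays.asList("a", "b"), out);
    ReaderSchema second = readFile(Arrays.asList("a", "b"), out);
    System.out.println("reader versions: " + first.version + ", " + second.version);
    System.out.println("output version: " + out.version);
  }
}
```

In this model each reader's version climbs as it discovers columns, while the output version changes only once, because both files end their batches with the same schema.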
SELECT * FROM VALUES()
Modifier and Type | Field and Description |
---|---|
protected VectorContainerAccessor | containerAccessor |
protected OperatorContext | context |
Constructor and Description |
---|
ScanOperatorExec(ScanOperatorEvents factory, boolean allowEmptyResult) |
Modifier and Type | Method and Description |
---|---|
BatchAccessor | batchAccessor() Provides a generic access mechanism to the batch's output data. |
void | bind(OperatorContext context) Bind this operator to the context. |
boolean | buildSchema() Retrieves the schema of the batch before the first actual batch of data. |
void | cancel() Alerts the operator that the query was cancelled. |
void | close() Close the operator by releasing all resources that the operator held. |
OperatorContext | context() |
boolean | next() Retrieves the next batch of data. |
protected final VectorContainerAccessor containerAccessor
protected OperatorContext context
public ScanOperatorExec(ScanOperatorEvents factory, boolean allowEmptyResult)
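public void bind(OperatorContext context)
Description copied from interface: OperatorExec
Bind this operator to the context.
Specified by: bind in interface OperatorExec
Parameters: context - operator context
public BatchAccessor batchAccessor()
Description copied from interface: OperatorExec
Provides a generic access mechanism to the batch's output data. This method is called after OperatorExec.buildSchema() and OperatorExec.next(). The batch itself can be held in a standard VectorContainer, or in some other structure more convenient for this operator.
Specified by: batchAccessor in interface OperatorExec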
public OperatorContext context()
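public boolean buildSchema()
Description copied from interface: OperatorExec
Retrieves the schema of the batch before the first actual batch of data. The schema is made available through OperatorExec.batchAccessor().
Specified by: buildSchema in interface OperatorExec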
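public boolean next()
Description copied from interface: OperatorExec
Retrieves the next batch of data. The data is returned via the OperatorExec.batchAccessor() method.
Specified by: next in interface OperatorExec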
public void cancel()
Description copied from interface: OperatorExec
Alerts the operator that the query was cancelled.
Specified by: cancel in interface OperatorExec
public void close()
Description copied from interface: OperatorExec
Close the operator by releasing all resources that the operator held. Called after OperatorExec.cancel() and after OperatorExec.batchAccessor() or OperatorExec.next() returns false.
Note that there may be a significant delay between the last call to next() and the call to close() during which downstream operators do their work. A tidy operator will release resources immediately after EOF to avoid holding onto memory or other resources that could be used by downstream operators.
Specified by: close in interface OperatorExec
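The "tidy operator" advice can be sketched as follows. This is a minimal illustration with hypothetical names (Resource, TidyOperator), not code from Drill: the operator releases its resource as soon as next() reports EOF, and close() is written so that a later or repeated call is harmless.

```java
class TidyOperatorSketch {

  // Hypothetical resource, such as a buffer or a file handle.
  static class Resource {
    int releaseCount;
    void release() { releaseCount++; }
  }

  // Hypothetical operator that returns a fixed number of batches.
  static class TidyOperator {
    final Resource handle = new Resource(); // exposed so the demo can inspect it
    private Resource resource = handle;
    private int remaining;

    TidyOperator(int batchCount) {
      this.remaining = batchCount;
    }

    boolean next() {
      if (remaining > 0) {
        remaining--;
        return true;
      }
      // EOF: release now rather than waiting for close().
      releaseResources();
      return false;
    }

    void close() {
      // Safe to call after EOF, after a failure, or more than once.
      releaseResources();
    }

    private void releaseResources() {
      if (resource != null) {
        resource.release();
        resource = null;
      }
    }
  }

  public static void main(String[] args) {
    TidyOperator op = new TidyOperator(2);
    while (op.next()) {
      // A downstream consumer would process the batch here.
    }
    System.out.println("releases after EOF: " + op.handle.releaseCount);
    op.close();
    System.out.println("releases after close: " + op.handle.releaseCount);
  }
}
```

Nulling the field after release keeps the cleanup idempotent, so the early release at EOF and the later close() call do not release the same resource twice.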
Copyright © 1970 The Apache Software Foundation. All rights reserved.