public interface SchemaNegotiator
Regardless of the schema type, the result of building the schema is a result set loader used to prepare batches for use in the query. The reader can simply read all columns, allowing the framework to discard unwanted values. Alternatively, for efficiency, the reader can check the column metadata to determine whether a column is projected and, if it is not, skip reading that column from the input source altogether.
The reader can determine whether the execution plan defines the schema by calling hasProvidedSchema().
At present, the scan framework filters the "provided schema" against the project list so that this class presents only the actual output schema. Future versions may do the filtering in the planner, but the result for readers will be the same either way.
The reader and scan framework coordinate to form the output schema. The reader offers the columns it has available. The scan framework uses the projection list to decide which to accept. Whether or not a column is accepted, the scan framework provides a column reader for it (returning a do-nothing "dummy" reader if the column is unprojected).
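The coordination described above can be illustrated with a minimal, self-contained sketch. Every type here (SketchColumn, MaterializedColumn, DummyColumn, ProjectionSketch) is a hypothetical stand-in invented for illustration and is not part of the Drill API; the sketch only models the idea that every offered column gets a per-column object, while unprojected columns get a do-nothing one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-column object handed to the reader; not a Drill class. */
interface SketchColumn {
    boolean isProjected();
    void accept(Object value);
}

/** Column that is in the projection list: values are kept. */
final class MaterializedColumn implements SketchColumn {
    final List<Object> values = new ArrayList<>();
    public boolean isProjected() { return true; }
    public void accept(Object value) { values.add(value); }
}

/** Do-nothing "dummy" for an unprojected column: values are discarded. */
final class DummyColumn implements SketchColumn {
    public boolean isProjected() { return false; }
    public void accept(Object value) { /* intentionally empty */ }
}

final class ProjectionSketch {
    /** The reader offers columns; the projection list decides which are materialized. */
    static Map<String, SketchColumn> negotiate(List<String> offered, Set<String> projected) {
        Map<String, SketchColumn> columns = new LinkedHashMap<>();
        for (String name : offered) {
            columns.put(name, projected.contains(name)
                    ? new MaterializedColumn()
                    : new DummyColumn());
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, SketchColumn> columns = negotiate(
                Arrays.asList("a", "b", "c"),
                new HashSet<>(Arrays.asList("a", "c")));
        for (Map.Entry<String, SketchColumn> e : columns.entrySet()) {
            SketchColumn col = e.getValue();
            // An efficient reader checks isProjected() and skips the source read;
            // a simple reader could call accept() unconditionally instead.
            if (col.isProjected()) {
                col.accept("value-of-" + e.getKey());
            }
            System.out.println(e.getKey() + " projected=" + col.isProjected());
        }
    }
}
```

A reader that must read sequentially can call accept() on every column without checking, since the dummy simply drops the value.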
With a dynamic schema, the reader provides the table schema in one of two ways: early schema or late schema. Either way, the project list from the physical plan determines which table columns are materialized and which are not. Column readers are provided for all table columns, so that readers that must read sequentially can do so, but only the materialized columns are written to value vectors.
The early schema reader calls tableSchema(TupleMetadata, boolean) to provide the known schema.
The late schema reader calls RowSetLoader#addColumn() to add each column as it is discovered during the scan.
Note that, to avoid schema conflicts, a late schema reader must define the full set of columns in the first batch, and must stick to that schema for all subsequent batches. This allows the reader to look one batch ahead to learn the columns.
Drill, however, cannot predict the future. Without a defined schema, downstream operators cannot know which columns might appear later in the scan, with which types. Today this is a strong guideline. Future versions may enforce this rule.
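The first-batch rule for late schema readers can be pictured with a small, self-contained sketch. LateSchemaSketch and its methods are hypothetical helpers written for this illustration, with column types reduced to plain strings; Drill itself is described above as treating the rule as a guideline rather than enforcing it, so this only shows what a reader-side check might look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical schema tracker for a late-schema reader; not part of Drill. */
final class LateSchemaSketch {
    private final Map<String, String> columns = new LinkedHashMap<>();
    private boolean firstBatchDone = false;

    /**
     * Called as the reader discovers a column. Returns true if the column is
     * consistent with the schema so far, false if accepting it would change
     * the schema after the first batch.
     */
    boolean addColumn(String name, String type) {
        String knownType = columns.get(name);
        if (knownType != null) {
            return knownType.equals(type);   // an existing column must keep its type
        }
        if (firstBatchDone) {
            return false;                    // new column after the first batch: conflict
        }
        columns.put(name, type);
        return true;
    }

    /** Marks the end of the first batch; the schema is treated as fixed afterwards. */
    void endFirstBatch() {
        firstBatchDone = true;
    }

    Map<String, String> schema() {
        return columns;
    }

    public static void main(String[] args) {
        LateSchemaSketch sketch = new LateSchemaSketch();
        sketch.addColumn("a", "INT");
        sketch.addColumn("b", "VARCHAR");
        sketch.endFirstBatch();
        System.out.println("late column accepted: " + sketch.addColumn("c", "INT"));
        System.out.println("schema: " + sketch.schema());
    }
}
```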
Modifier and Type | Method and Description |
---|---|
void | batchSize(int maxRecordsPerBatch): Set the preferred batch size (which may be overridden by the result set loader in order to limit vector or batch size). |
ResultSetLoader | build(): Build the schema, plan the required projections and static columns and return a loader used to populate value vectors. |
OperatorContext | context() |
com.typesafe.config.Config | drillConfig() |
boolean | hasProvidedSchema(): Report if the execution plan defines a provided schema. |
boolean | isProjectionEmpty(): Report whether the projection list is empty, as occurs in two cases: SELECT COUNT(*) ... (empty project) and SELECT a, b FROM table(c d) (disjoint project). |
void | limit(long limit): Push down a LIMIT into the scan. |
CustomErrorContext | parentErrorContext(): The context to use as a parent when creating a custom context. |
TupleMetadata | providedSchema(): Returns the provided schema, if defined. |
OptionSet | queryOptions() |
void | setErrorContext(CustomErrorContext context): Specify an advanced error context which allows the reader to fill in custom context values. |
void | tableSchema(TupleMetadata schema, boolean isComplete): Specify the table schema if this is an early-schema reader. |
String | userName(): Name of the user running the query. |
OperatorContext context()
com.typesafe.config.Config drillConfig()
OptionSet queryOptions()
void setErrorContext(CustomErrorContext context)
String userName()
boolean hasProvidedSchema()
Returns true if the execution plan defines the output schema, false if the schema should be computed dynamically from the source schema and column projections.
TupleMetadata providedSchema()
Returns the provided schema if hasProvidedSchema() returns true, null otherwise.
void tableSchema(TupleMetadata schema, boolean isComplete)
Should only be called if the schema is dynamic, that is, if hasProvidedSchema() returns false.
schema - the table schema, if known at open time
isComplete - true if the schema is complete, that is, if it can be used to define an empty schema-only batch for the first reader; false if the schema is partial, that is, if the reader must read rows to determine the full schema
void batchSize(int maxRecordsPerBatch)
maxRecordsPerBatch - preferred number of records per batch
void limit(long limit)
ResultSetLoader build()
boolean isProjectionEmpty()
Returns true if no columns are projected, in which case the reader can call ResultSetLoader.skipRows(int) to indicate the row count; false if at least one column is projected and so data must be written using the loader.
CustomErrorContext parentErrorContext()
(Obtain the error context for this reader from the ResultSetLoader.)
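To show how the methods above might fit together when a reader opens, here is a rough, self-contained sketch. It does not use the real Drill classes: NegotiatorSketch and LoaderSketch are hypothetical stand-ins that borrow a few method names from this page and replace TupleMetadata with a plain list of column names; writeRow, RecordingNegotiator, ReaderOpenSketch and the batch size of 4096 are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in that borrows method names from this page; List<String> replaces TupleMetadata. */
interface NegotiatorSketch {
    boolean hasProvidedSchema();
    void tableSchema(List<String> schema, boolean isComplete);
    void batchSize(int maxRecordsPerBatch);
    boolean isProjectionEmpty();
    LoaderSketch build();
}

/** Stand-in for the loader returned by build(); writeRow is a hypothetical helper. */
interface LoaderSketch {
    void skipRows(int count);
    void writeRow(List<Object> values);
}

final class ReaderOpenSketch {
    /** Early-schema open: declare the table schema only when the plan provides none. */
    static LoaderSketch open(NegotiatorSketch negotiator, List<String> fileColumns) {
        if (!negotiator.hasProvidedSchema()) {
            negotiator.tableSchema(fileColumns, true);  // schema fully known at open time
        }
        negotiator.batchSize(4096);                     // arbitrary illustrative value
        return negotiator.build();
    }

    /** With an empty projection only the row count matters, so rows are skipped. */
    static void readBatch(NegotiatorSketch negotiator, LoaderSketch loader,
                          List<List<Object>> rows) {
        if (negotiator.isProjectionEmpty()) {
            loader.skipRows(rows.size());
        } else {
            for (List<Object> row : rows) {
                loader.writeRow(row);
            }
        }
    }
}

/** In-memory fake used only to exercise the sketch; it records the calls it receives. */
final class RecordingNegotiator implements NegotiatorSketch, LoaderSketch {
    final List<String> calls = new ArrayList<>();
    private final boolean provided;
    private final boolean emptyProjection;

    RecordingNegotiator(boolean provided, boolean emptyProjection) {
        this.provided = provided;
        this.emptyProjection = emptyProjection;
    }

    public boolean hasProvidedSchema() { return provided; }
    public void tableSchema(List<String> schema, boolean isComplete) {
        calls.add("tableSchema" + schema + "," + isComplete);
    }
    public void batchSize(int maxRecordsPerBatch) { calls.add("batchSize(" + maxRecordsPerBatch + ")"); }
    public boolean isProjectionEmpty() { return emptyProjection; }
    public LoaderSketch build() { calls.add("build"); return this; }
    public void skipRows(int count) { calls.add("skipRows(" + count + ")"); }
    public void writeRow(List<Object> values) { calls.add("writeRow"); }

    public static void main(String[] args) {
        RecordingNegotiator fake = new RecordingNegotiator(false, true);
        LoaderSketch loader = ReaderOpenSketch.open(fake, Arrays.asList("a", "b"));
        ReaderOpenSketch.readBatch(fake, loader,
                Arrays.asList(Arrays.<Object>asList(1, "x"), Arrays.<Object>asList(2, "y")));
        System.out.println(fake.calls);
    }
}
```

The order shown (schema declaration, batch size, then build) is one plausible arrangement; the only ordering this page states is that tableSchema should be called only when hasProvidedSchema() returns false.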
Copyright © 1970 The Apache Software Foundation. All rights reserved.