public abstract class AbstractSchemaTracker extends Object implements ScanSchemaTracker
Nested classes/interfaces inherited from interface ScanSchemaTracker: ScanSchemaTracker.ProjectionType
Modifier and Type | Field and Description |
---|---|
protected CustomErrorContext | errorContext |
protected boolean | isResolved |
protected MutableTupleSchema | schema |
Constructor and Description |
---|
AbstractSchemaTracker(CustomErrorContext errorContext) |
Modifier and Type | Method and Description |
---|---|
TupleMetadata | applyImplicitCols(): Indicate that implicit column parsing is complete. |
protected void | checkResolved(): Determine if the schema is resolved. |
CustomErrorContext | errorContext(): The scan-level error context used for errors which may occur before the first reader starts. |
MutableTupleSchema | internalSchema(): Returns the internal scan schema. |
boolean | isResolved(): Is the scan schema resolved? The schema is resolved depending on the complex lifecycle explained in the class comment. |
TupleMetadata | missingColumns(TupleMetadata readerOutputSchema): Identifies the missing columns given a reader output schema. |
TupleMetadata | outputSchema(): Returns the scan output schema, whose computation depends on the projection type. |
ScanSchemaTracker.ProjectionType | projectionType() |
TupleMetadata | readerInputSchema(): The schema which the reader should produce. |
void | resolveMissingCols(TupleMetadata missingCols): The missing column handler obtains the list of missing columns from missingColumns(TupleMetadata). |
int | schemaVersion(): Gives the output schema version, which starts at some arbitrary positive number. |
protected static void | validateProjection(TupleMetadata projection, TupleMetadata schema): Validate a projection list against a defined-schema tuple. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ScanSchemaTracker: applyEarlyReaderSchema, applyReaderSchema, columnProjection, expandImplicitCol, projectionFilter
protected final CustomErrorContext errorContext
protected final MutableTupleSchema schema
protected boolean isResolved
public AbstractSchemaTracker(CustomErrorContext errorContext)
protected static void validateProjection(TupleMetadata projection, TupleMetadata schema)
Validate a projection list against a defined-schema tuple.
Parameters:
projection - the parsed projection list
schema - the defined schema to validate against
public ScanSchemaTracker.ProjectionType projectionType()
Specified by: projectionType in interface ScanSchemaTracker
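validateProjection() is documented only as validating a projection list against a defined-schema tuple. As a hedged illustration of one check such validation might perform, the sketch below rejects any projected column that the defined schema does not contain. The class name `ProjectionValidationSketch`, the name-to-type map standing in for `TupleMetadata`, and the validation rule itself are all assumptions; Drill's actual checks may differ (for example, in how nested maps are handled).

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Drill's implementation: columns are modeled as a
// name -> type map instead of TupleMetadata, and the rule below is assumed.
final class ProjectionValidationSketch {

  private ProjectionValidationSketch() {}

  // Assumed rule: every column named in the projection list must exist in the
  // defined schema. A null projection stands for "nothing to validate".
  static void validateProjection(List<String> projection, Map<String, String> definedSchema) {
    if (projection == null) {
      return;
    }
    for (String name : projection) {
      if (!definedSchema.containsKey(name)) {
        throw new IllegalArgumentException(
            "Projected column not in defined schema: " + name);
      }
    }
  }
}
```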
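public CustomErrorContext errorContext()
Description copied from interface: ScanSchemaTracker
The scan-level error context used for errors which may occur before the first reader starts.
Specified by: errorContext in interface ScanSchemaTracker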
public MutableTupleSchema internalSchema()
Description copied from interface: ScanSchemaTracker
Returns the internal scan schema.
Specified by: internalSchema in interface ScanSchemaTracker
public boolean isResolved()
Description copied from interface: ScanSchemaTracker
Is the scan schema resolved? The schema is resolved depending on the complex lifecycle explained in the class comment.
The schema will be fully resolved after the first batch of data arrives from a reader (since the reader lifecycle will then fill in any missing columns). The schema may be resolved sooner, such as when a strict provided schema or an early reader schema is available and there are no missing columns.
Specified by: isResolved in interface ScanSchemaTracker
Returns: true if the output schema returned by ScanSchemaTracker.outputSchema() is available, false if the schema contains one or more dynamic columns which are not yet resolved.
public int schemaVersion()
Description copied from interface: ScanSchemaTracker
Gives the output schema version, which starts at some arbitrary positive number.
If schema change is allowed, the schema version allows detecting schema changes as the scan schema moves from one resolved state to the next. Each schema will have a unique, increasing version number. A schema change has occurred if the version is newer than the previous output schema version.
Specified by: schemaVersion in interface ScanSchemaTracker
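The version rule for schemaVersion() can be applied on the consumer side by remembering the last version seen and treating any larger value as a schema change. The sketch below assumes a hypothetical `SchemaChangeDetector` class that is fed the value returned by `schemaVersion()`; it is an illustration, not part of Drill's API.

```java
// Hypothetical consumer-side helper; the class name and sentinel are illustrative.
final class SchemaChangeDetector {
  // Versions start at some positive number, so 0 safely means "none seen yet".
  private int lastVersion = 0;

  // Returns true when the reported version is newer than the last one seen,
  // which counts as a schema change. The first version seen also returns true.
  boolean hasChanged(int currentVersion) {
    if (currentVersion > lastVersion) {
      lastVersion = currentVersion;
      return true;
    }
    return false;
  }
}
```

Because versions are unique and increasing, a single integer comparison is enough; there is no need to diff the schemas themselves to detect that a change occurred.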
protected void checkResolved()
Determine if the schema is resolved. For the SELECT * case, we require at least one column, which means that something (provided schema, early reader schema) has provided us with a schema. Once resolved, a schema can never become unresolved: readers are not allowed to add dynamic columns.
public TupleMetadata applyImplicitCols()
Description copied from interface: ScanSchemaTracker
Indicate that implicit column parsing is complete.
Specified by: applyImplicitCols in interface ScanSchemaTracker
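The resolution rule that checkResolved() describes can be sketched with a simplified column model: a column whose type is `null` stands for a dynamic column, and a wildcard (SELECT *) schema additionally needs at least one column. The class `ResolutionSketch`, the map-based model, and treating an empty explicit schema as resolved are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch of the resolution rule; not Drill's code.
// A column whose type is null is treated as dynamic (type not yet known).
final class ResolutionSketch {

  private ResolutionSketch() {}

  static boolean isResolved(Map<String, String> schema, boolean projectAll) {
    // SELECT *: an empty schema means nothing has supplied columns yet.
    if (projectAll && schema.isEmpty()) {
      return false;
    }
    // Otherwise the schema is resolved only when no column is still dynamic.
    for (String type : schema.values()) {
      if (type == null) {
        return false;
      }
    }
    return true;
  }
}
```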
public TupleMetadata readerInputSchema()
Description copied from interface: ScanSchemaTracker
The schema which the reader should produce. When isProjectAll() is true, the reader may produce additional columns beyond those in the reader input schema. However, for any batch, the reader, plus the missing columns handler, must produce all columns in the reader input schema.
Formally:
reader input schema = output schema - implicit col schema
Specified by: readerInputSchema in interface ScanSchemaTracker
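The formula for readerInputSchema() is an ordered set difference: take the output schema and drop the implicit columns, because the scan rather than the reader supplies those. The sketch below models schemas as name-to-type maps; the class `ReaderInputSchemaSketch` and the map model are assumptions, and `filename` appears in the test only as an example of an implicit column name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reader input schema = output schema - implicit cols.
final class ReaderInputSchemaSketch {

  private ReaderInputSchemaSketch() {}

  static Map<String, String> readerInputSchema(
      Map<String, String> outputSchema, Set<String> implicitCols) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> col : outputSchema.entrySet()) {
      // Implicit columns are filled in by the scan, so the reader skips them.
      if (!implicitCols.contains(col.getKey())) {
        result.put(col.getKey(), col.getValue()); // preserves output order
      }
    }
    return result;
  }
}
```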
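public TupleMetadata missingColumns(TupleMetadata readerOutputSchema)
Description copied from interface: ScanSchemaTracker
Identifies the missing columns given a reader output schema. Formally:
missing cols = reader input schema - reader output schema
The reader output schema can contain extra, newly discovered columns; those are ignored when computing missing columns. The subtraction is therefore a set difference: any column of the reader input schema that also appears in the reader output schema is removed, and the columns that remain are missing.
Specified by: missingColumns in interface ScanSchemaTracker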
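The set difference for missingColumns() can be sketched by walking the reader input schema and keeping only the columns the reader did not produce; extra reader columns never enter the loop, so they are ignored automatically. The class `MissingColumnsSketch` and the name-to-type map model (with `null` marking a dynamic column) are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: missing cols = reader input schema - reader output schema.
final class MissingColumnsSketch {

  private MissingColumnsSketch() {}

  static Map<String, String> missingColumns(
      Map<String, String> readerInputSchema, Map<String, String> readerOutputSchema) {
    Map<String, String> missing = new LinkedHashMap<>();
    // Only expected columns are visited, so extra columns discovered by the
    // reader have no effect on the result.
    for (Map.Entry<String, String> col : readerInputSchema.entrySet()) {
      if (!readerOutputSchema.containsKey(col.getKey())) {
        missing.put(col.getKey(), col.getValue()); // type may be null (dynamic)
      }
    }
    return missing;
  }
}
```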
public void resolveMissingCols(TupleMetadata missingCols)
Description copied from interface: ScanSchemaTracker
The missing column handler obtains the list of missing columns from missingColumns(TupleMetadata). Depending on the scan lifecycle, some of the columns may have a type, others may be dynamic. The missing column handler chooses a type for any dynamic columns, then calls this method to tell the scan schema tracker the now-resolved column types.
Note: a goal of the provided/defined schema system is to avoid the need to guess types for missing columns, since doing so often leads to problems further downstream in the query. Ideally, the type of missing columns will be known (via the provided or defined schema) to avoid such conflicts.
Specified by: resolveMissingCols in interface ScanSchemaTracker
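The handler's role before it calls resolveMissingCols() can be sketched as: keep any type already known (for example, from a provided schema) and choose a type only for dynamic columns. The class `MissingColumnHandlerSketch` and the fallback type string are hypothetical; the fallback is exactly the kind of guess the note above recommends avoiding when a provided or defined schema can supply the type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a missing-column handler. A null type marks a
// dynamic column; the fallback type below is an assumption for illustration.
final class MissingColumnHandlerSketch {
  static final String FALLBACK_TYPE = "NULLABLE INT"; // assumed default guess

  private MissingColumnHandlerSketch() {}

  // Keeps known types and guesses only for dynamic columns. The result is what
  // a handler would then report back to the tracker.
  static Map<String, String> resolveTypes(Map<String, String> missingCols) {
    Map<String, String> resolved = new LinkedHashMap<>();
    for (Map.Entry<String, String> col : missingCols.entrySet()) {
      String type = col.getValue() != null ? col.getValue() : FALLBACK_TYPE;
      resolved.put(col.getKey(), type);
    }
    return resolved;
  }
}
```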
public TupleMetadata outputSchema()
Description copied from interface: ScanSchemaTracker
Returns the scan output schema, whose computation depends on the projection type.
For a wildcard schema:
output schema = implicit cols ∪ reader output schema
For an explicit projection:
output schema = projection list
where the projection list is augmented by types from the provided schema, implicit columns or readers.
A defined schema is the output schema, so:
output schema = defined schema
Specified by: outputSchema in interface ScanSchemaTracker
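The three outputSchema() cases can be sketched side by side with the same simplified model (name-to-type maps, `null` for a still-dynamic type). The class `OutputSchemaSketch`, the order of the wildcard union (reader columns first), and the "first source that knows the type wins" rule for explicit projections are assumptions chosen for the illustration, not Drill's documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three outputSchema() cases; not Drill's code.
final class OutputSchemaSketch {

  private OutputSchemaSketch() {}

  // Wildcard: implicit cols U reader output schema. Reader columns come first
  // here, an ordering chosen only for this sketch.
  static Map<String, String> wildcard(
      Map<String, String> implicitCols, Map<String, String> readerOutput) {
    Map<String, String> out = new LinkedHashMap<>(readerOutput);
    implicitCols.forEach(out::putIfAbsent);
    return out;
  }

  // Explicit projection: the projection list, with each column's type taken
  // from the first source (provided schema, implicit cols, reader) that knows it.
  static Map<String, String> explicit(
      List<String> projection, List<Map<String, String>> typeSources) {
    Map<String, String> out = new LinkedHashMap<>();
    for (String name : projection) {
      String type = null; // stays null if no source knows the type yet
      for (Map<String, String> source : typeSources) {
        String candidate = source.get(name);
        if (candidate != null) {
          type = candidate;
          break;
        }
      }
      out.put(name, type);
    }
    return out;
  }

  // Defined schema: the defined schema is the output schema.
  static Map<String, String> defined(Map<String, String> definedSchema) {
    return new LinkedHashMap<>(definedSchema);
  }
}
```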
Copyright © 1970 The Apache Software Foundation. All rights reserved.