public class ScanSchemaOrchestrator extends Object
Provides the option to continue a schema from one batch to the next. This can reduce spurious schema changes in formats, such as JSON, with varying fields. It is not, however, a complete solution as the outcome still depends on the order of file scans and the division of files across readers.
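As a rough illustration of schema continuation, the sketch below keeps the prior schema whenever a new batch's columns all appear in it. The class `SchemaContinuationSketch`, its `resolve` method, and the subset rule itself are assumptions made for this example; they are not taken from Drill's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of schema continuation; not Drill's actual classes. */
final class SchemaContinuationSketch {
  private List<String> priorSchema = new ArrayList<>();
  private int schemaVersion = 0;

  /**
   * Assumed rule: reuse the prior schema when every column in the new batch
   * already appears in it; otherwise adopt the new columns as the schema
   * and count a schema change.
   */
  List<String> resolve(List<String> batchColumns) {
    if (schemaVersion > 0 && priorSchema.containsAll(batchColumns)) {
      return priorSchema;
    }
    priorSchema = new ArrayList<>(batchColumns);
    schemaVersion++;
    return priorSchema;
  }

  int schemaVersion() {
    return schemaVersion;
  }

  public static void main(String[] args) {
    SchemaContinuationSketch smoother = new SchemaContinuationSketch();
    System.out.println(smoother.resolve(Arrays.asList("a", "b", "c")));
    System.out.println(smoother.resolve(Arrays.asList("a", "c"))); // reuses [a, b, c]
    System.out.println(smoother.resolve(Arrays.asList("a", "d"))); // new schema
    System.out.println(smoother.schemaVersion());
  }
}
```

Under this assumed rule the result depends on arrival order: if the batch with columns (a, c) came first, the later (a, b, c) batch would force a schema change, which matches the caveat about scan order.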
Provides the option to infer the schema from the first batch. The "quick path" to obtain the schema reads one batch and returns that batch's schema first, then returns the full batch on the next call to next().
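The quick path can be pictured with the minimal sketch below. `QuickSchemaSketch`, `Batch`, and `QuickPathReader` are hypothetical names, and delivering the schema as a zero-row batch is an assumption of this example rather than something the description above states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the "quick path"; not Drill's actual classes. */
final class QuickSchemaSketch {

  /** Stand-in for a batch: column names plus row payloads. */
  static final class Batch {
    final List<String> schema;
    final List<String> rows;

    Batch(List<String> schema, List<String> rows) {
      this.schema = schema;
      this.rows = rows;
    }
  }

  /** Reads one batch up front, reports its schema, then replays the batch. */
  static final class QuickPathReader {
    private final Iterator<Batch> source;
    private Batch held;
    private boolean schemaReturned;

    QuickPathReader(Iterator<Batch> source) {
      this.source = source;
    }

    /** Returns the next result, or null when the source is exhausted. */
    Batch next() {
      if (!schemaReturned) {
        schemaReturned = true;
        if (!source.hasNext()) {
          return null;
        }
        held = source.next();
        // Assumption: the schema is delivered as an empty batch.
        return new Batch(held.schema, Collections.<String>emptyList());
      }
      if (held != null) {
        Batch full = held;
        held = null;
        return full; // the full first batch, on the second call
      }
      return source.hasNext() ? source.next() : null;
    }
  }

  public static void main(String[] args) {
    Batch first = new Batch(Arrays.asList("a", "b"), Arrays.asList("row1", "row2"));
    QuickPathReader reader = new QuickPathReader(Collections.singletonList(first).iterator());
    System.out.println(reader.next().rows.size()); // schema-only result
    System.out.println(reader.next().rows.size()); // full first batch
    System.out.println(reader.next());             // source exhausted
  }
}
```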
[ 1 | 2 | 3 | 4 ] Table columns in table order
[ A | B | C ] Static columns
Now, we wish to project them into select order.
Let's say that the SELECT clause looked like this, with "t"
indicating table columns:
SELECT t2, t3, C, B, t1, A, t2 ...
Then the projection looks like this:
[ 2 | 3 | C | B | 1 | A | 2 ]
Often, not all table columns are projected. In this case, the
result set loader presents the full table schema to the reader,
but actually writes only the projected columns. Suppose we
have:
SELECT t3, C, B, t1, A ...
Then the abbreviated table schema looks like this:
[ 1 | 3 ]
Note that table columns retain their table ordering.
The projection, with table columns given as positions within the abbreviated schema, looks like this:
[ 2 | C | B | 1 | A ]
The projector is created once per schema, then can be reused for any number of batches.
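The projection described above can be sketched as follows: a plan is built once from the select list, then applied to each batch. `ProjectionSketch`, `Slot`, `plan`, and `apply` are hypothetical names chosen for this example, and each batch is reduced to a single list of values for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of select-order projection; not Drill's projector. */
final class ProjectionSketch {

  /** One output position: which source it reads and the index in that source. */
  static final class Slot {
    final boolean fromTable;
    final int index;

    Slot(boolean fromTable, int index) {
      this.fromTable = fromTable;
      this.index = index;
    }
  }

  /** Builds the mapping once per schema. */
  static List<Slot> plan(List<String> selectList, List<String> tableCols, List<String> staticCols) {
    List<Slot> slots = new ArrayList<>();
    for (String name : selectList) {
      int tableIndex = tableCols.indexOf(name);
      if (tableIndex >= 0) {
        slots.add(new Slot(true, tableIndex));
        continue;
      }
      int staticIndex = staticCols.indexOf(name);
      if (staticIndex < 0) {
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      slots.add(new Slot(false, staticIndex));
    }
    return slots;
  }

  /** Applies a previously built plan to one batch of values. */
  static List<String> apply(List<Slot> plan, List<String> tableValues, List<String> staticValues) {
    List<String> out = new ArrayList<>();
    for (Slot slot : plan) {
      out.add(slot.fromTable ? tableValues.get(slot.index) : staticValues.get(slot.index));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tableCols = Arrays.asList("t1", "t2", "t3", "t4");
    List<String> staticCols = Arrays.asList("A", "B", "C");
    List<Slot> plan = plan(
        Arrays.asList("t2", "t3", "C", "B", "t1", "A", "t2"), tableCols, staticCols);
    // The same plan can be reused for every batch that shares this schema.
    System.out.println(apply(plan, Arrays.asList("1", "2", "3", "4"), staticCols));
  }
}
```

With the first SELECT example, the values come out in the order 2, 3, C, B, 1, A, 2. Passing only the projected table columns (`t1`, `t3`) as `tableCols` gives the abbreviated-schema case.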
Merging is done in one of two ways, depending on the input source:
Modifier and Type | Class and Description |
---|---|
static class | ScanSchemaOrchestrator.ScanOrchestratorBuilder |
static class | ScanSchemaOrchestrator.ScanSchemaOptions |

Modifier and Type | Field and Description |
---|---|
protected BufferAllocator | allocator |
protected int | batchCount |
static int | DEFAULT_BATCH_BYTE_COUNT |
static int | DEFAULT_BATCH_ROW_COUNT |
static int | MAX_BATCH_BYTE_SIZE |
static int | MAX_BATCH_ROW_COUNT |
MetadataManager | metadataManager: Creates the metadata (file and directory) columns, if needed. |
static int | MIN_BATCH_BYTE_SIZE |
protected ScanSchemaOrchestrator.ScanSchemaOptions | options |
protected VectorContainer | outputContainer |
protected long | rowCount |
protected ScanLevelProjection | scanProj |
protected SchemaSmoother | schemaSmoother |
protected ResultVectorCacheImpl | vectorCache: Cache used to preserve the same vectors from one output batch to the next to keep the Project operator happy (which depends on exactly the same vectors). |

Constructor and Description |
---|
ScanSchemaOrchestrator(BufferAllocator allocator, ScanSchemaOrchestrator.ScanOrchestratorBuilder builder) |

Modifier and Type | Method and Description |
---|---|
boolean | atLimit() |
void | close() |
void | closeReader() |
boolean | hasSchema() |
boolean | isProjectNone() |
VectorContainer | output() |
TupleMetadata | providedSchema(): Returns the provided reader schema. |
ReaderSchemaOrchestrator | startReader() |
void | tallyBatch(int rowCount) |
public static final int MIN_BATCH_BYTE_SIZE
public static final int MAX_BATCH_BYTE_SIZE
public static final int DEFAULT_BATCH_ROW_COUNT
public static final int DEFAULT_BATCH_BYTE_COUNT
public static final int MAX_BATCH_ROW_COUNT
protected final BufferAllocator allocator
protected final ScanSchemaOrchestrator.ScanSchemaOptions options
public final MetadataManager metadataManager
protected final ResultVectorCacheImpl vectorCache
If the Project operator ever changes so that it depends on looking up vectors rather than vector instances, this cache can be deprecated.
protected final ScanLevelProjection scanProj
protected final SchemaSmoother schemaSmoother
protected int batchCount
protected long rowCount
protected VectorContainer outputContainer
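The role of vectorCache, returning the same vector instance for the same column from one batch to the next, might look like the sketch below. `VectorCacheSketch`, `FakeVector`, and `vectorFor` are hypothetical names, and keying the cache on column name plus type is an assumption of this example, not a statement about ResultVectorCacheImpl.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a vector cache; not ResultVectorCacheImpl. */
final class VectorCacheSketch {

  /** Stand-in for a value vector. */
  static final class FakeVector {
    final String name;
    final String type;

    FakeVector(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  private final Map<String, FakeVector> cache = new HashMap<>();

  /**
   * Returns the cached instance when the name and type match an earlier
   * request; otherwise creates and caches a new vector (assumed rule).
   */
  FakeVector vectorFor(String name, String type) {
    FakeVector existing = cache.get(name);
    if (existing != null && existing.type.equals(type)) {
      return existing;
    }
    FakeVector created = new FakeVector(name, type);
    cache.put(name, created);
    return created;
  }

  public static void main(String[] args) {
    VectorCacheSketch cache = new VectorCacheSketch();
    FakeVector firstBatch = cache.vectorFor("a", "INT");
    FakeVector secondBatch = cache.vectorFor("a", "INT");
    System.out.println(firstBatch == secondBatch); // same instance is reused
  }
}
```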
public ScanSchemaOrchestrator(BufferAllocator allocator, ScanSchemaOrchestrator.ScanOrchestratorBuilder builder)
public ReaderSchemaOrchestrator startReader()
public boolean isProjectNone()
public boolean hasSchema()
public TupleMetadata providedSchema()
public VectorContainer output()
public void tallyBatch(int rowCount)
public boolean atLimit()
public void closeReader()
public void close()
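The method list gives no behavior for tallyBatch(int rowCount) and atLimit(), so the following is only a guess at how the batchCount and rowCount fields could drive a row limit. `BatchTallySketch`, its `limit` parameter, and the `scan` helper are hypothetical and do not reflect the real implementation.

```java
/** Hypothetical sketch of batch tallying against a row limit. */
final class BatchTallySketch {
  private final long limit; // assumed row limit; negative means "no limit"
  private int batchCount;
  private long rowCount;

  BatchTallySketch(long limit) {
    this.limit = limit;
  }

  /** Records one completed batch and its row count. */
  void tallyBatch(int rows) {
    batchCount++;
    rowCount += rows;
  }

  /** Assumed rule: the limit is reached once the tallied rows meet it. */
  boolean atLimit() {
    return limit >= 0 && batchCount > 0 && rowCount >= limit;
  }

  /** Consumes batches until the limit is hit; returns how many were read. */
  static int scan(int[] batchSizes, long limit) {
    BatchTallySketch tally = new BatchTallySketch(limit);
    for (int size : batchSizes) {
      if (tally.atLimit()) {
        break;
      }
      tally.tallyBatch(size);
    }
    return tally.batchCount;
  }

  public static void main(String[] args) {
    System.out.println(scan(new int[] {100, 100, 100}, 150)); // stops after 2 batches
    System.out.println(scan(new int[] {100, 100, 100}, -1));  // reads all 3 batches
  }
}
```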