public interface ResultSetCopier
Works to create full output batches to minimize per-batch overhead and to eliminate unnecessary empty batches if no rows are copied.
The output batches are assumed to have the same schema as the input batches. (No projection occurs.) The output schema changes each time the input schema changes. (For an SV4, the upstream operator must ensure that all batches covered by the SV4 have the same schema.)
This implementation works with a single stream of batches which, following Drill's rules, must consist of the same set of vectors on each non-schema-change batch.
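As a rough illustration of the schema rule, the sketch below models each input batch as a schema (a list of column names) plus a row count. It merges consecutive same-schema batches and flushes the partial output whenever the schema changes. `InputBatch`, `OutputBatch`, and `SchemaBoundaries` are hypothetical stand-ins, not Drill types, and the sketch ignores output size limits entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Drill code): an input batch is a schema plus a row count.
record InputBatch(List<String> schema, int rows) {}

// Hypothetical model of one emitted output batch; schemaChanged marks the
// first batch sent under a schema that differs from the previous one.
record OutputBatch(List<String> schema, int rows, boolean schemaChanged) {}

final class SchemaBoundaries {
  private SchemaBoundaries() {}

  // Merges consecutive same-schema input batches into one output batch and
  // flushes the partial output whenever the input schema changes.
  // Empty output batches are never emitted. Size limits are ignored here.
  static List<OutputBatch> merge(List<InputBatch> inputs) {
    List<OutputBatch> out = new ArrayList<>();
    List<String> current = null;   // schema of the output being built
    List<String> lastSent = null;  // schema of the last emitted output
    int rows = 0;
    for (InputBatch b : inputs) {
      if (current != null && !current.equals(b.schema())) {
        // Schema change: flush whatever was built under the old schema.
        if (rows > 0) {
          out.add(new OutputBatch(current, rows, !current.equals(lastSent)));
          lastSent = current;
        }
        rows = 0;
      }
      current = b.schema();
      rows += b.rows();
    }
    if (rows > 0) {
      out.add(new OutputBatch(current, rows, !current.equals(lastSent)));
    }
    return out;
  }
}
```

For example, input batches with schemas A, A, B, B and row counts 3, 2, 4, 0 produce two output batches: 5 rows under A and 4 rows under B, each flagged as a schema change. A single empty input batch produces no output at all.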
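Overall lifecycle:
1. Create an instance of the ResultSetCopierImpl class, passing the input row set reader to the constructor.
2. Build one output batch on each call to the BatchIterator.next() method, as shown below.
3. Call close().

To build each output batch: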
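```java
public IterOutcome next() {
  // Start a fresh output batch. Rows left over from a partial copy
  // (see isCopyPending()) are carried into it.
  copier.startOutputBatch();
  while (!copier.isOutputFull()) {
    IterOutcome innerResult = inner.next();
    if (innerResult == DONE) { break; }
    // Load the new upstream batch, then copy as many rows as fit.
    copier.nextInputBatch();
    copier.copyAllRows();
  }
  if (copier.hasOutputRows()) {
    outputContainer = copier.harvest();
    return outputContainer.isSchemaChanged() ? OK_NEW_SCHEMA : OK;
  } else {
    return DONE;
  }
}
```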
The above assumes that the upstream operator can be polled multiple times in the DONE state. The extra polling is needed to handle any in-flight copies when the input exhausts its batches.
The above also shows that the copier handles and reports schema changes by setting the schema change flag in the output container. Real code must handle multiple calls to next() in the DONE state, and work around a lack of such support in its input (perhaps by tracking its own state).
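One way to "track a state" is to wrap the upstream so that, once it reports DONE, further calls return DONE without polling it again. The sketch below assumes hypothetical types (`Outcome`, `Upstream`, `DoneLatch`, `StrictUpstream`) rather than Drill's own `IterOutcome` and operator classes; `StrictUpstream` stands in for an input that fails if polled after DONE.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical outcome codes modeled on the example above (not Drill's IterOutcome).
enum Outcome { OK, OK_NEW_SCHEMA, DONE }

// Hypothetical upstream operator interface.
interface Upstream { Outcome next(); }

// Lets callers poll any number of times after DONE: once DONE is seen,
// the wrapper latches that state and never calls the upstream again.
final class DoneLatch implements Upstream {
  private final Upstream inner;
  private boolean done;      // true once the upstream has reported DONE
  private int upstreamCalls; // number of calls actually forwarded upstream

  DoneLatch(Upstream inner) { this.inner = inner; }

  @Override
  public Outcome next() {
    if (done) { return Outcome.DONE; }
    upstreamCalls++;
    Outcome result = inner.next();
    if (result == Outcome.DONE) { done = true; }
    return result;
  }

  int upstreamCalls() { return upstreamCalls; }
}

// A strict upstream that throws if it is polled after returning DONE.
final class StrictUpstream implements Upstream {
  private final Iterator<Outcome> outcomes;
  private boolean done;

  StrictUpstream(List<Outcome> outcomes) { this.outcomes = outcomes.iterator(); }

  @Override
  public Outcome next() {
    if (done) { throw new IllegalStateException("polled after DONE"); }
    Outcome o = outcomes.hasNext() ? outcomes.next() : Outcome.DONE;
    done = (o == Outcome.DONE);
    return o;
  }
}
```

Wrapped this way, the strict input sees exactly one call that returns DONE; every later call is answered by the wrapper.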
An input batch is processed by copying its rows. Copying can be done row by row, via a row range, or by copying the entire input batch as shown in the example. Copying the entire batch makes sense when the input batch carries a selection vector that identifies which rows to copy and in what order.
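A toy model may help show what copying through a selection vector means. In the sketch below, `SvCopier` is a hypothetical class, not part of Drill: rows are plain strings and the selection vector is an `int[]` of input row indexes. Its `copyRow` takes a selection-vector position rather than a physical row index, mirroring how this interface defines `inputRowIndex` when a selection vector is attached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy (not Drill code): the input batch holds all rows, and the
// selection vector lists which input rows to copy, in which order.
final class SvCopier {
  private final List<String> input;  // all rows of the input batch
  private final int[] sv;            // selection vector: indexes into input
  private final List<String> output = new ArrayList<>();

  SvCopier(List<String> input, int[] sv) {
    this.input = input;
    this.sv = sv;
  }

  // Copy one row, addressed by its selection-vector position.
  void copyRow(int svPosition) {
    output.add(input.get(sv[svPosition]));
  }

  // Copy the rows selected by positions [start, end) of the selection vector.
  void copyRange(int start, int end) {
    for (int i = start; i < end; i++) {
      copyRow(i);
    }
  }

  // Copy every selected row, in selection-vector order.
  void copyAllRows() {
    copyRange(0, sv.length);
  }

  List<String> output() { return output; }
}
```

With input rows a through e and the selection vector {4, 0, 2}, copyAllRows() yields e, a, c: rows 1 and 3 are skipped and the order follows the selection vector, not the input.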
Because the copier tries to fill each output batch, one output batch may receive part of an input batch, a whole input batch, or rows from several input batches.
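To make the fill behavior concrete, here is a toy in-memory model of the copier protocol. `ToyCopier` is hypothetical, not Drill code: rows are `Integer`s, "full" simply means `capacity` rows, schema changes are ignored, and `nextInputBatch` takes the batch as an argument instead of reading it from a ResultSetReader. Its static `next` helper follows the shape of the next() loop shown earlier and assumes that starting a new output batch resumes any pending copy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy model of the copier protocol (not Drill code).
final class ToyCopier {
  private final int capacity;
  private List<Integer> input = List.of(); // current input batch
  private int inputPos;                    // next input row to copy
  private List<Integer> output = new ArrayList<>();

  ToyCopier(int capacity) { this.capacity = capacity; }

  void startOutputBatch() {
    output = new ArrayList<>();
    // Resume a copy that stopped because the previous output filled up.
    if (isCopyPending()) { copyAllRows(); }
  }

  void nextInputBatch(List<Integer> batch) {
    input = batch;
    inputPos = 0;
  }

  // Copy remaining input rows until the input is exhausted or the output fills.
  void copyAllRows() {
    while (inputPos < input.size() && !isOutputFull()) {
      output.add(input.get(inputPos++));
    }
  }

  boolean isCopyPending() { return inputPos < input.size(); }
  boolean isOutputFull() { return output.size() >= capacity; }
  boolean hasOutputRows() { return !output.isEmpty(); }
  List<Integer> harvest() { return output; }

  // Driver shaped like the next() loop: returns the next output batch,
  // or null once the input is exhausted and nothing is left to send.
  static List<Integer> next(ToyCopier copier, Iterator<List<Integer>> upstream) {
    copier.startOutputBatch();
    while (!copier.isOutputFull()) {
      if (!upstream.hasNext()) { break; }  // upstream is "DONE"
      copier.nextInputBatch(upstream.next());
      copier.copyAllRows();
    }
    return copier.hasOutputRows() ? copier.harvest() : null;
  }
}
```

With a capacity of 4 and input batches of 3, 5, and 1 rows, successive calls yield output batches of 4, 4, and 1 rows, then null. The 5-row batch is split across two outputs, the first output draws on two inputs, and empty inputs never produce an empty output.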
Modifier and Type | Method | Description |
---|---|---|
void | close() | Release resources, including any pending input batch and any non-harvested output batch. |
void | copyAllRows() | Copy all (remaining) input rows to the output. |
boolean | copyNextRow() | If copying rows one by one, copy the next row from the input. |
void | copyRow(int inputRowIndex) | Copy a row at the given position. |
VectorContainer | harvest() | Obtain the output batch. |
boolean | hasOutputRows() | Reports if the output batch has rows. |
boolean | isCopyPending() | Helper method to determine if a copy is pending: more rows remain to be copied. |
boolean | isOutputFull() | Reports if the output batch is full and must be sent downstream. |
boolean | nextInputBatch() | Start the next input batch. |
void | startOutputBatch() | Start the next output batch. |
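void startOutputBatch()
Start the next output batch.

boolean nextInputBatch()
Start the next input batch, which is held by the ResultSetReader passed into the constructor.

boolean copyNextRow()
If copying rows one by one, copy the next row from the input.

void copyRow(int inputRowIndex)
Copy a row at the given position.
Parameters:
inputRowIndex - the input row position. If a selection vector is attached, then this is the selection vector position.

void copyAllRows()
Copy all (remaining) input rows to the output. If the output batch fills before all rows are copied, the copy is partial and isCopyPending() will return true.

boolean hasOutputRows()
Reports if the output batch has rows.

boolean isOutputFull()
Reports if the output batch is full and must be sent downstream. The output batch can fill in the middle of a copy, in which case isCopyPending() will also return true.
This function also returns true if a schema change occurred on the latest input row, in which case the partially completed batch of the old schema must be flushed downstream.

boolean isCopyPending()
Helper method to determine if a copy is pending: more rows remain to be copied.

VectorContainer harvest()
Obtain the output batch.

void close()
Release resources, including any pending input batch and any non-harvested output batch.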
Copyright © 1970 The Apache Software Foundation. All rights reserved.