| Interface | Description |
|---|---|
| PullResultSetReader | Iterates over the set of batches in a result set, providing a row set reader to iterate over the rows within each batch. |
| PushResultSetReader | Push-based result set reader, in which the caller obtains batches and registers them with the implementation. |
| ResultSetCopier | Copies rows from an input batch to an output batch. |
| ResultSetLoader | Builds a result set (series of zero or more row sets) based on a defined schema which may evolve (expand) over time. |
| ResultVectorCache | Interface for a cache that implements "vector persistence" across multiple result set loaders. |
| RowSetLoader | Interface for writing values to a row set. |
Readers that discover schema can build the schema incrementally: add a column, load data for that column for one row, discover the next column, and so on. Almost any kind of column can be added at any time within the first batch.

A client that knows the schema up front can instead use the TupleBuilder class to build the schema. The schema class is part of the RowSetLoader object available from the ResultSetLoader.writer() method.
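As a rough illustration, the sketch below shows a reader discovering a new column mid-batch and writing to it immediately. It follows the method names used on this page (startRow(), saveRow()) and assumes Drill's MetadataUtils, MinorType, and DataMode helpers; exact names, packages, and signatures may differ across Drill versions.

```java
import org.apache.drill.common.types.TypeProtos.DataMode;
import org.apache.drill.common.types.TypeProtos.MinorType;
import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
import org.apache.drill.exec.record.metadata.MetadataUtils;

public class IncrementalSchemaSketch {
  public void loadOneRow(ResultSetLoader rsLoader) {
    RowSetLoader writer = rsLoader.writer();

    writer.startRow();
    writer.column("a").scalar().setInt(10);   // column "a" was added earlier

    // Discover a new column mid-read: add it, then write to it at once.
    int bIndex = writer.addColumn(
        MetadataUtils.newScalar("b", MinorType.VARCHAR, DataMode.OPTIONAL));
    writer.column(bIndex).scalar().setString("fred");

    writer.saveRow();
  }
}
```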
If the input is a flat structure, then the physical schema is simply that flat schema as the degenerate case.
In both cases, access to columns is by index or by name. If new columns are added while loading, their indexes always follow those of the existing columns.
Each batch is delimited by a call to ResultSetLoader.startBatch() and a call to VectorState.harvestWithLookAhead() to obtain the completed batch. Note that readers do not call these methods; the scan operator does this work.
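The sketch below shows this lifecycle from the scan operator's point of view. The end-of-input test, per-batch reader call, and downstream hand-off are hypothetical placeholders; the public harvest() method is assumed to perform the look-ahead harvest described above.

```java
import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
import org.apache.drill.exec.physical.resultSet.RowSetLoader;
import org.apache.drill.exec.record.VectorContainer;

public abstract class ScanLifecycleSketch {
  abstract boolean moreData();                       // hypothetical end-of-input test
  abstract void readBatchOfRows(RowSetLoader writer); // hypothetical reader call
  abstract void sendDownstream(VectorContainer batch); // hypothetical hand-off

  void runScan(ResultSetLoader rsLoader) {
    while (moreData()) {
      rsLoader.startBatch();                 // open the next batch
      readBatchOfRows(rsLoader.writer());    // the reader fills in rows
      sendDownstream(rsLoader.harvest());    // close and obtain the batch
    }
  }
}
```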
Each row is delimited by a call to startRow() and a call to saveRow(). startRow() performs initialization necessary for some vectors such as repeated vectors. saveRow() moves the row pointer ahead.
A reader can easily reject a row by calling startRow(), beginning to load the row, but omitting the call to saveRow(). In this case, the next call to startRow() repositions the row pointer to the same row, and new data will overwrite the previous data, effectively erasing the unwanted row. This also works for the last row: omitting the call to saveRow() causes the batch to hold only the rows actually saved.
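A minimal sketch of this accept/reject pattern, using the method names on this page; the "name" column and the filtering scenario are illustrative only.

```java
// Assumes a RowSetLoader obtained from ResultSetLoader.writer().
void writeWithRejects(RowSetLoader writer) {
  writer.startRow();
  writer.column("name").scalar().setString("alice");
  writer.saveRow();                 // row is kept

  writer.startRow();
  writer.column("name").scalar().setString("bad-row");
  // Row fails some filter: simply omit saveRow() to discard it.

  writer.startRow();                // repositions to the same row slot
  writer.column("name").scalar().setString("bob");
  writer.saveRow();                 // overwrites the rejected row's data
}
```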
Readers then write to each column. Columns are accessible by index (TupleWriter.column(int)) or by name (TupleWriter.column(String)). Indexed access is much faster.
Column indexes are defined by the order in which columns are added. The first column is column 0, the second is column 1, and so on.
Each call to the above methods returns the same column writer, allowing the reader to cache column writers for additional performance.
All column writers are of the same class; there is no need to cast to a type corresponding to the vector. Instead, they provide a variety of set&lt;Type&gt; methods, where &lt;Type&gt; is one of various Java primitive or structured types. Most vectors support just one such method, but others (such as VarChar) support two. The implementation throws an exception if the vector does not support a particular type.
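The sketch below combines the two points above: column writers are cached once, then driven through the typed set methods. It assumes the scalar() accessor and the setInt()/setString()/setBytes() setters, an INT column at index 0, and a VarChar column named "name"; all are illustrative.

```java
// Assumes: RowSetLoader writer = rsLoader.writer();
ScalarWriter idWriter   = writer.column(0).scalar();      // same object on every call
ScalarWriter nameWriter = writer.column("name").scalar();

writer.startRow();
idWriter.setInt(42);              // INT vector: the int setter
nameWriter.setString("wilma");    // VarChar: the String setter...
// ...or the byte[] form: nameWriter.setBytes(bytes, length);
// idWriter.setString("42") would throw: no String setter for INT vectors.
writer.saveRow();
```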
Note that this class uses the term "loader" for row and column writers since the term "writer" is already used by the legacy record set mutator and column writers.
The loader limits both the number of rows in a batch and the size of the underlying vectors; a batch is complete when either limit is reached, as reported by the RowSetLoader.isFull() method. After each call to saveRow(), the client should call isFull() to determine whether it can add another row. Note that failing to do this check will cause the next call to ResultSetLoader.startBatch() to throw an exception.
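A minimal sketch of the required check, with names following this page; the "a" column is illustrative.

```java
// Assumes: ResultSetLoader rsLoader, as in the earlier sketches.
rsLoader.startBatch();
RowSetLoader writer = rsLoader.writer();
ScalarWriter colA = writer.column("a").scalar();
int value = 0;
while (!writer.isFull()) {        // effectively re-checked after each saveRow()
  writer.startRow();
  colA.setInt(value++);
  writer.saveRow();
}
// Either the row limit was hit or a vector overflowed; harvest this batch
// before calling startBatch() again.
```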
The limits have subtle differences, however. Row limits are simple: at the end of the last row, the mutator notices that no more rows are possible, and so does not allow starting a new row.
Vector overflow is more complex. A row may consist of columns (a, b, c). The client may write column a, but then column b might trigger a vector overflow. (For example, b is a VarChar, and the value for b is larger than the space left in the vector.) The client cannot stop and rewrite a. Instead, the client simply continues writing the row. The mutator, internally, moves this "overflow" row to a new batch. The overflow row becomes the first row of the next batch rather than the last row of the current batch.
For this reason, the client can treat the two batch-full cases identically, as described above.
There are, however, some subtle differences between the two cases that clients may occasionally need to account for.