The solution is to build the readers in two passes. The first pass builds a metadata model for each batch and merges those models. (This version requires strict schema identity across batches; a fancier solution could handle, say, the addition of map members in one batch vs. another, or the addition of union/list members across batches.)
The metadata (by design) has the information we need, so in the second pass we walk the metadata hierarchy and build up readers from it, creating vector accessors as we go to provide a runtime path from the root vectors (selected by the SV4) to the inner vectors (which are not represented as hypervectors).
The hypervector wrapper mechanism provides a crude way to handle inner vectors, but it is awkward and does not lend itself to the kind of caching we'd like for performance, so we use our own accessors for inner vectors. The outermost hypervector accessors wrap a hypervector wrapper; inner accessors navigate directly at the vector level (starting from a vector provided by the outer accessor).
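The two-pass idea can be sketched in miniature. This is not Drill's actual code; the class and method names below are illustrative stand-ins, and schemas are reduced to lists of type names. Pass one verifies strict schema identity across batches (the merge step); pass two walks the merged metadata to build one reader per column.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the two-pass reader build. All names here
// (TwoPassSketch, mergeSchemas, buildReaders) are invented for illustration.
public class TwoPassSketch {

  // Pass 1: each batch contributes a schema; merging requires strict
  // identity, mirroring the SchemaChangeException case in the real builder.
  static List<String> mergeSchemas(List<List<String>> batchSchemas) {
    List<String> merged = batchSchemas.get(0);
    for (List<String> schema : batchSchemas) {
      if (!schema.equals(merged)) {
        throw new IllegalStateException(
            "Schema changed across batches: " + merged + " vs. " + schema);
      }
    }
    return merged;
  }

  // Pass 2: walk the merged metadata and build a reader for each column.
  static List<String> buildReaders(List<String> mergedSchema) {
    return mergedSchema.stream()
        .map(type -> "reader(" + type + ")")
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<List<String>> batches = List.of(
        List.of("INT", "VARCHAR"),
        List.of("INT", "VARCHAR"));
    System.out.println(buildReaders(mergeSchemas(batches)));
  }
}
```

A real implementation would merge column-by-column metadata rather than compare flat type lists, but the control flow (merge, then walk and build) is the same.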
Nested Class Summary

static class
    Vector accessor used by the column accessors to obtain the vector for each column value.
Method Summary

static RowSetReaderImpl build
    Build a hyper-batch reader given a batch accessor.
Methods inherited from class org.apache.drill.exec.physical.resultSet.model.ReaderBuilder
build

Build a hyper-batch reader given a batch accessor.

Parameters:
    batch - wrapper which provides the container and SV4
Returns:
    a row set reader for the hyper-batch
Throws:
    SchemaChangeException - if the individual batches have inconsistent schemas (say, a column in batch 1 is an INT, but in batch 2 it is a VARCHAR)
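The SV4 supplied by the batch wrapper is what lets one reader span many batches. As a hedged illustration (the 16-bit split below reflects how Drill's SelectionVector4 is commonly described, with the batch index in the high 16 bits and the record offset in the low 16 bits; the helper names are invented):

```java
// Illustrative sketch, not Drill's code: an SV4 entry selects both a batch
// and a row within that batch.
public class Sv4Sketch {
  static int batchIndex(int sv4Entry)   { return sv4Entry >>> 16; }
  static int recordOffset(int sv4Entry) { return sv4Entry & 0xFFFF; }

  // A hypervector accessor resolves the batch half of the entry, leaving
  // the offset for the reader to use within the selected vector. Here the
  // "hypervector" is simply an array of per-batch arrays.
  static String readCell(String[][] hyperVector, int sv4Entry) {
    return hyperVector[batchIndex(sv4Entry)][recordOffset(sv4Entry)];
  }

  public static void main(String[] args) {
    String[][] col = { {"a0", "a1"}, {"b0", "b1"} };
    int entry = (1 << 16) | 1;  // batch 1, row 1
    System.out.println(readCell(col, entry)); // prints b1
  }
}
```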
buildContainerChildren

protected List<AbstractObjectReader> buildContainerChildren(VectorContainer container) throws SchemaChangeException
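The recursive shape of building readers for a container's children can be sketched as follows. This is a hypothetical simplification (the Column record and reader strings are invented): a leaf column gets a plain reader, while a structured column such as a map gets a reader that wraps the readers built for its own children, matching the metadata-walk described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in the spirit of buildContainerChildren: walk a
// column hierarchy and build one reader per column, recursing into maps.
public class ChildBuilderSketch {
  record Column(String name, List<Column> children) {}

  static List<String> buildChildren(List<Column> columns) {
    List<String> readers = new ArrayList<>();
    for (Column col : columns) {
      if (col.children().isEmpty()) {
        readers.add("reader(" + col.name() + ")");
      } else {
        // Inner vectors are reached through accessors built here,
        // not through another layer of hypervector wrappers.
        readers.add("mapReader(" + col.name() + ", "
            + buildChildren(col.children()) + ")");
      }
    }
    return readers;
  }

  public static void main(String[] args) {
    List<Column> cols = List.of(
        new Column("a", List.of()),
        new Column("m", List.of(new Column("x", List.of()))));
    System.out.println(buildChildren(cols));
  }
}
```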