Class BatchValidator


public class BatchValidator extends Object
Validates a batch of value vectors. The data itself cannot be validated, but its structure can, especially the offset vectors. At present only single (non-hyper) vectors are handled. The current form is self-contained; better checks could be done by moving them inside the vectors or by exposing more metadata from the vectors.

Drill does not clearly define how to handle a batch of zero records. Offset vectors normally have one more entry than the record count: if a batch has 1 record, the offset vector has 2 entries. The entry at index 0 is always 0, and the entry at index 1 marks the end of record 0.
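A minimal sketch of the layout just described, assuming a variable-width column whose offsets are modeled as a plain `int[]` and whose data is a `byte[]`. The class `OffsetLayout` and its `value` method are hypothetical and do not use Drill's vector API. Record `i` occupies the bytes from `offsets[i]` up to, but not including, `offsets[i + 1]`.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: plain arrays stand in for Drill's offset and data vectors.
final class OffsetLayout {
    // Returns record i: the bytes from offsets[i] up to offsets[i + 1].
    static String value(byte[] data, int[] offsets, int i) {
        int start = offsets[i];
        int end = offsets[i + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // One record holding "abc": the offset vector has 1 + 1 = 2 entries.
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 3};
        int recordCount = offsets.length - 1;
        for (int i = 0; i < recordCount; i++) {
            System.out.println(i + ": " + value(data, offsets, i));
        }
    }
}
```

With two records, `"ab"` and `"cde"`, the offsets would be `{0, 2, 5}`: still one more entry than the record count.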

This gets murky, however. If a batch has one record and contains a repeated map, and the map has no entries, then the nested offset vector usually has 0 entries, not 1.
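The repeated-map case can be modeled roughly as follows. This is a sketch under stated assumptions: plain `int[]` arrays stand in for Drill's vectors, the map is assumed to contain a variable-width member with its own offset vector, and `RepeatedMapLayout` and `nestedItemCount` are hypothetical names. Because entry `i + 1` marks the end of record `i` and entry 0 is 0, the number of nested items is the last entry of the parent offsets.

```java
// Illustration only: plain int[] arrays stand in for Drill's vectors.
final class RepeatedMapLayout {
    // Nested item count = last entry of the parent offsets; empty offsets mean no items.
    static int nestedItemCount(int[] parentOffsets) {
        return parentOffsets.length == 0 ? 0 : parentOffsets[parentOffsets.length - 1];
    }

    public static void main(String[] args) {
        // One record whose repeated map has no entries: the map offsets are {0, 0}.
        int[] mapOffsets = {0, 0};
        int mapEntries = nestedItemCount(mapOffsets);
        // Offsets of a hypothetical variable-width member inside the map.
        int[] nPlusOneForm = new int[mapEntries + 1]; // {0}
        int[] usualForm = new int[0];                 // {}
        System.out.println("map entries: " + mapEntries);
        System.out.println("n + 1 form length: " + nPlusOneForm.length);
        System.out.println("usual form length: " + usualForm.length);
    }
}
```

Both nested forms describe zero items; they differ only in whether the lone `0` entry is present.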

More generally, when a batch has zero records, the "top-level" offset vectors sometimes have 1 entry and sometimes 0 entries.

The simplest solution would be to enforce here that every offset vector has n+1 entries, where n is the row count (for top-level vectors) or the item count (for nested vectors).
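A sketch of that strict rule, with offsets modeled as a plain `int[]` and a hypothetical helper `StrictOffsetRule.hasExpectedLength` (not Drill's API). The last case models the nested vector from the repeated-map example above: zero items and zero entries, which the strict rule rejects.

```java
// Hypothetical strict rule: every offset vector has exactly valueCount + 1 entries,
// where valueCount is the row count (top level) or item count (nested).
final class StrictOffsetRule {
    static boolean hasExpectedLength(int[] offsets, int valueCount) {
        return offsets.length == valueCount + 1;
    }

    public static void main(String[] args) {
        // Top level, one record, two entries: passes.
        System.out.println(hasExpectedLength(new int[] {0, 3}, 1)); // true
        // Nested, zero items, one entry {0}: passes.
        System.out.println(hasExpectedLength(new int[] {0}, 0));    // true
        // Nested, zero items, zero entries: fails under the strict rule.
        System.out.println(hasExpectedLength(new int[] {}, 0));     // false
    }
}
```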

After fighting with the code, however, this seems an unattainable goal. For one thing, deserialization appears to rely on nested offset vectors having zero entries when the value count is zero.

Instead, this code allows any offset vector, top-level or nested, to have zero entries when the value count is zero. That is, an offset vector has either zero entries or n+1 entries.
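A minimal sketch of the relaxed rule, again modeling offsets as a plain `int[]`. `RelaxedOffsetCheck.validate` is a hypothetical helper, not a method of `BatchValidator`. Besides the length rule, it checks that entry 0 is 0, as stated earlier, and that offsets never decrease; the non-decreasing check is an extra assumption this sketch adds about well-formed offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the relaxed rule: zero entries (only when valueCount == 0) or valueCount + 1 entries.
final class RelaxedOffsetCheck {
    static List<String> validate(String name, int[] offsets, int valueCount) {
        List<String> errors = new ArrayList<>();
        if (offsets.length == 0) {
            // An empty offset vector is acceptable only when there are no values.
            if (valueCount != 0) {
                errors.add(name + ": no offset entries but value count is " + valueCount);
            }
            return errors;
        }
        if (offsets.length != valueCount + 1) {
            errors.add(name + ": expected " + (valueCount + 1) + " entries, found " + offsets.length);
            return errors;
        }
        if (offsets[0] != 0) {
            errors.add(name + ": offset[0] is " + offsets[0] + ", expected 0");
        }
        // Extra check (an assumption of this sketch): each record ends at or after the previous one.
        for (int i = 1; i < offsets.length; i++) {
            if (offsets[i] < offsets[i - 1]) {
                errors.add(name + ": offset[" + i + "] = " + offsets[i]
                        + " < offset[" + (i - 1) + "] = " + offsets[i - 1]);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("top", new int[] {0, 3}, 1));    // []
        System.out.println(validate("nested", new int[] {}, 0));     // []
        System.out.println(validate("nested", new int[] {0}, 0));    // []
        System.out.println(validate("bad", new int[] {0, 5, 2}, 2)); // [bad: offset[2] = 2 < offset[1] = 5]
    }
}
```

Accepting both `{}` and `{0}` for a zero value count covers the top-level and nested cases described above without forcing one representation.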