Class OffsetVectorWriterImpl

All Implemented Interfaces:
ColumnWriter, ScalarWriter, ValueWriter, OffsetVectorWriter, WriterEvents, WriterPosition

public class OffsetVectorWriterImpl extends AbstractFixedWidthWriter implements OffsetVectorWriter
Specialized column writer for the (hidden) offset vector used with variable-length or repeated vectors. See comments in the template file for more details.

Note that the lastWriteIndex tracked here corresponds to the data values; it is one less than the actual offset vector last write index due to the nature of offset vector layouts. The selection of last write index basis makes roll-over processing easier as only this writer need know about the +1 translation required for writing.

The states illustrated in the base class apply here as well, remembering that the end offset for a row (or array position) is written one ahead of the vector index.

The vector index does create an interesting dynamic for the child writers. From the child writer's perspective, the states described in the super class are the only states of interest. Here we want to take the perspective of the parent.

The offset vector is an implementation of a repeat level. A repeat level can occur for a single array, or for a collection of columns within a repeated map. (A repeat level also occurs for variable-width fields, but this is a bit harder to see, so let's ignore that for now.)

The key point to realize is that each repeat level introduces an isolation level in terms of indexing. That is, empty values in the outer level have no effect on indexing in the inner level. In fact, the nature of a repeated outer level means that there are no empties in the inner level.

To illustrate:

       Offset Vector          Data Vector   Indexes
  lw, v > | 10 |   - - - - - >   | X |        10
          | 12 |   - - +         | X | < lw'  11
          |    |       + - - >   |   | < v'   12
In the above, the client has just written an array of two elements at the current write position. The data starts at offset 10 in the data vector, and the next write will be at 12. The end offset is written one ahead of the vector index.

From the data vector's perspective, its last-write (lw') reflects the last element written. If this is an array of scalars, then the write index is automatically incremented, as illustrated by v'. (For map arrays, the index must be incremented by calling save() on the map array writer.)
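
The "one ahead" layout described above can be sketched with a plain int[] standing in for the offset vector. This is a toy model, not Drill code: the class OffsetLayoutSketch and its methods are hypothetical names introduced only for illustration.

```java
// Toy model only: a plain int[] stands in for the offset vector.
// OffsetLayoutSketch and its methods are hypothetical, not part of Drill.
class OffsetLayoutSketch {

    // The data for a row starts at the offset stored at the row's own index.
    static int startOffset(int[] offsets, int row) {
        return offsets[row];
    }

    // The end offset for a row is stored one slot ahead of the row index.
    static int endOffset(int[] offsets, int row) {
        return offsets[row + 1];
    }

    // Number of data values that belong to the row.
    static int length(int[] offsets, int row) {
        return endOffset(offsets, row) - startOffset(offsets, row);
    }

    public static void main(String[] args) {
        // The two offset entries from the diagram: the array occupies
        // data positions 10 and 11, and the next write goes to 12.
        int[] offsets = {10, 12};
        System.out.println(startOffset(offsets, 0)); // 10
        System.out.println(endOffset(offsets, 0));   // 12
        System.out.println(length(offsets, 0));      // 2
    }
}
```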

Suppose the client now skips some arrays:

       Offset Vector          Data Vector
     lw > | 10 |   - - - - - >   | X |        10
          | 12 |   - - +         | X | < lw'  11
          |    |       + - - >   |   | < v'   12
          |    |                 |   |        13
      v > |    |                 |   |        14
The last write position does not move and there are gaps in the offset vector. The vector index points to the current row. Note that the data vector's last write and vector indexes do not change. This reflects the fact that the data vector's vector index (v') matches the tail offset.

The client now writes a three-element vector:

       Offset Vector          Data Vector
          | 10 |   - - - - - >   | X |        10
          | 12 |   - - +         | X |        11
          | 12 |   - - + - - >   | Y |        12
          | 12 |   - - +         | Y |        13
  lw, v > | 12 |   - - +         | Y | < lw'  14
          | 15 |   - - - - - >   |   | < v'   15
Quite a bit just happened. The empty offset slots were back-filled with the last write offset in the data vector. The client wrote three values, which advanced the last write and vector indexes in the data vector. And, the last write index in the offset vector also moved to reflect the update of the offset vector. Note that as a result, multiple positions in the offset vector point to the same location in the data vector. This is fine; we compute the number of entries as the difference between two successive offset vector positions, so the empty positions have become 0-length arrays.
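
The back-fill step can be illustrated with a small simulation. The sketch below is a simplified stand-in for the writer, assuming a plain int[] offset vector and a hypothetical ToyOffsetWriter class; it reproduces the offsets shown in the three diagrams but does not model the real writer's vector index, buffers, or overflow handling.

```java
import java.util.Arrays;

// Hypothetical toy class; not the Drill implementation.
class ToyOffsetWriter {
    private final int[] offsets;
    // Index of the last row whose end offset has been written (-1 if none).
    // Like the writer described above, this is in terms of data rows.
    private int lastWriteIndex = -1;

    ToyOffsetWriter(int rowCapacity, int firstOffset) {
        offsets = new int[rowCapacity + 1]; // one more slot than rows
        offsets[0] = firstOffset;
    }

    // Record that the array for the given row ends at endOffset.
    void setEndOffset(int row, int endOffset) {
        // Back-fill skipped rows with the tail offset of the last write,
        // which turns each skipped row into a 0-length array.
        int tail = offsets[lastWriteIndex + 1];
        for (int r = lastWriteIndex + 1; r < row; r++) {
            offsets[r + 1] = tail;
        }
        offsets[row + 1] = endOffset; // the +1 translation
        lastWriteIndex = row;
    }

    int length(int row) {
        return offsets[row + 1] - offsets[row];
    }

    int[] snapshot(int rowCount) {
        return Arrays.copyOf(offsets, rowCount + 1);
    }

    public static void main(String[] args) {
        ToyOffsetWriter writer = new ToyOffsetWriter(5, 10);
        writer.setEndOffset(0, 12); // two-element array at data offsets 10..11
        writer.setEndOffset(4, 15); // skip rows 1..3, then write three elements
        System.out.println(Arrays.toString(writer.snapshot(5)));
        // [10, 12, 12, 12, 12, 15]
        System.out.println(writer.length(2)); // 0
        System.out.println(writer.length(4)); // 3
    }
}
```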

Note that, for an array of scalars, when overflow occurs, we need only worry about two states in the data vector. Either data has been written for the row (as in the third example above), and so must be moved to the roll-over vector, or no data has been written and no move is needed. We never have to worry about missing values because they cannot occur in the data vector.
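
The two overflow states can be sketched as follows. This is a minimal illustration under stated assumptions: int[] arrays stand in for the data vectors, and RolloverSketch, rollOver, rowStart, and writePos are hypothetical names, not part of the writer API. The real roll-over logic also adjusts offsets and indexes, which is omitted here.

```java
import java.util.Arrays;

// Hypothetical toy class; int[] arrays stand in for data vectors.
class RolloverSketch {

    // Moves the in-flight row's values, stored at [rowStart, writePos) in the
    // full vector, to the start of the roll-over vector. Returns the write
    // position in the new vector. When nothing has been written for the row
    // (rowStart == writePos), nothing is copied and the result is 0.
    static int rollOver(int[] oldData, int rowStart, int writePos, int[] newData) {
        int inFlight = writePos - rowStart;
        if (inFlight > 0) {
            System.arraycopy(oldData, rowStart, newData, 0, inFlight);
        }
        return inFlight;
    }

    public static void main(String[] args) {
        int[] oldData = {1, 2, 7, 8, 9};

        // State 1: the current row has written 7, 8, 9 starting at offset 2.
        int[] newData = new int[5];
        int newWritePos = rollOver(oldData, 2, 5, newData);
        System.out.println(newWritePos);              // 3
        System.out.println(Arrays.toString(newData)); // [7, 8, 9, 0, 0]

        // State 2: no data written for the current row, so no move is needed.
        int[] untouched = new int[5];
        System.out.println(rollOver(oldData, 5, 5, untouched)); // 0
    }
}
```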

See ObjectArrayWriter for information about arrays of maps (arrays of multiple columns.)

Empty Slots

The offset vector writer handles empty slots in two distinct ways. First, the writer handles its own empties. Suppose that this is the offset vector for a VarChar column. Suppose we write "Foo" in the first slot. Now we have an offset vector with the values [ 0 3 ]. Suppose the client skips several rows and next writes at slot 5. We must copy the latest offset (3) into all the skipped slots: [ 0 3 3 3 3 3 ]. The result is a set of four empty VarChars in positions 1, 2, 3 and 4. (Here, remember that the offset vector always has one more value than the number of rows.)
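
From the reading side, the filled offsets above decode as one value followed by four empty strings. The sketch below assumes a byte[] for the VarChar data and an int[] for the offsets; VarCharDecodeSketch and decode are hypothetical names used only to illustrate the arithmetic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class; arrays stand in for the data and offset vectors.
class VarCharDecodeSketch {

    // Decodes every row: row i spans data bytes [offsets[i], offsets[i + 1]).
    static List<String> decode(byte[] data, int[] offsets) {
        List<String> rows = new ArrayList<>();
        // The offset vector has one more entry than the number of rows.
        for (int row = 0; row + 1 < offsets.length; row++) {
            int start = offsets[row];
            int length = offsets[row + 1] - start;
            rows.add(new String(data, start, length, StandardCharsets.UTF_8));
        }
        return rows;
    }

    public static void main(String[] args) {
        byte[] data = "Foo".getBytes(StandardCharsets.UTF_8);
        int[] offsets = {0, 3, 3, 3, 3, 3};
        List<String> rows = decode(data, offsets);
        System.out.println(rows.size());           // 5
        System.out.println(rows.get(0));           // Foo
        System.out.println(rows.get(1).isEmpty()); // true
    }
}
```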

The second way to fill empties is in the data vector. The data vector may choose to fill the four "empty" slots with a value, say "X". In this case, it is up to the data vector to fill in the values, calling into this vector to set each offset. Note that when doing this, the calls are a bit different than for writing a regular value because we want to write at the "last write position", not the current row position. See BaseVarWidthWriter for an example.
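
The second approach can be sketched with a toy writer that fills skipped rows with a default value. For brevity, the sketch collapses the data writer and the offset writer into one hypothetical class, DefaultFillSketch; in the real code these are separate writers, and the names below are assumptions made for illustration. The point it shows is that each fill is appended at the last write position, one slot at a time, rather than at the current row position.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical toy class; not the Drill implementation.
class DefaultFillSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final int[] offsets;
    private final byte[] defaultValue;
    // Index of the last row written, in terms of data rows (-1 if none).
    private int lastWriteIndex = -1;

    DefaultFillSketch(int rowCapacity, String defaultValue) {
        this.offsets = new int[rowCapacity + 1];
        this.defaultValue = defaultValue.getBytes(StandardCharsets.UTF_8);
    }

    // Appends bytes for the slot just past the last write position and
    // records that slot's end offset one ahead of its index.
    private void appendAtLastWrite(byte[] bytes) {
        data.write(bytes, 0, bytes.length);
        lastWriteIndex++;
        offsets[lastWriteIndex + 1] = data.size();
    }

    // Writes a value at the given row, first filling any skipped rows
    // with the default value.
    void write(int row, String value) {
        while (lastWriteIndex < row - 1) {
            appendAtLastWrite(defaultValue);
        }
        appendAtLastWrite(value.getBytes(StandardCharsets.UTF_8));
    }

    int[] offsetsSnapshot(int rowCount) {
        return Arrays.copyOf(offsets, rowCount + 1);
    }

    String dataAsString() {
        return new String(data.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        DefaultFillSketch writer = new DefaultFillSketch(6, "X");
        writer.write(0, "Foo");
        writer.write(5, "Bar"); // rows 1..4 are filled with "X"
        System.out.println(Arrays.toString(writer.offsetsSnapshot(6)));
        // [0, 3, 4, 5, 6, 7, 10]
        System.out.println(writer.dataAsString()); // FooXXXXBar
    }
}
```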

  • Field Details

    • nextOffset

      protected int nextOffset
      Cached value of the end offset for the current value. Used primarily for variable-width columns to allow the column to be rewritten multiple times within the same row. The start offset value is updated with the end offset only when the value is committed in endValue().
  • Constructor Details

    • OffsetVectorWriterImpl

      public OffsetVectorWriterImpl(UInt4Vector vector)
  • Method Details

    • vector

      public BaseDataValueVector vector()
      Specified by:
      vector in class AbstractScalarWriterImpl
    • width

      public int width()
      Specified by:
      width in class AbstractFixedWidthWriter
    • realloc

      protected void realloc(int size)
      Overrides:
      realloc in class BaseScalarWriter
    • valueType

      public ValueType valueType()
      Description copied from interface: ScalarWriter
      Describe the type of the value. This is a compression of the value vector type: it describes which method will return the vector value.
      Specified by:
      valueType in interface ScalarWriter
      Returns:
      the value type which indicates which get method is valid for the column
    • startWrite

      public void startWrite()
      Description copied from interface: WriterEvents
      Start a write (batch) operation. Performs any vector initialization required at the start of a batch (especially for offset vectors.)
      Specified by:
      startWrite in interface WriterEvents
      Overrides:
      startWrite in class AbstractFixedWidthWriter
    • nextOffset

      public int nextOffset()
      Specified by:
      nextOffset in interface OffsetVectorWriter
    • rowStartOffset

      public int rowStartOffset()
      Specified by:
      rowStartOffset in interface OffsetVectorWriter
    • startRow

      public void startRow()
      Description copied from interface: WriterEvents
      Start a new row. To be called only when a row is not active. To restart a row, call WriterEvents.restartRow() instead.
      Specified by:
      startRow in interface WriterEvents
      Overrides:
      startRow in class AbstractScalarWriterImpl
    • prepareWrite

      protected final int prepareWrite()
      Return the write offset, which is one greater than the index reported by the vector index.
      Returns:
      the offset at which to write the end offset of the current data value
    • prepareFill

      public final int prepareFill()
    • fillEmpties

      protected final void fillEmpties(int fillCount)
      Specified by:
      fillEmpties in class AbstractFixedWidthWriter
    • setNextOffset

      public final void setNextOffset(int newOffset)
      Specified by:
      setNextOffset in interface OffsetVectorWriter
    • reviseOffset

      public final void reviseOffset(int newOffset)
    • fillOffset

      public final void fillOffset(int newOffset)
    • setValue

      public final void setValue(Object value)
      Description copied from interface: ValueWriter
      Write value to a vector as a Java object of the "native" type for the column. This form is available only on scalar writers. The object must be of the form for the primary write method above.

      Primarily to be used when the code already knows the object type.

      Specified by:
      setValue in interface ValueWriter
      Parameters:
      value - a value that matches the primary setter above, or null to set the column to null
    • skipNulls

      public void skipNulls()
      Overrides:
      skipNulls in class AbstractFixedWidthWriter
    • restartRow

      public void restartRow()
      Description copied from interface: WriterEvents
      During a write to a row, rewind the current index position to restart the row. Done when abandoning the current row, such as when filtering out a row at read time.
      Specified by:
      restartRow in interface WriterEvents
      Overrides:
      restartRow in class AbstractFixedWidthWriter
    • preRollover

      public void preRollover()
      Description copied from interface: WriterEvents
      The vectors backing this writer are about to roll over. Finish the current batch up to, but not including, the current row.
      Specified by:
      preRollover in interface WriterEvents
      Overrides:
      preRollover in class AbstractFixedWidthWriter
    • postRollover

      public void postRollover()
      Description copied from interface: WriterEvents
      The vectors backing this writer rolled over. This means that data for the current row has been rolled over into a new vector. Offsets and indexes should be shifted based on the understanding that data for the current row now resides at the start of a new vector instead of its previous location elsewhere in an old vector.
      Specified by:
      postRollover in interface WriterEvents
      Overrides:
      postRollover in class AbstractFixedWidthWriter
    • setValueCount

      public void setValueCount(int valueCount)
      Overrides:
      setValueCount in class AbstractFixedWidthWriter
    • dump

      public void dump(HierarchicalFormatter format)
      Specified by:
      dump in interface OffsetVectorWriter
      Specified by:
      dump in interface WriterEvents
      Overrides:
      dump in class AbstractFixedWidthWriter
    • setDefaultValue

      public void setDefaultValue(Object value)
      Description copied from interface: ScalarWriter
      Set the default value to be used to fill empties for this writer. Only valid for required writers: nullable writers set the is-set bit to 0 and set the data value to 0.
      Specified by:
      setDefaultValue in interface ScalarWriter
      Parameters:
      value - the value to set. Cannot be null. The type of the value must match that legal for ValueWriter.setValue(Object)
    • copy

      public void copy(ColumnReader from)
      Description copied from interface: ColumnWriter
      Copy a single value from the given reader, which must be of the same type as this writer.
      Specified by:
      copy in interface ColumnWriter
      Parameters:
      from - reader to provide the data