Class OffsetVectorWriterImpl
- All Implemented Interfaces:
ColumnWriter, ScalarWriter, ValueWriter, OffsetVectorWriter, WriterEvents, WriterPosition
Note that the lastWriteIndex tracked here corresponds to the data values; it is one less than the actual offset vector last write index due to the nature of offset vector layouts. The selection of last write index basis makes roll-over processing easier as only this writer need know about the +1 translation required for writing.
The states illustrated in the base class apply here as well, remembering that the end offset for a row (or array position) is written one ahead of the vector index.
The vector index does create an interesting dynamic for the child writers. From the child writer's perspective, the states described in the super class are the only states of interest. Here we want to take the perspective of the parent.
The offset vector is an implementation of a repeat level. A repeat level can occur for a single array, or for a collection of columns within a repeated map. (A repeat level also occurs for variable-width fields, but this is a bit harder to see, so let's ignore that for now.)
The key point to realize is that each repeat level introduces an isolation level in terms of indexing. That is, empty values in the outer level have no effect on indexing in the inner level. In fact, the nature of a repeated outer level means that there are no empties in the inner level.
To illustrate:
    Offset Vector          Data Vector     Indexes
lw, v > | 10 | - - - - - > | X |            10
        | 12 | - - +       | X | < lw'      11
        |    |     + - - > |   | < v'       12
In the above, the client has just written an array of two elements
at the current write position. The data starts at offset 10 in
the data vector, and the next write will be at 12. The end offset
is written one ahead of the vector index.
From the data vector's perspective, its last-write (lw') reflects the last element written. If this is an array of scalars, then the write index is automatically incremented, as illustrated by v'. (For map arrays, the index must be incremented by calling save() on the map array writer.)
Suppose the client now skips some arrays:
    Offset Vector          Data Vector
   lw > | 10 | - - - - - > | X |            10
        | 12 | - - +       | X | < lw'      11
        |    |     + - - > |   | < v'       12
        |    |             |   |            13
    v > |    |             |   |            14
The last write position does not move and there are gaps in the
offset vector. The vector index points to the current row. Note
that the data vector last write and vector indexes do not change;
this reflects the fact that the data vector's vector index
(v') matches the tail offset.
The client now writes a three-element vector:
    Offset Vector          Data Vector
        | 10 | - - - - - > | X |            10
        | 12 | - - +       | X |            11
        | 12 | - - + - - > | Y |            12
        | 12 | - - +       | Y |            13
lw, v > | 12 | - - +       | Y | < lw'      14
        | 15 | - - - - - > |   | < v'       15
Quite a bit just happened. The empty offset slots were back-filled
with the last write offset in the data vector. The client wrote
three values, which advanced the last write and vector indexes
in the data vector. And, the last write index in the offset
vector also moved to reflect the update of the offset vector.
Note that as a result, multiple positions in the offset vector
point to the same location in the data vector. This is fine; we
compute the number of entries as the difference between two successive
offset vector positions, so the empty positions have become 0-length
arrays.
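The length computation described above can be sketched in a few lines. This is an illustrative stand-alone example, not Drill code: the class OffsetLengthSketch and its lengths method are hypothetical names, and the offsets are those of the third diagram.

```java
import java.util.Arrays;

// Illustrative only: hypothetical class, not part of Drill.
final class OffsetLengthSketch {
    // Entry i spans [offsets[i], offsets[i + 1]), so its length is the difference.
    static int[] lengths(int[] offsets) {
        int[] result = new int[offsets.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = offsets[i + 1] - offsets[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Offsets taken from the third diagram above.
        int[] offsets = {10, 12, 12, 12, 12, 15};
        System.out.println(Arrays.toString(lengths(offsets))); // prints [2, 0, 0, 0, 3]
    }
}
```

The three back-filled positions come out as 0-length arrays, as the text describes.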
Note that, for an array of scalars, when overflow occurs, we need only worry about two states in the data vector. Either data has been written for the row (as in the third example above), and so must be moved to the roll-over vector, or no data has been written and no move is needed. We never have to worry about missing values because they cannot occur in the data vector.
See ObjectArrayWriter for information about arrays of maps (arrays of multiple columns).
Empty Slots
The offset vector writer handles empty slots in two distinct ways. First, the writer handles its own empties. Suppose that this is the offset vector for a VarChar column. Suppose we write "Foo" in the first slot. Now we have an offset vector with the values [ 0 3 ]. Suppose the client skips several rows and next writes at slot 5. We must copy the latest offset (3) into all the skipped slots: [ 0 3 3 3 3 3 ]. The result is a set of four empty VarChars in positions 1, 2, 3 and 4. (Here, remember that the offset vector always has one more value than the number of rows.)
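The back-fill in the "Foo" example can be modeled with a toy writer. This is a minimal sketch under stated assumptions, not the actual Drill implementation: OffsetFillSketch, writeEndOffset and snapshot are hypothetical names, the array has a fixed size, and there is no overflow handling. It also shows the +1 translation between the row-based last write index and the offset vector position.

```java
import java.util.Arrays;

// Illustrative only: hypothetical class, not part of Drill.
final class OffsetFillSketch {
    private final int[] offsets = new int[16]; // offsets[0] stays 0
    private int lastWriteIndex = -1;           // last row written, on the data-value basis

    // Hypothetical helper: record the end offset for `row`, back-filling skipped rows.
    void writeEndOffset(int row, int endOffset) {
        int lastEnd = offsets[lastWriteIndex + 1]; // end offset of the last written row
        for (int r = lastWriteIndex + 1; r < row; r++) {
            offsets[r + 1] = lastEnd;              // empty row: start equals end
        }
        offsets[row + 1] = endOffset;              // written one ahead of the row index
        lastWriteIndex = row;
    }

    int[] snapshot(int count) {
        return Arrays.copyOf(offsets, count);
    }

    public static void main(String[] args) {
        OffsetFillSketch writer = new OffsetFillSketch();
        writer.writeEndOffset(0, 3); // "Foo" in row 0
        writer.writeEndOffset(5, 8); // five more bytes in row 5; rows 1 to 4 were skipped
        System.out.println(Arrays.toString(writer.snapshot(7))); // prints [0, 3, 3, 3, 3, 3, 8]
    }
}
```

The first six entries match the [ 0 3 3 3 3 3 ] vector in the text; the seventh is the end offset of the value written at slot 5.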
The second way to fill empties is in the data vector. The data vector may choose to fill the four "empty" slots with a value, say "X". In this case, it is up to the data vector to fill in the values, calling into this vector to set each offset. Note that when doing this, the calls are a bit different than for writing a regular value because we want to write at the "last write position", not the current row position. See BaseVarWidthWriter for an example.
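To make the second approach concrete, here is a toy variable-width writer that fills skipped rows with a default value. It is a sketch only and does not reproduce BaseVarWidthWriter: the class DefaultFillSketch and all of its members are hypothetical, and the buffers are fixed-size with no bounds checks. Each fill appends at the last write position and sets one offset, rather than writing at the current row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: hypothetical class, not part of Drill.
final class DefaultFillSketch {
    private final byte[] data = new byte[64];
    private final int[] offsets = new int[16]; // offsets[0] stays 0
    private final byte[] emptyValue;
    private int lastWriteIndex = -1;           // last row that has a value

    DefaultFillSketch(String emptyValue) {
        this.emptyValue = emptyValue.getBytes(StandardCharsets.UTF_8);
    }

    // Append one value at the last write position and record its end offset.
    private void append(byte[] bytes) {
        int start = offsets[lastWriteIndex + 1];
        System.arraycopy(bytes, 0, data, start, bytes.length);
        offsets[lastWriteIndex + 2] = start + bytes.length;
        lastWriteIndex++;
    }

    void write(int row, String value) {
        while (lastWriteIndex < row - 1) {
            append(emptyValue); // fill each skipped row with the default
        }
        append(value.getBytes(StandardCharsets.UTF_8));
    }

    String data() {
        return new String(data, 0, offsets[lastWriteIndex + 1], StandardCharsets.UTF_8);
    }

    int[] offsets() {
        return Arrays.copyOf(offsets, lastWriteIndex + 2);
    }

    public static void main(String[] args) {
        DefaultFillSketch writer = new DefaultFillSketch("X");
        writer.write(0, "Foo");
        writer.write(3, "Bar"); // rows 1 and 2 are filled with "X"
        System.out.println(writer.data() + " " + Arrays.toString(writer.offsets()));
        // prints FooXXBar [0, 3, 4, 5, 8]
    }
}
```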
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
AbstractFixedWidthWriter.BaseFixedWidthWriter, AbstractFixedWidthWriter.BaseIntWriter
Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
AbstractScalarWriterImpl.ScalarObjectWriter
Nested classes/interfaces inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
WriterEvents.ColumnWriterListener, WriterEvents.State
-
Field Summary
Modifier and Type    Field    Description
protected int    nextOffset    Cached value of the end offset for the current value.
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
lastWriteIndex
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter
capacity, drillBuf, emptyValue, listener, MIN_BUFFER_SIZE
Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
schema, vectorIndex
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
void    copy(ColumnReader from)    Copy a single value from the given reader, which must be of the same type as this writer.
void    dump(HierarchicalFormatter format)
protected final void    fillEmpties(int fillCount)
final void    fillOffset(int newOffset)
int    nextOffset()
void    postRollover()    The vectors backing this writer rolled over.
final int    prepareFill()
protected final int    prepareWrite()    Return the write offset, which is one greater than the index reported by the vector index.
void    preRollover()    The vectors backing this writer are about to roll over.
protected void    realloc(int size)
void    restartRow()    During a write to a row, rewind the current index position to restart the row.
final void    reviseOffset(int newOffset)
int    rowStartOffset()
void    setDefaultValue(Object value)    Set the default value to be used to fill empties for this writer.
final void    setNextOffset(int newOffset)
final void    setValue(Object value)    Write value to a vector as a Java object of the "native" type for the column.
void    setValueCount(int valueCount)
void    skipNulls()
void    startRow()    Start a new row.
void    startWrite()    Start a write (batch) operation.
valueType()    Describe the type of the value.
vector()
int    width()
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractFixedWidthWriter
endWrite, lastWriteIndex, mandatoryResize, resize, setBuffer, setLastWriteIndex
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter
appendBytes, bindListener, bindSchema, canExpand, nullable, overflowed, setBoolean, setBytes, setDate, setDecimal, setDouble, setFloat, setInt, setLong, setNull, setPeriod, setString, setTime, setTimestamp
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
bindIndex, endArrayValue, isProjected, rowStartIndex, saveRow, schema, type, writeIndex
Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriter
conversionError, extendedType, setObject, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.vector.accessor.ColumnWriter
isProjected, nullable, schema, setNull, setObject, type
Methods inherited from interface org.apache.drill.exec.vector.accessor.ScalarWriter
extendedType
Methods inherited from interface org.apache.drill.exec.vector.accessor.ValueWriter
appendBytes, setBoolean, setBytes, setDate, setDecimal, setDouble, setFloat, setInt, setLong, setNull, setPeriod, setString, setTime, setTimestamp
Methods inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
bindIndex, bindListener, endArrayValue, endWrite, saveRow
Methods inherited from interface org.apache.drill.exec.vector.accessor.WriterPosition
lastWriteIndex, rowStartIndex, writeIndex
-
Field Details
-
nextOffset
protected int nextOffset
Cached value of the end offset for the current value. Used primarily for variable-width columns to allow the column to be rewritten multiple times within the same row. The start offset value is updated with the end offset only when the value is committed in endValue().
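The role of the cached end offset can be illustrated with a small model. This is a sketch under assumptions, not the class's actual logic: CachedEndOffsetSketch, setValueLength and commit are hypothetical names, and only the field name nextOffset comes from this page. The point shown is that a rewrite within a row is measured from the row's unchanged start offset, and the start advances only on commit.

```java
import java.util.Arrays;

// Illustrative only: hypothetical class, not part of Drill.
final class CachedEndOffsetSketch {
    private final int[] offsets = new int[16]; // offsets[0] stays 0
    private int row;                           // current row index
    private int startOffset;                   // where the current row's value begins
    private int nextOffset;                    // cached end offset of the current value

    // Write (or rewrite) the current row's value; only its length matters here.
    void setValueLength(int length) {
        nextOffset = startOffset + length;     // measured from the row start, so a rewrite replaces
        offsets[row + 1] = nextOffset;
    }

    // Commit the row: the next value starts where this one ended.
    void commit() {
        startOffset = nextOffset;
        row++;
    }

    int[] offsets() {
        return Arrays.copyOf(offsets, row + 1);
    }

    public static void main(String[] args) {
        CachedEndOffsetSketch writer = new CachedEndOffsetSketch();
        writer.setValueLength(5);
        writer.setValueLength(2); // rewrite within the same row
        writer.commit();
        writer.setValueLength(4);
        writer.commit();
        System.out.println(Arrays.toString(writer.offsets())); // prints [0, 2, 6]
    }
}
```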
-
-
Constructor Details
-
OffsetVectorWriterImpl
-
-
Method Details
-
vector
- Specified by:
vector in class AbstractScalarWriterImpl
-
width
public int width()
- Specified by:
width in class AbstractFixedWidthWriter
-
realloc
protected void realloc(int size)
- Overrides:
realloc in class BaseScalarWriter
-
valueType
Description copied from interface: ScalarWriter
Describe the type of the value. This is a compression of the value vector type: it describes which method will return the vector value.
- Specified by:
valueType in interface ScalarWriter
- Returns:
- the value type which indicates which get method is valid for the column
-
startWrite
public void startWrite()
Description copied from interface: WriterEvents
Start a write (batch) operation. Performs any vector initialization required at the start of a batch (especially for offset vectors).
- Specified by:
startWrite in interface WriterEvents
- Overrides:
startWrite in class AbstractFixedWidthWriter
-
nextOffset
public int nextOffset()
- Specified by:
nextOffset in interface OffsetVectorWriter
-
rowStartOffset
public int rowStartOffset()
- Specified by:
rowStartOffset in interface OffsetVectorWriter
-
startRow
public void startRow()
Description copied from interface: WriterEvents
Start a new row. To be called only when a row is not active. To restart a row, call WriterEvents.restartRow() instead.
- Specified by:
startRow in interface WriterEvents
- Overrides:
startRow in class AbstractScalarWriterImpl
-
prepareWrite
protected final int prepareWrite()
Return the write offset, which is one greater than the index reported by the vector index.
- Returns:
- the offset in which to write the current offset of the end of the current data value
-
prepareFill
public final int prepareFill()
-
fillEmpties
protected final void fillEmpties(int fillCount)
- Specified by:
fillEmpties in class AbstractFixedWidthWriter
-
setNextOffset
public final void setNextOffset(int newOffset)
- Specified by:
setNextOffset in interface OffsetVectorWriter
-
reviseOffset
public final void reviseOffset(int newOffset)
-
fillOffset
public final void fillOffset(int newOffset)
-
setValue
Description copied from interface: ValueWriter
Write value to a vector as a Java object of the "native" type for the column. This form is available only on scalar writers. The object must be of the form for the primary write method above. Primarily to be used when the code already knows the object type.
- Specified by:
setValue in interface ValueWriter
- Parameters:
value - a value that matches the primary setter above, or null to set the column to null
-
skipNulls
public void skipNulls()
- Overrides:
skipNulls in class AbstractFixedWidthWriter
-
restartRow
public void restartRow()
Description copied from interface: WriterEvents
During a write to a row, rewind the current index position to restart the row. Done when abandoning the current row, such as when filtering out a row at read time.
- Specified by:
restartRow in interface WriterEvents
- Overrides:
restartRow in class AbstractFixedWidthWriter
-
preRollover
public void preRollover()
Description copied from interface: WriterEvents
The vectors backing this writer are about to roll over. Finish the current batch up to, but not including, the current row.
- Specified by:
preRollover in interface WriterEvents
- Overrides:
preRollover in class AbstractFixedWidthWriter
-
postRollover
public void postRollover()
Description copied from interface: WriterEvents
The vectors backing this writer rolled over. This means that data for the current row has been rolled over into a new vector. Offsets and indexes should be shifted based on the understanding that data for the current row now resides at the start of a new vector instead of its previous location elsewhere in an old vector.
- Specified by:
postRollover in interface WriterEvents
- Overrides:
postRollover in class AbstractFixedWidthWriter
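The offset shift at roll-over can be pictured with a simplified model. This is a hedged sketch, not what postRollover() actually does internally: RolloverSketch and rollOver are hypothetical names, and the data vector is modeled as a plain char array. It covers the two states for an array of scalars: either the in-flight row has data that must move to the start of the new vector, or it has none and nothing moves.

```java
import java.util.Arrays;

// Illustrative only: hypothetical class, not part of Drill.
final class RolloverSketch {
    // Move the in-flight row's data to the start of the new vector and
    // return the new vector's offsets for that row.
    static int[] rollOver(char[] oldData, int rowStart, int rowEnd, char[] newData) {
        int length = rowEnd - rowStart;   // 0 when nothing was written for the row
        System.arraycopy(oldData, rowStart, newData, 0, length);
        return new int[] {0, length};     // the row now starts at offset 0
    }

    public static void main(String[] args) {
        // Data vector from the third diagram: X at 10..11, Y at 12..14.
        char[] oldData = new char[15];
        Arrays.fill(oldData, 10, 12, 'X');
        Arrays.fill(oldData, 12, 15, 'Y');

        char[] newData = new char[15];
        int[] moved = rollOver(oldData, 12, 15, newData);
        System.out.println(Arrays.toString(moved)); // prints [0, 3]

        int[] empty = rollOver(oldData, 15, 15, new char[15]);
        System.out.println(Arrays.toString(empty)); // prints [0, 0]
    }
}
```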
-
setValueCount
public void setValueCount(int valueCount)
- Overrides:
setValueCount in class AbstractFixedWidthWriter
-
dump
- Specified by:
dump in interface OffsetVectorWriter
- Specified by:
dump in interface WriterEvents
- Overrides:
dump in class AbstractFixedWidthWriter
-
setDefaultValue
Description copied from interface: ScalarWriter
Set the default value to be used to fill empties for this writer. Only valid for required writers: nullable writers set the is-set bit to 0 and set the data value to 0.
- Specified by:
setDefaultValue in interface ScalarWriter
- Parameters:
value - the value to set. Cannot be null. The type of the value must be one that is legal for ValueWriter.setValue(Object)
-
copy
Description copied from interface: ColumnWriter
Copy a single value from the given reader, which must be of the same type as this writer.
- Specified by:
copy in interface ColumnWriter
- Parameters:
from - reader to provide the data
-