org.apache.drill.exec.vector.accessor.writer.BaseScalarWriter

All Implemented Interfaces: ColumnWriter, ScalarWriter, ValueWriter, WriterEvents, WriterPosition

Direct Known Subclasses: AbstractFixedWidthWriter, BaseVarWidthWriter

public abstract class BaseScalarWriter extends AbstractScalarWriterImpl

Column writer implementation that acts as the basis for the generated, vector-specific implementations. All set methods throw an exception; subclasses simply override the supported method(s).
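
The "reject everything, override what you support" pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not Drill's code: the class names RejectingWriterSketch and IntWriterSketch are hypothetical, and the use of UnsupportedOperationException is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a base writer: every setter rejects the call. */
abstract class RejectingWriterSketch {
  public void setInt(int value) { throw unsupported("int"); }
  public void setDouble(double value) { throw unsupported("double"); }
  public void setString(String value) { throw unsupported("String"); }

  // Assumption: the exception type is chosen for illustration only.
  private UnsupportedOperationException unsupported(String type) {
    return new UnsupportedOperationException(
        getClass().getSimpleName() + " does not accept " + type + " values");
  }
}

/** Hypothetical int-only writer: overrides just the setter it supports. */
class IntWriterSketch extends RejectingWriterSketch {
  private final List<Integer> values = new ArrayList<>();

  @Override
  public void setInt(int value) { values.add(value); }

  public List<Integer> values() { return values; }
}
```

With this layout, calling setInt() on an IntWriterSketch stores the value, while calling setString() falls through to the base class and throws.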

The only tricky part to this class is understanding the state of the write indexes as the write proceeds. There are two pointers to consider:

lastWriteIndex: The position in the vector at which the client last asked us to write data. This index is maintained in this class because it depends only on the actions of this class.
vectorIndex: The position in the vector at which we will write if the client chooses to write a value at this time. The vector index is shared by all columns at the same repeat level. It is incremented as the client steps through the write and is observed in this class each time a write occurs.

A repeat level is defined as any of the following:

The set of top-level scalar columns, or those within a top-level, non-repeated map, or nested to any depth within non-repeated maps rooted at the top level.
The values for a single scalar array.
The set of scalar columns within a repeated map, or nested within non-repeated maps within a repeated map.

Items at a repeat level index together and share a vector index. However, the columns within a repeat level do not share a last write index: some can lag further behind than others.
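
The relationship between the shared vector index and the per-column last write index can be sketched as below. The names SharedIndexSketch and LaggingColumnSketch are hypothetical, and starting the last write index at -1 is an assumption made for the example.

```java
/** Hypothetical shared position for all columns at one repeat level. */
class SharedIndexSketch {
  private int position;

  int position() { return position; }

  void next() { position++; }
}

/** Hypothetical column: observes the shared index, owns its last write index. */
class LaggingColumnSketch {
  private final SharedIndexSketch index;
  private int lastWriteIndex = -1; // assumption: -1 means "nothing written yet"

  LaggingColumnSketch(SharedIndexSketch index) { this.index = index; }

  // Record that a value was written at the shared position.
  void write() { lastWriteIndex = index.position(); }

  int lastWriteIndex() { return lastWriteIndex; }

  // How far this column trails the shared position.
  int lag() { return index.position() - lastWriteIndex; }
}
```

If two columns share one SharedIndexSketch and only one of them is written on every row, both see the same position but the unwritten column's lag() keeps growing.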

To illustrate, let's focus on one column and walk through the four states that can occur during a write (the fourth, Restarted, turns out to be a variant of Unwritten):

Behind: the last write index is more than one position behind the vector index. Zero-filling will be needed to catch up to the vector index.
Written: the last write index is the same as the vector index because the client wrote data at this position (and previous values were back-filled with nulls, empties or zeros).
Unwritten: the last write index is one behind the vector index. This occurs when the column was written, then the client moved to the next row or array position.
Restarted: the current row is abandoned (perhaps filtered out) and is to be rewritten. The last write position moves back one position. Note that the Restarted state is indistinguishable from the Unwritten state: the only real difference is that the current slot (pointed to by the vector index) contains the previously written value, which must be overwritten or back-filled. This is fine, because we assume that unwritten values are garbage anyway.

To illustrate:


      Behind      Written    Unwritten    Restarted
       |X|          |X|         |X|          |X|
   lw >|X|          |X|         |X|          |X|
       | |          |0|         |0|     lw > |0|
    v >| |  lw, v > |X|    lw > |X|      v > |X|
                            v > | |

The illustrated state transitions are:

Suppose the state starts in Behind.
- If the client writes a value, then the empty slot is back-filled and the state moves to Written.
- If the client does not write a value, the state stays at Behind, and the gap of unfilled values grows.
When in the Written state:
- If the client saves the current row or array position, the vector index increments and we move to the Unwritten state.
- If the client abandons the row, the last write position moves back one to recreate the unwritten state. We've shown this state separately above just to illustrate the two transitions from Written.
When in the Unwritten (or Restarted) states:
- If the client writes a value, then the writer moves back to the Written state.
- If the client skips the value, then the vector index increments again, leaving a gap, and the writer moves to the Behind state.
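
The states above depend only on the distance between the two indexes, which the following sketch makes explicit. WriteStateSketch and its classify() method are hypothetical names introduced for illustration; they do not appear in this class.

```java
/** Hypothetical helper that names the state implied by the two indexes. */
class WriteStateSketch {
  enum State { BEHIND, UNWRITTEN, WRITTEN }

  static State classify(int lastWriteIndex, int vectorIndex) {
    if (lastWriteIndex > vectorIndex) {
      throw new IllegalArgumentException("last write index is ahead of the vector index");
    }
    int gap = vectorIndex - lastWriteIndex;
    if (gap == 0) {
      return State.WRITTEN;   // client wrote at the current position
    }
    if (gap == 1) {
      return State.UNWRITTEN; // also covers Restarted: last write moved back by one
    }
    return State.BEHIND;      // slots in between still need back-filling
  }
}
```

For example, classify(3, 3) reports WRITTEN; saving the row moves the vector index to 4, so classify(3, 4) reports UNWRITTEN; skipping that row as well gives classify(3, 5), which reports BEHIND.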

We've already noted that the Restarted state is identical to the Unwritten state (it was discussed separately just to make the flow a bit clearer). The astute reader will have noticed that the Behind state is also the same as the Unwritten state if we define the combined state as any case in which the last write position is behind the vector index.

Further, if one simply treats the gap between the last write index and the vector index as the amount (which may be zero) to back-fill, then there is just one state. This is, in fact, how the code works: it always writes to the vector index (and can do so multiple times for a single row), back-filling as necessary.

The states, then, are more for our use in understanding the algorithm. They are also very useful when working through the logic of performing a roll-over when a vector overflows.
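
The single-state view can be sketched with a plain int array standing in for the vector. This is a simplified illustration under stated assumptions, not the generated writer code: BackFillWriterSketch and its methods are hypothetical, unwritten slots are pre-filled with -1 to play the role of garbage, the fill value is zero, and overflow handling is omitted.

```java
import java.util.Arrays;

/** Hypothetical writer that always back-fills the gap, then writes. */
class BackFillWriterSketch {
  private final int[] vector;
  private int lastWriteIndex = -1; // assumption: nothing written yet
  private int vectorIndex = 0;

  BackFillWriterSketch(int capacity) {
    vector = new int[capacity];
    Arrays.fill(vector, -1); // stand-in for garbage in unwritten slots
  }

  // Write at the vector index, zero-filling any skipped slots first.
  void setInt(int value) {
    for (int i = lastWriteIndex + 1; i < vectorIndex; i++) {
      vector[i] = 0;
    }
    vector[vectorIndex] = value;
    lastWriteIndex = vectorIndex;
  }

  // Client saves the row: only the vector index moves.
  void saveRow() { vectorIndex++; }

  // Client abandons the row: the last write index steps back if needed.
  void restartRow() { lastWriteIndex = Math.min(lastWriteIndex, vectorIndex - 1); }

  // Copy of the slots up to and including the last written one.
  int[] written() { return Arrays.copyOf(vector, lastWriteIndex + 1); }
}
```

Note that setInt() never inspects a state: the loop simply runs zero times when there is no gap, which is the point made in the text.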

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
AbstractScalarWriterImpl.ScalarObjectWriter

Nested classes/interfaces inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
WriterEvents.ColumnWriterListener, WriterEvents.State
Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected int | capacity | Capacity, in values, of the currently allocated buffer that backs the vector. |
| protected DrillBuf | drillBuf | |
| protected byte[] | emptyValue | Value to use to fill empties. |
| protected WriterEvents.ColumnWriterListener | listener | Listener invoked if the vector overflows. |
| static final int | MIN_BUFFER_SIZE | |

Fields inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
schema, vectorIndex
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| BaseScalarWriter() | |
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | appendBytes(byte[] value, int len) | |
| void | bindListener(WriterEvents.ColumnWriterListener listener) | Bind a listener to the underlying vector writer. |
| void | bindSchema(ColumnMetadata schema) | |
| protected boolean | canExpand(int delta) | The vector is about to grow. |
| void | dump(HierarchicalFormatter format) | |
| boolean | nullable() | Whether this writer allows nulls. |
| protected void | overflowed() | Handle vector overflow. |
| protected void | realloc(int size) | |
| void | setBoolean(boolean value) | |
| protected abstract void | setBuffer() | All change of buffer comes through this function to allow capturing the buffer address and capacity. |
| void | setBytes(byte[] value, int len) | |
| void | setDate(LocalDate value) | |
| void | setDecimal(BigDecimal value) | |
| void | setDouble(double value) | |
| void | setFloat(float value) | |
| void | setInt(int value) | |
| void | setLong(long value) | |
| void | setNull() | Set the current value to null. |
| void | setPeriod(org.joda.time.Period value) | |
| void | setString(String value) | |
| void | setTime(LocalTime value) | |
| void | setTimestamp(Instant value) | |
| abstract void | skipNulls() | |

Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriterImpl
bindIndex, endArrayValue, isProjected, rowStartIndex, saveRow, schema, startRow, startWrite, type, vector, writeIndex

Methods inherited from class org.apache.drill.exec.vector.accessor.writer.AbstractScalarWriter
conversionError, extendedType, setObject, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.drill.exec.vector.accessor.ColumnWriter
copy

Methods inherited from interface org.apache.drill.exec.vector.accessor.ScalarWriter
setDefaultValue, valueType

Methods inherited from interface org.apache.drill.exec.vector.accessor.ValueWriter
setValue

Methods inherited from interface org.apache.drill.exec.vector.accessor.writer.WriterEvents
endWrite, postRollover, preRollover, restartRow

Methods inherited from interface org.apache.drill.exec.vector.accessor.WriterPosition
lastWriteIndex

Field Details
- MIN_BUFFER_SIZE
  
  public static final int MIN_BUFFER_SIZE
  See Also:
  
  Constant Field Values
- listener
  
  protected WriterEvents.ColumnWriterListener listener
  
  Listener invoked if the vector overflows. If not provided, then the writer does not support vector overflow.
- emptyValue
  
  protected byte[] emptyValue
  
  Value to use to fill empties. Must be at least as wide as each value.
- drillBuf
  
  protected DrillBuf drillBuf
- capacity
  
  protected int capacity
  
  Capacity, in values, of the currently allocated buffer that backs the vector. Updated each time the buffer changes. The capacity is in values (rather than bytes) to streamline the per-write logic.
Constructor Details
- BaseScalarWriter
  
  public BaseScalarWriter()
Method Details
- bindListener
  
  public void bindListener(WriterEvents.ColumnWriterListener listener)
  
  Description copied from interface: WriterEvents
  
  Bind a listener to the underlying vector writer. This listener reports on vector events (overflow, growth), and so is called only when the writer is backed by a vector. The listener is ignored (and never called) for dummy (non-projected) columns. If the column is compound (such as for a nullable or repeated column, or for a map), then the writer is bound to the individual components.
  
  Specified by:
  
  bindListener in interface WriterEvents
  
  Overrides:
  
  bindListener in class AbstractScalarWriter
  
  Parameters:
  
  listener - the vector event listener to bind
- bindSchema
  
  public void bindSchema(ColumnMetadata schema)
  
  Overrides:
  
  bindSchema in class AbstractScalarWriterImpl
- setBuffer
  
  protected abstract void setBuffer()
  
  All buffer changes come through this method so that the buffer address and capacity can be captured. There are only two ways to set the buffer: by binding to a vector in bindVector(), or by resizing the vector in prepareWrite().
- realloc
  
  protected void realloc(int size)
- canExpand
  
  protected boolean canExpand(int delta)
  
  The vector is about to grow. Give the listener a chance to veto the growth and opt for overflow instead.
  
  Parameters:
  
  delta - the new amount of memory to allocate
  
  Returns:
  
  true if the vector can be grown, false if an overflow should be triggered
- overflowed
  
  protected void overflowed()
  
  Handle vector overflow. If this is an array, then there is a slim chance we may need to grow the vector immediately after overflow. Since a double overflow is not allowed, this recursive call won't continue forever.
- skipNulls
  
  public abstract void skipNulls()
- nullable
  
  public boolean nullable()
  
  Description copied from interface: ColumnWriter
  
  Whether this writer allows nulls. This is not as simple as checking for the TypeProtos.DataMode.OPTIONAL type in the schema. List entries are nullable if they are primitive, but not if they are maps or lists. Unions are nullable, regardless of cardinality.
  
  Returns:
  
  true if a call to ColumnWriter.setNull() is supported, false if not
- setNull
  
  public void setNull()
  
  Description copied from interface: ColumnWriter
  
  Set the current value to null. Support depends on the underlying implementation: only nullable types support this operation. Throws IllegalStateException if called on a non-nullable value.
- setBoolean
  
  public void setBoolean(boolean value)
- setInt
  
  public void setInt(int value)
- setLong
  
  public void setLong(long value)
- setFloat
  
  public void setFloat(float value)
- setDouble
  
  public void setDouble(double value)
- setString
  
  public void setString(String value)
- setBytes
  
  public void setBytes(byte[] value, int len)
- appendBytes
  
  public void appendBytes(byte[] value, int len)
- setDecimal
  
  public void setDecimal(BigDecimal value)
- setPeriod
  
  public void setPeriod(org.joda.time.Period value)
- setDate
  
  public void setDate(LocalDate value)
- setTime
  
  public void setTime(LocalTime value)
- setTimestamp
  
  public void setTimestamp(Instant value)
- dump
  
  public void dump(HierarchicalFormatter format)
  
  Specified by:
  
  dump in interface WriterEvents
  
  Overrides:
  
  dump in class AbstractScalarWriterImpl
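
The canExpand() and overflowed() descriptions above imply a veto protocol between the writer and its listener: before growing, the writer asks the listener, and a refusal turns the growth into an overflow. The sketch below illustrates that idea only. GrowthListenerSketch and GrowableBufferSketch are hypothetical types, not the Drill interfaces; treating delta as the number of additional bytes, and letting growth proceed unchecked when no listener is bound, are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical listener: may veto growth, and is told about overflow. */
interface GrowthListenerSketch {
  boolean canExpand(int delta);

  void overflowed();
}

/** Hypothetical buffer owner that consults the listener before growing. */
class GrowableBufferSketch {
  private byte[] buffer;
  private final GrowthListenerSketch listener; // may be null

  GrowableBufferSketch(int initialBytes, GrowthListenerSketch listener) {
    this.buffer = new byte[initialBytes];
    this.listener = listener;
  }

  /** Returns true if the buffer can hold requiredBytes, false on overflow. */
  boolean grow(int requiredBytes) {
    if (requiredBytes <= buffer.length) {
      return true; // already large enough
    }
    int delta = requiredBytes - buffer.length; // assumption: additional bytes
    if (listener != null && !listener.canExpand(delta)) {
      listener.overflowed(); // growth vetoed: signal overflow instead
      return false;
    }
    buffer = Arrays.copyOf(buffer, requiredBytes);
    return true;
  }

  int capacity() { return buffer.length; }
}
```

A listener that enforces a memory budget would return false from canExpand() once the requested delta exceeds what remains, leaving the buffer at its current size.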
