Interface ValueVector

All Superinterfaces:
AutoCloseable, Closeable, Iterable<ValueVector>
All Known Subinterfaces:
FixedWidthVector, NullableVector, RepeatedValueVector, VariableWidthVector
All Known Implementing Classes:
AbstractContainerVector, AbstractMapVector, AbstractRepeatedMapVector, BaseDataValueVector, BaseRepeatedValueVector, BaseValueVector, BigIntVector, BitVector, DateVector, Decimal18Vector, Decimal28DenseVector, Decimal28SparseVector, Decimal38DenseVector, Decimal38SparseVector, Decimal9Vector, DictVector, Float4Vector, Float8Vector, IntervalDayVector, IntervalVector, IntervalYearVector, IntVector, ListVector, MapVector, NullableBigIntVector, NullableBitVector, NullableDateVector, NullableDecimal18Vector, NullableDecimal28DenseVector, NullableDecimal28SparseVector, NullableDecimal38DenseVector, NullableDecimal38SparseVector, NullableDecimal9Vector, NullableFloat4Vector, NullableFloat8Vector, NullableIntervalDayVector, NullableIntervalVector, NullableIntervalYearVector, NullableIntVector, NullableSmallIntVector, NullableTimeStampVector, NullableTimeVector, NullableTinyIntVector, NullableUInt1Vector, NullableUInt2Vector, NullableUInt4Vector, NullableUInt8Vector, NullableVar16CharVector, NullableVarBinaryVector, NullableVarCharVector, NullableVarDecimalVector, ObjectVector, RepeatedBigIntVector, RepeatedBitVector, RepeatedDateVector, RepeatedDecimal18Vector, RepeatedDecimal28DenseVector, RepeatedDecimal28SparseVector, RepeatedDecimal38DenseVector, RepeatedDecimal38SparseVector, RepeatedDecimal9Vector, RepeatedDictVector, RepeatedFloat4Vector, RepeatedFloat8Vector, RepeatedIntervalDayVector, RepeatedIntervalVector, RepeatedIntervalYearVector, RepeatedIntVector, RepeatedListVector, RepeatedListVector.DelegateRepeatedVector, RepeatedMapVector, RepeatedSmallIntVector, RepeatedTimeStampVector, RepeatedTimeVector, RepeatedTinyIntVector, RepeatedUInt1Vector, RepeatedUInt2Vector, RepeatedUInt4Vector, RepeatedUInt8Vector, RepeatedVar16CharVector, RepeatedVarBinaryVector, RepeatedVarCharVector, RepeatedVarDecimalVector, SmallIntVector, TimeStampVector, TimeVector, TinyIntVector, UInt1Vector, UInt2Vector, UInt4Vector, UInt8Vector, UnionVector, UntypedNullVector, Var16CharVector, 
VarBinaryVector, VarCharVector, VarDecimalVector, VectorAccessibleComplexWriter, ZeroVector

public interface ValueVector extends Closeable, Iterable<ValueVector>
An abstraction that is used to store a sequence of values in an individual column. A value vector stores its underlying data in memory in a compact, efficient columnar layout. The column whose data is stored is described by getField(). A newly instantiated vector relies on a dead buffer, so the vector must be allocated before any attempt to read or write. There are a few "rules" around vectors:
  • Values need to be written in order (e.g. index 0, 1, 2, 5).
  • Nullable vectors start with all values null before anything is written.
  • For variable width types, the offset vector should be all zeros before writing.
  • You must call setValueCount before a vector can be read.
  • You should never write to a vector once it has been read.
  • Vectors may not grow larger than the number of bytes specified in MAX_BUFFER_SIZE to prevent memory fragmentation. Use the setBounded() methods in the mutator to enforce this rule.
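The ordering and offset-vector rules can be pictured with a toy layout. The sketch below is not a Drill class: VarWidthSketch, write, read and offsetsCopy are hypothetical names, and filling skipped indexes with empty values is an assumption of the sketch. It models a variable-width column as a byte array plus an offsets array that starts as all zeros, where entry i + 1 marks the end of value i.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model of a variable-width column: NOT a Drill class.
final class VarWidthSketch {
    private final byte[] data = new byte[64];   // value bytes, packed back to back
    private final int[] offsets = new int[9];   // all zeros before any write
    private int nextIndex = 0;                  // lowest index that may still be written

    // Writes a value; indexes must be written in increasing order.
    void write(int index, String value) {
        if (index < nextIndex) {
            throw new IllegalStateException("index " + index + " already written");
        }
        // Skipped indexes become empty values: end offset equals start offset.
        for (int i = nextIndex; i < index; i++) {
            offsets[i + 1] = offsets[i];
        }
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int start = offsets[index];
        System.arraycopy(bytes, 0, data, start, bytes.length);
        offsets[index + 1] = start + bytes.length;
        nextIndex = index + 1;
    }

    String read(int index) {
        int start = offsets[index];
        int end = offsets[index + 1];
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    // Offsets covering the values written so far (one more entry than values).
    int[] offsetsCopy() {
        return Arrays.copyOf(offsets, nextIndex + 1);
    }

    public static void main(String[] args) {
        VarWidthSketch v = new VarWidthSketch();
        v.write(0, "ab");
        v.write(1, "cde");
        v.write(3, "f");   // index 2 is skipped and reads back as empty
        System.out.println(Arrays.toString(v.offsetsCopy())); // [0, 2, 5, 5, 6]
        System.out.println(v.read(1));                        // cde
    }
}
```

Because each value's start is read from the previous end offset, stale non-zero offsets or out-of-order writes would corrupt the layout, which is the motivation the sketch offers for the first and third rules.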
Please note that the current implementation doesn't enforce those rules, so a few places deviate from them (e.g. offset vectors in variable-length and repeated vectors). This interface "should" strive to guarantee this order of operation:
allocate > mutate > setvaluecount > access > clear (or allocate to start the process over).
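The order of operations can be sketched as a small state machine. This is a toy model, not Drill code: LifecycleSketch and its methods are hypothetical, and unlike the real implementation (which, as noted, does not enforce the rules) the sketch rejects out-of-order calls to make the sequence visible.

```java
// Toy state machine for the documented order of operations: NOT a Drill class.
final class LifecycleSketch {
    enum State { EMPTY, ALLOCATED, READABLE }

    private State state = State.EMPTY;
    private int[] buffer = new int[0];
    private int valueCount = 0;

    void allocateNew(int capacity) {            // allocate
        buffer = new int[capacity];
        valueCount = 0;
        state = State.ALLOCATED;
    }

    void set(int index, int value) {            // mutate
        require(State.ALLOCATED, "set");
        buffer[index] = value;
    }

    void setValueCount(int count) {             // setValueCount
        require(State.ALLOCATED, "setValueCount");
        valueCount = count;
        state = State.READABLE;
    }

    int get(int index) {                        // access
        require(State.READABLE, "get");
        if (index >= valueCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[index];
    }

    void clear() {                              // clear
        buffer = new int[0];
        valueCount = 0;
        state = State.EMPTY;
    }

    private void require(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(op + " not allowed in state " + state);
        }
    }

    public static void main(String[] args) {
        LifecycleSketch v = new LifecycleSketch();
        v.allocateNew(4);
        v.set(0, 10);
        v.set(1, 20);
        v.setValueCount(2);
        System.out.println(v.get(0) + v.get(1));
        v.clear();
    }
}
```

In this sketch, writing after setValueCount or reading before it throws IllegalStateException, mirroring the rules "you must call setValueCount before a vector can be read" and "you should never write to a vector once it has been read".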
  • Field Details

    • MAX_BUFFER_SIZE

      static final int MAX_BUFFER_SIZE
      Maximum allowed size of the buffer backing a value vector. Set to the Netty chunk size to prevent memory fragmentation.
    • MAX_ROW_COUNT

      static final int MAX_ROW_COUNT
      Maximum allowed row count in a vector. Repeated vectors may have more items, but can have no more than this number of arrays. Limited by the 2-byte length in SV2: 65536 = 2^16.
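The arithmetic behind the limit can be checked directly. The snippet below is only an illustration of how many distinct rows a 2-byte unsigned index can address; RowCountSketch and addressableRows are hypothetical names and nothing here is taken from Drill's source.

```java
// Arithmetic behind a 2-byte row index: an illustration, not Drill code.
final class RowCountSketch {
    // Number of distinct values a 2-byte (16-bit) unsigned index can take.
    static int addressableRows() {
        return 1 << (2 * Byte.SIZE);
    }

    public static void main(String[] args) {
        System.out.println(addressableRows());          // 65536
        System.out.println((int) Character.MAX_VALUE);  // 65535, the largest 16-bit index
    }
}
```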
    • MIN_ROW_COUNT

      static final int MIN_ROW_COUNT
    • BITS_VECTOR_NAME

      static final String BITS_VECTOR_NAME
    • OFFSETS_VECTOR_NAME

      static final String OFFSETS_VECTOR_NAME
    • VALUES_VECTOR_NAME

      @Deprecated static final String VALUES_VECTOR_NAME
      Deprecated.
  • Method Details

    • allocateNew

      void allocateNew() throws OutOfMemoryException
      Allocate new buffers. ValueVector implements logic to determine how much to allocate.
      Throws:
      OutOfMemoryException - Thrown if no memory can be allocated.
    • allocateNewSafe

      boolean allocateNewSafe()
      Allocates new buffers. ValueVector implements logic to determine how much to allocate.
      Returns:
      Returns true if allocation was successful.
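The difference between the two allocation methods is how failure is reported. The following toy contrast is a sketch under stated assumptions: AllocationSketch, ToyOutOfMemoryException and the byte-limit constructor are hypothetical, and the size argument exists only to drive the example (the real methods take no arguments and decide the size themselves).

```java
// Toy contrast between a throwing and a boolean-returning allocation: NOT Drill classes.
final class AllocationSketch {
    // Hypothetical stand-in for an out-of-memory condition.
    static final class ToyOutOfMemoryException extends RuntimeException {
        ToyOutOfMemoryException(String message) {
            super(message);
        }
    }

    private final int limitBytes;          // hypothetical allocator limit
    private byte[] buffer = new byte[0];

    AllocationSketch(int limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Throws when the request cannot be satisfied.
    void allocateNew(int bytes) {
        if (bytes > limitBytes) {
            throw new ToyOutOfMemoryException("requested " + bytes + ", limit " + limitBytes);
        }
        buffer = new byte[bytes];
    }

    // Reports failure through the return value instead of an exception.
    boolean allocateNewSafe(int bytes) {
        try {
            allocateNew(bytes);
            return true;
        } catch (ToyOutOfMemoryException e) {
            return false;
        }
    }

    int capacity() {
        return buffer.length;
    }

    public static void main(String[] args) {
        AllocationSketch v = new AllocationSketch(1024);
        System.out.println(v.allocateNewSafe(4096)); // false
        System.out.println(v.allocateNewSafe(256));  // true
    }
}
```

A caller that can fall back to a smaller batch would typically prefer the boolean form; one that treats allocation failure as fatal can let the exception propagate.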
    • getAllocator

      BufferAllocator getAllocator()
    • setInitialCapacity

      void setInitialCapacity(int numRecords)
      Sets the initial record capacity.
      Parameters:
      numRecords - the initial record capacity
    • getValueCapacity

      int getValueCapacity()
      Returns the maximum number of values that can be stored in this vector instance.
    • close

      void close()
      Alternative to clear(). Allows use as an AutoCloseable in try-with-resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
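The try-with-resources usage can be sketched with a toy class whose close() simply delegates to clear(). CloseSketch and isCleared are hypothetical names; that close() is implemented by calling clear() is an assumption of the sketch, consistent with the description above.

```java
// Toy AutoCloseable whose close() delegates to clear(): NOT a Drill class.
final class CloseSketch implements AutoCloseable {
    private int[] buffer = new int[8];

    boolean isCleared() {
        return buffer.length == 0;
    }

    void clear() {
        buffer = new int[0];
    }

    @Override
    public void close() {
        clear();
    }

    public static void main(String[] args) {
        CloseSketch v = new CloseSketch();
        try (CloseSketch resource = v) {
            System.out.println(resource.isCleared()); // false inside the block
        }
        System.out.println(v.isCleared());            // true once the block exits
    }
}
```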
    • clear

      void clear()
      Release the underlying DrillBuf and reset the ValueVector to empty.
    • getField

      MaterializedField getField()
      Get information about how this field is materialized.
    • getTransferPair

      TransferPair getTransferPair(BufferAllocator allocator)
      Returns a transfer pair, creating a new target vector of the same type.
    • getTransferPair

      TransferPair getTransferPair(String ref, BufferAllocator allocator)
    • makeTransferPair

      TransferPair makeTransferPair(ValueVector target)
      Returns a new transfer pair that is used to transfer underlying buffers into the target vector.
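The idea of transferring underlying buffers, rather than copying values, can be sketched as follows. This is a toy model only: ToyVector and ToyTransferPair are hypothetical, and leaving the source empty after the transfer is an assumption of the sketch rather than something stated above.

```java
// Toy model of handing a buffer from one vector to another without copying: NOT Drill classes.
final class TransferSketch {
    static final class ToyVector {
        int[] buffer = new int[0];
        int valueCount = 0;
    }

    // Hypothetical pair: moves the buffer from 'from' to 'to' and leaves 'from' empty.
    static final class ToyTransferPair {
        private final ToyVector from;
        private final ToyVector to;

        ToyTransferPair(ToyVector from, ToyVector to) {
            this.from = from;
            this.to = to;
        }

        void transfer() {
            to.buffer = from.buffer;          // hand over the same array, no element copy
            to.valueCount = from.valueCount;
            from.buffer = new int[0];
            from.valueCount = 0;
        }
    }

    public static void main(String[] args) {
        ToyVector source = new ToyVector();
        source.buffer = new int[] {1, 2, 3};
        source.valueCount = 3;
        ToyVector target = new ToyVector();
        new ToyTransferPair(source, target).transfer();
        System.out.println(target.valueCount + " " + source.valueCount); // 3 0
    }
}
```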
    • getAccessor

      ValueVector.Accessor getAccessor()
      Returns an accessor that is used to read from this vector instance.
    • getMutator

      ValueVector.Mutator getMutator()
      Returns a mutator that is used to write to this vector instance.
    • getReader

      FieldReader getReader()
      Returns a field reader that supports reading values from this vector.
    • getMetadata

      Get the metadata for this field. Used in serialization.
      Returns:
      FieldMetadata for this field.
    • getBufferSize

      int getBufferSize()
      Returns the number of bytes used by data in this vector instance. The name is a bit of a misnomer: the result is the number of bytes occupied by data, not the size of the buffer.
    • getAllocatedSize

      int getAllocatedSize()
      Returns the total size of buffers allocated by this vector. Has meaning only when vectors are directly allocated and each vector has its own buffer. Does not have meaning for vectors deserialized from the network or disk, in which multiple vectors share the same buffer.
      Returns:
      allocated buffer size, in bytes
    • getBufferSizeFor

      int getBufferSizeFor(int valueCount)
      Returns the number of bytes used by this vector if it holds the given number of values. The result is the same as calling Mutator.setValueCount() followed by getBufferSize(), but without the closing side effects that setValueCount() has with respect to finishing off the population of a vector. Some operations might use this to determine how much memory a vector has used so far, even though it is not yet fully populated.
      Parameters:
      valueCount - the number of values to assume this vector contains
      Returns:
      the buffer size if this vector is holding valueCount values
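A size query of this kind is a pure computation over the layout. The sketch below shows what such an estimate might look like for two toy layouts; the 4-byte value width, the 4-byte offset width and the names BufferSizeSketch, fixedWidthSizeFor and variableWidthSizeFor are all assumptions for illustration, not Drill's actual formulas.

```java
// Toy size estimates for two layouts; widths and layout are assumptions of this sketch.
final class BufferSizeSketch {
    static final int INT_WIDTH = 4;     // bytes per value in a hypothetical fixed-width vector
    static final int OFFSET_WIDTH = 4;  // bytes per offset entry in a hypothetical variable-width vector

    // Fixed width: every value occupies the same number of bytes.
    static int fixedWidthSizeFor(int valueCount) {
        return valueCount * INT_WIDTH;
    }

    // Variable width: (valueCount + 1) offsets plus the data bytes up to the last end offset.
    static int variableWidthSizeFor(int[] offsets, int valueCount) {
        if (valueCount == 0) {
            return 0;
        }
        return (valueCount + 1) * OFFSET_WIDTH + offsets[valueCount];
    }

    public static void main(String[] args) {
        int[] offsets = {0, 2, 5, 5, 6};  // four values holding 6 data bytes in total
        System.out.println(fixedWidthSizeFor(1000));           // 4000
        System.out.println(variableWidthSizeFor(offsets, 2));  // 3 * 4 + 5 = 17
    }
}
```

Neither method changes any state, which is the property that distinguishes this kind of query from calling setValueCount() and then reading the size.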
    • getBuffers

      DrillBuf[] getBuffers(boolean clear)
      Returns the underlying buffers associated with this vector. Note that this doesn't affect the reference counts of the buffers, so it should be used only for in-context access. Also note that the buffers change regularly, so external classes shouldn't hold a reference to them (unless they change them).
      Parameters:
      clear - whether to clear the vector before returning; the buffers will still be refcounted, but the returned array will be the only reference to them
      Returns:
      The underlying buffers that are used by this vector instance.
    • load

      void load(UserBitShared.SerializedField metadata, DrillBuf buffer)
      Load the data provided in the buffer. Typically used when deserializing from the wire.
      Parameters:
      metadata - Metadata used to decode the incoming buffer.
      buffer - The buffer that contains the ValueVector.
    • copyEntry

      void copyEntry(int toIndex, ValueVector from, int fromIndex)
    • collectLedgers

      void collectLedgers(Set<AllocationManager.BufferLedger> ledgers)
      Adds the ledgers that back the buffers of this vector's components to the set provided. Used to determine actual memory allocation.
      Parameters:
      ledgers - set of ledgers to which to add ledgers for this vector
    • getPayloadByteCount

      int getPayloadByteCount(int valueCount)
      Return the number of value bytes consumed by actual data.
    • exchange

      void exchange(ValueVector other)
      Exchange state with another value vector of the same type. Used to implement look-ahead writers.
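Exchanging state amounts to swapping the internals of two vectors rather than copying values. A minimal sketch, assuming a toy vector that holds only a buffer and a value count (ExchangeSketch is a hypothetical class, not part of Drill):

```java
// Toy swap of internal state between two vectors of the same type: NOT a Drill class.
final class ExchangeSketch {
    private int[] buffer;
    private int valueCount;

    ExchangeSketch(int... values) {
        this.buffer = values.clone();
        this.valueCount = values.length;
    }

    // Swaps buffers and counts; no values are copied.
    void exchange(ExchangeSketch other) {
        int[] tmpBuffer = buffer;
        buffer = other.buffer;
        other.buffer = tmpBuffer;

        int tmpCount = valueCount;
        valueCount = other.valueCount;
        other.valueCount = tmpCount;
    }

    int get(int index) {
        return buffer[index];
    }

    int getValueCount() {
        return valueCount;
    }

    public static void main(String[] args) {
        ExchangeSketch a = new ExchangeSketch(1, 2, 3);
        ExchangeSketch b = new ExchangeSketch(9);
        a.exchange(b);
        System.out.println(a.get(0) + " " + a.getValueCount()); // 9 1
        System.out.println(b.get(2) + " " + b.getValueCount()); // 3 3
    }
}
```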
    • toNullable

      void toNullable(ValueVector nullableVector)
      Convert a non-nullable vector to nullable by shuffling the data from one to the other. Avoids the need to generate copy code just to change mode. If this vector is non-nullable, accepts a nullable dual (same minor type, different mode). If this vector is nullable, or non-scalar, then throws an exception.
      Parameters:
      nullableVector - nullable vector of the same minor type as this vector
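The "shuffling" can be pictured as moving the values buffer across and marking every slot as set. The sketch below is a toy model under stated assumptions: RequiredInts, NullableInts and the one-byte-per-value isSet array are hypothetical and are not Drill's classes or its exact layout.

```java
import java.util.Arrays;

// Toy conversion from a required (non-nullable) layout to a nullable one: NOT Drill classes.
final class ToNullableSketch {
    static final class RequiredInts {
        int[] values;

        RequiredInts(int... values) {
            this.values = values;
        }
    }

    static final class NullableInts {
        byte[] isSet = new byte[0];   // 1 = value present, 0 = null
        int[] values = new int[0];
    }

    // Moves the values array across and marks every slot as set; no per-value copy.
    static void toNullable(RequiredInts source, NullableInts target) {
        target.values = source.values;
        target.isSet = new byte[source.values.length];
        Arrays.fill(target.isSet, (byte) 1);
        source.values = new int[0];   // the source gives up its data
    }

    public static void main(String[] args) {
        RequiredInts required = new RequiredInts(7, 8, 9);
        NullableInts nullable = new NullableInts();
        toNullable(required, nullable);
        System.out.println(Arrays.toString(nullable.values)); // [7, 8, 9]
        System.out.println(Arrays.toString(nullable.isSet));  // [1, 1, 1]
    }
}
```

Since a required vector has no nulls, every entry in the toy isSet array is 1 after the conversion.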