Class HiveDefaultRecordReader
java.lang.Object
org.apache.drill.exec.store.AbstractRecordReader
org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader
- All Implemented Interfaces:
AutoCloseable, RecordReader
- Direct Known Subclasses:
HiveTextRecordReader
Reader which uses a complex writer underneath to fill in value vectors with data read from Hive.
At first glance the initialization code in the reader looks cumbersome, but its main aim is to prepare the list of key
fields used in the next() and readHiveRecordAndInsertIntoRecordBatch(Object rowValue) methods.
In a nutshell, the reader is used in two stages:
1) The setup stage configures the mapredReader, partitionObjInspector, partitionDeserializer, the list of HiveValueWriters for each column in the record batch, and the partition vectors and values.
2) The reading stage uses the objects configured previously to get rows from InputSplits, represent each row as a Struct of column values, and write each row's column values into Drill's value vectors using the HiveValueWriter for each specific column.
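The two-stage lifecycle above can be sketched as a simplified, self-contained model. Note this is plain Java illustrating the pattern, not Drill's actual classes: `ToyRecordReader`, the `Deque`-backed "split", and the `List`-backed "batch" are illustrative stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of the reader's two-stage pattern: setup() prepares the
// objects needed for reading, next() fills a "batch" until a limit is hit.
class ToyRecordReader {
    static final int TARGET_RECORD_COUNT = 3; // max records per next() call
    private Deque<String> split;              // stands in for an InputSplit
    private List<String> batch;               // stands in for the value vectors

    // Stage 1: configure reading machinery (in Drill: mapredReader,
    // partitionDeserializer, HiveValueWriters, partition vectors)
    void setup(List<String> rows) {
        this.split = new ArrayDeque<>(rows);
        this.batch = new ArrayList<>();
    }

    // Stage 2: read up to TARGET_RECORD_COUNT rows into the batch and
    // return how many records were added (0 signals end of data)
    int next() {
        batch.clear();
        while (batch.size() < TARGET_RECORD_COUNT && !split.isEmpty()) {
            batch.add(split.poll()); // in Drill: deserialize row, write each column
        }
        return batch.size();
    }

    List<String> batch() { return batch; }
}
```

A caller would invoke `setup(...)` once and then loop on `next()` until it returns 0, mirroring how Drill drains a record reader batch by batch.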
-
Field Summary
Modifier and Type | Field | Description
protected boolean | empty | At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present then there are no records to read, so mapredReader can finish work early.
protected static final org.slf4j.Logger | logger |
 | mapredReader | Reader used to get data from InputSplits
protected VectorContainerWriter | outputWriter | Manages all writes to value vectors received using OutputMutator
protected org.apache.hadoop.hive.serde2.Deserializer | partitionDeserializer | Deserializer to be used for deserialization of row.
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter | partitionToTableSchemaConverter | Converts value deserialized using partitionDeserializer
static final int | TARGET_RECORD_COUNT | Max amount of records that can be consumed by one next() method call
protected Object | valueHolder | Helper object used together with mapredReader to get data from InputSplit.
Fields inherited from class org.apache.drill.exec.store.AbstractRecordReader
DEFAULT_TEXT_COLS_TO_READ
Fields inherited from interface org.apache.drill.exec.store.RecordReader
ALLOCATOR_INITIAL_RESERVATION, ALLOCATOR_MAX_RESERVATION
-
Constructor Summary
Constructor | Description
HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi) | Reader's constructor called by initializer.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
protected boolean | hasNextValue(Object valueHolder) | Checks and reads next value of input split into valueHolder.
protected void | internalInit(Properties hiveTableProperties) | Default implementation does nothing; used to apply skip header/footer functionality.
int | next() | Increments this record reader forward, writing via the provided output mutator into the output batch.
protected void | readHiveRecordAndInsertIntoRecordBatch(Object rowValue) |
void | setup(OperatorContext context, OutputMutator output) | Configure the RecordReader with the provided schema and the record batch that should be written to.
Methods inherited from class org.apache.drill.exec.store.AbstractRecordReader
allocate, getColumns, getDefaultColumnsToRead, hasNext, isSkipQuery, isStarQuery, setColumns, toString, transformColumns
-
Field Details
-
logger
protected static final org.slf4j.Logger logger -
TARGET_RECORD_COUNT
public static final int TARGET_RECORD_COUNT
Max amount of records that can be consumed by one next() method call
-
-
outputWriter
Manages all writes to value vectors received using OutputMutator -
partitionDeserializer
protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer
Deserializer to be used for deserialization of row. Depending on partition presence it may be the partition or the table deserializer. -
partitionToTableSchemaConverter
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter
Converts value deserialized using partitionDeserializer -
empty
protected boolean empty
At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present then there are no records to read, so mapredReader can finish work early. -
mapredReader
Reader used to get data from InputSplits -
valueHolder
Helper object used together with mapredReader to get data from InputSplit.
-
-
Constructor Details
-
HiveDefaultRecordReader
public HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi)
Reader's constructor called by initializer.
- Parameters:
table - metadata about the Hive table being read
partition - holder of metadata about table partitioning
inputSplits - input splits for reading data from distributed storage
projectedColumns - target columns for scan
context - fragmentContext of fragment
hiveConf - Hive configuration
proxyUgi - user/group info to be used for initialization
-
-
Method Details
-
setup
Description copied from interface: RecordReader
Configure the RecordReader with the provided schema and the record batch that should be written to.
- Parameters:
context - operator context for the reader
output - The place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
- Throws:
ExecutionSetupException
-
internalInit
Default implementation does nothing; used to apply skip header/footer functionality.
- Parameters:
hiveTableProperties - hive table properties
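Since internalInit(Properties) is a do-nothing hook that subclasses (such as HiveTextRecordReader) override, it follows the template-method pattern. A hypothetical sketch, with class names and field of my own invention; only `skip.header.line.count` is a real Hive table property:

```java
import java.util.Properties;

// Template-method hook: the base reader calls internalInit() during setup;
// the default is a no-op, subclasses may consume table properties.
class BaseHiveReader {
    protected int headerLinesToSkip = 0; // illustrative field, not Drill's

    protected void internalInit(Properties hiveTableProperties) {
        // default implementation does nothing
    }
}

// A text-format reader can override the hook to honor header skipping,
// analogous to what the documented skip header/footer functionality needs.
class TextHiveReader extends BaseHiveReader {
    @Override
    protected void internalInit(Properties hiveTableProperties) {
        headerLinesToSkip = Integer.parseInt(
            hiveTableProperties.getProperty("skip.header.line.count", "0"));
    }
}
```

The benefit of the hook is that the base class owns the setup sequencing while format-specific readers inject behavior at exactly one well-defined point.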
-
next
public int next()
Description copied from interface: RecordReader
Increments this record reader forward, writing via the provided output mutator into the output batch.
- Returns:
The number of additional records added to the output.
-
readHiveRecordAndInsertIntoRecordBatch
-
hasNextValue
Checks and reads the next value of an input split into valueHolder. Note that if the current mapredReader doesn't contain data to read from the InputSplit, this method will try to initialize a reader for the next InputSplit and use the new mapredReader.
- Parameters:
valueHolder - holder for next row value data
- Returns:
true if a next value is present and was read into valueHolder
- Throws:
IOException - exception which may be thrown when mapredReader fails to read the next value
ExecutionSetupException - exception which may be thrown when the next input split is present but reader initialization for it failed
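The split-advancing behavior of hasNextValue can be modeled with a small self-contained sketch. This is not Drill's implementation: `SplitChain`, the iterator-backed "splits", and the one-element array holder are illustrative assumptions that mimic the documented contract.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Models hasNextValue(): try the current split's reader; when it is
// exhausted, initialize a reader for the next split and retry.
class SplitChain {
    private final Iterator<List<String>> splits; // remaining input splits
    private Iterator<String> current;            // stands in for mapredReader

    SplitChain(List<List<String>> allSplits) {
        this.splits = allSplits.iterator();
        this.current = Collections.emptyIterator();
    }

    // Reads the next value into holder[0]; returns false once all splits
    // are drained (including empty splits skipped along the way).
    boolean hasNextValue(String[] holder) {
        while (!current.hasNext()) {
            if (!splits.hasNext()) {
                return false;                       // no more splits: done
            }
            current = splits.next().iterator();     // init reader for next split
        }
        holder[0] = current.next();
        return true;
    }
}
```

The key design point mirrored here is that the caller never sees split boundaries: empty or exhausted splits are skipped transparently inside the loop, and false is only returned when every split has been consumed.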
-
close
public void close()
-