Class HiveDefaultRecordReader

java.lang.Object
org.apache.drill.exec.store.AbstractRecordReader
org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader
All Implemented Interfaces:
AutoCloseable, RecordReader
Direct Known Subclasses:
HiveTextRecordReader

public class HiveDefaultRecordReader extends AbstractRecordReader
Reader which uses a complex writer underneath to fill value vectors with data read from Hive. At first glance the initialization code in the writer looks cumbersome, but its main aim is to prepare the list of key fields used in the next() and readHiveRecordAndInsertIntoRecordBatch(Object rowValue) methods.

In a nutshell, the reader is used in two stages: 1) the setup stage configures mapredReader, partitionObjInspector, partitionDeserializer, the list of HiveValueWriters for each column in the record batch, and partition vectors and values; 2) the reading stage uses the previously configured objects to get rows from InputSplits, represent each row as a Struct of column values, and write each column value of a row into Drill's value vectors using the HiveValueWriter for that specific column.
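The two-stage lifecycle above can be sketched with a minimal stand-in. Note this is an illustrative simplification, not Drill's actual API: SimpleReader, its setup()/next() signatures, and the List-based "batch" are hypothetical stand-ins for the real RecordReader, OutputMutator, and value vectors.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReaderLifecycleSketch {

  /** Stand-in for a configured reader: stage 1 stores state, stage 2 drains rows. */
  static class SimpleReader {
    private Iterator<String> rows;   // stands in for mapredReader over InputSplits
    private int targetRecordCount;   // batch cap, like TARGET_RECORD_COUNT

    /** Stage 1: configure sources and writers before any next() call. */
    void setup(List<String> source, int batchSize) {
      this.rows = source.iterator();
      this.targetRecordCount = batchSize;
    }

    /** Stage 2: fill one "batch", returning how many records were written. */
    int next(List<String> batch) {
      int count = 0;
      while (count < targetRecordCount && rows.hasNext()) {
        batch.add(rows.next());  // stands in for writing into value vectors
        count++;
      }
      return count;
    }
  }

  /** Runs setup once, then two next() calls, returning each call's record count. */
  public static int[] drain(List<String> source, int batchSize) {
    SimpleReader reader = new SimpleReader();
    reader.setup(source, batchSize);
    List<String> batch = new ArrayList<>();
    int first = reader.next(batch);
    batch.clear();
    int second = reader.next(batch);
    return new int[]{first, second};
  }

  public static void main(String[] args) {
    int[] counts = drain(List.of("a", "b", "c"), 2);
    System.out.println(counts[0] + "," + counts[1]);
  }
}
```

As in the real reader, the caller invokes next() repeatedly until it returns 0, and each call writes at most a batch-sized number of records.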

  • Field Details

    • logger

      protected static final org.slf4j.Logger logger
    • TARGET_RECORD_COUNT

      public static final int TARGET_RECORD_COUNT
      Maximum number of records that can be consumed by one next() method call
    • outputWriter

      protected VectorContainerWriter outputWriter
      Manages all writes to value vectors received using OutputMutator
    • partitionDeserializer

      protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer
      Deserializer used to deserialize a row. Depending on whether the table is partitioned, this is either the partition or the table deserializer.
    • partitionToTableSchemaConverter

      protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter
      Converts values deserialized by partitionDeserializer from the partition schema to the table schema
    • empty

      protected boolean empty
      At the moment of mapredReader instantiation we can check the inputSplits; if no splits are present, then there are no records to read, so mapredReader can finish its work early.
    • mapredReader

      protected org.apache.hadoop.mapred.RecordReader<Object,Object> mapredReader
      Reader used to get data from InputSplits
    • valueHolder

      protected Object valueHolder
      Helper object used together with mapredReader to get data from an InputSplit.
  • Constructor Details

    • HiveDefaultRecordReader

      public HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi)
      Reader's constructor, called by the initializer.
      Parameters:
      table - metadata about Hive table being read
      partition - holder of metadata about table partitioning
      inputSplits - input splits for reading data from distributed storage
      projectedColumns - target columns for scan
      context - fragment context of the executing fragment
      hiveConf - Hive configuration
      proxyUgi - user/group info to be used for initialization
  • Method Details

    • setup

      public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException
      Description copied from interface: RecordReader
      Configure the RecordReader with the provided schema and the record batch that should be written to.
      Parameters:
      context - operator context for the reader
      output - The place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
      Throws:
      ExecutionSetupException
    • internalInit

      protected void internalInit(Properties hiveTableProperties)
      Default implementation does nothing; subclasses override it to apply skip header/footer functionality
      Parameters:
      hiveTableProperties - hive table properties
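A subclass override of this hook might look as follows. This is a hedged sketch: the BaseReader/SkippingReader classes are illustrative stand-ins, not Drill's actual classes, though "skip.header.line.count" is the standard Hive table property for header skipping.

```java
import java.util.Properties;

public class InternalInitSketch {

  static class BaseReader {
    /** Default implementation does nothing, mirroring HiveDefaultRecordReader. */
    protected void internalInit(Properties hiveTableProperties) { }
  }

  /** Hypothetical subclass that reads the header-skip count from table properties. */
  static class SkippingReader extends BaseReader {
    int headerLinesToSkip;

    @Override
    protected void internalInit(Properties hiveTableProperties) {
      // Hive stores the header count as a string-valued table property.
      headerLinesToSkip = Integer.parseInt(
          hiveTableProperties.getProperty("skip.header.line.count", "0"));
    }
  }

  /** Builds table properties with the given header count and runs the hook. */
  public static int headerCount(String value) {
    Properties props = new Properties();
    if (value != null) {
      props.setProperty("skip.header.line.count", value);
    }
    SkippingReader reader = new SkippingReader();
    reader.internalInit(props);
    return reader.headerLinesToSkip;
  }

  public static void main(String[] args) {
    System.out.println(headerCount("1"));
  }
}
```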
    • next

      public int next()
      Description copied from interface: RecordReader
      Increments this record reader forward, writing via the provided output mutator into the output batch.
      Returns:
      The number of additional records added to the output.
    • readHiveRecordAndInsertIntoRecordBatch

      protected void readHiveRecordAndInsertIntoRecordBatch(Object rowValue)
    • hasNextValue

      protected boolean hasNextValue(Object valueHolder) throws IOException, ExecutionSetupException
      Checks for and reads the next value of the input split into valueHolder. Note that if the current mapredReader has no more data to read from its InputSplit, this method will try to initialize a reader for the next InputSplit and use the new mapredReader.
      Parameters:
      valueHolder - holder for next row value data
      Returns:
      true if next value present and read into valueHolder
      Throws:
      IOException - thrown when mapredReader fails to read the next value
      ExecutionSetupException - thrown when the next input split is present but reader initialization for it fails
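The split-advancing pattern hasNextValue() describes can be sketched as below. Iterators over lists are hypothetical stand-ins for mapredReader and the remaining InputSplits; the real method also handles reader construction and error translation, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class SplitAdvanceSketch {

  private final Iterator<List<Integer>> splits;  // stands in for remaining InputSplits
  private Iterator<Integer> current;             // stands in for mapredReader

  SplitAdvanceSketch(List<List<Integer>> allSplits) {
    this.splits = allSplits.iterator();
    this.current = Collections.emptyIterator();
  }

  /** Returns the next value across all splits, or null when everything is drained. */
  Integer nextValue() {
    while (!current.hasNext()) {
      if (!splits.hasNext()) {
        return null;                         // no further splits: reading is done
      }
      current = splits.next().iterator();    // "initialize reader for next InputSplit"
    }
    return current.next();
  }

  /** Reads every value in order, transparently crossing split boundaries. */
  public static List<Integer> drainAll(List<List<Integer>> allSplits) {
    SplitAdvanceSketch reader = new SplitAdvanceSketch(allSplits);
    List<Integer> out = new ArrayList<>();
    for (Integer v = reader.nextValue(); v != null; v = reader.nextValue()) {
      out.add(v);
    }
    return out;
  }

  public static void main(String[] args) {
    // An empty middle split is skipped without the caller noticing.
    System.out.println(drainAll(List.of(List.of(1, 2), List.of(), List.of(3))));
  }
}
```

The key property, as in the real method, is that exhausting one split is invisible to the caller: the loop retries with the next split's reader until a value is found or no splits remain.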
    • close

      public void close()