Class HiveDefaultRecordReader
java.lang.Object
org.apache.drill.exec.store.AbstractRecordReader
org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader
- All Implemented Interfaces:
AutoCloseable, RecordReader
- Direct Known Subclasses:
HiveTextRecordReader
Reader which uses a complex writer underneath to fill in value vectors with data read from Hive.
At first glance the initialization code in the reader looks cumbersome, but its main aim is to prepare the list of key
fields used in the next() and readHiveRecordAndInsertIntoRecordBatch(Object rowValue) methods.
In a nutshell, the reader is used in two stages:
1) The setup stage configures the mapredReader, partitionObjInspector, partitionDeserializer, the list of HiveValueWriters for each column in the record batch, and the partition vectors and values.
2) The reading stage uses the objects configured previously to get rows from InputSplits, represent each row as a Struct of column values, and write each row's column values into Drill's value vectors using the HiveValueWriter for each specific column.
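The two-stage lifecycle above can be sketched as a simplified, self-contained model. Note this is plain Java illustrating the pattern, not Drill's actual classes: `ToyRecordReader`, the `Deque`-backed "split", and the `List`-backed "batch" are illustrative stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model of the reader's two-stage pattern: setup() prepares the
// objects needed for reading, next() fills a "batch" until a limit is hit.
class ToyRecordReader {
    static final int TARGET_RECORD_COUNT = 3; // max records per next() call
    private Deque<String> split;              // stands in for an InputSplit
    private List<String> batch;               // stands in for the value vectors

    // Stage 1: configure reading machinery (in Drill: mapredReader,
    // partitionDeserializer, HiveValueWriters, partition vectors)
    void setup(List<String> rows) {
        this.split = new ArrayDeque<>(rows);
        this.batch = new ArrayList<>();
    }

    // Stage 2: read up to TARGET_RECORD_COUNT rows into the batch and
    // return how many records were added (0 signals end of data)
    int next() {
        batch.clear();
        while (batch.size() < TARGET_RECORD_COUNT && !split.isEmpty()) {
            batch.add(split.poll()); // in Drill: deserialize row, write each column
        }
        return batch.size();
    }

    List<String> batch() { return batch; }
}
```

A caller would invoke `setup(...)` once and then loop on `next()` until it returns 0, mirroring how Drill drains a record reader batch by batch.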
-
Field Summary
Modifier and Type | Field | Description
protected boolean | empty | At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present then there are no records to read, so mapredReader can finish work early.
protected static final org.slf4j.Logger | logger |
 | mapredReader | Reader used to get data from InputSplits
protected VectorContainerWriter | outputWriter | Manages all writes to value vectors received using OutputMutator
protected org.apache.hadoop.hive.serde2.Deserializer | partitionDeserializer | Deserializer to be used for deserialization of row.
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter | partitionToTableSchemaConverter | Converts value deserialized using partitionDeserializer
static final int | TARGET_RECORD_COUNT | Max amount of records that can be consumed by one next() method call
protected Object | valueHolder | Helper object used together with mapredReader to get data from InputSplit.
Fields inherited from class org.apache.drill.exec.store.AbstractRecordReader
DEFAULT_TEXT_COLS_TO_READ
Fields inherited from interface org.apache.drill.exec.store.RecordReader
ALLOCATOR_INITIAL_RESERVATION, ALLOCATOR_MAX_RESERVATION
-
Constructor Summary
Constructor | Description
HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi) | Reader's constructor called by initializer.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
protected boolean | hasNextValue(Object valueHolder) | Checks and reads next value of input split into valueHolder.
protected void | internalInit(Properties hiveTableProperties) | Default implementation does nothing; used to apply skip header/footer functionality.
int | next() | Increments this record reader forward, writing via the provided output mutator into the output batch.
protected void | readHiveRecordAndInsertIntoRecordBatch(Object rowValue) |
void | setup(OperatorContext context, OutputMutator output) | Configure the RecordReader with the provided schema and the record batch that should be written to.
Methods inherited from class org.apache.drill.exec.store.AbstractRecordReader
allocate, getColumns, getDefaultColumnsToRead, hasNext, isSkipQuery, isStarQuery, setColumns, toString, transformColumns
-
Field Details
-
logger
protected static final org.slf4j.Logger logger -
TARGET_RECORD_COUNT
public static final int TARGET_RECORD_COUNT
Max amount of records that can be consumed by one next() method call
-
-
outputWriter
Manages all writes to value vectors received using OutputMutator -
partitionDeserializer
protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer
Deserializer to be used for deserialization of row. Depending on partition presence it may be the partition or the table deserializer. -
partitionToTableSchemaConverter
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter
Converts value deserialized using partitionDeserializer -
empty
protected boolean empty
At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present then there are no records to read, so mapredReader can finish work early. -
mapredReader
Reader used to get data from InputSplits -
valueHolder
Helper object used together with mapredReader to get data from InputSplit.
-
-
Constructor Details
-
HiveDefaultRecordReader
public HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi)
Reader's constructor called by initializer.
- Parameters:
table - metadata about the Hive table being read
partition - holder of metadata about table partitioning
inputSplits - input splits for reading data from distributed storage
projectedColumns - target columns for scan
context - fragmentContext of fragment
hiveConf - Hive configuration
proxyUgi - user/group info to be used for initialization
-
-
Method Details
-
setup
Description copied from interface: RecordReader
Configure the RecordReader with the provided schema and the record batch that should be written to.
- Parameters:
context - operator context for the reader
output - The place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
- Throws:
ExecutionSetupException
-
internalInit
Default implementation does nothing; used to apply skip header/footer functionality.
- Parameters:
hiveTableProperties - hive table properties
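Since internalInit(Properties) is a do-nothing hook that subclasses (such as HiveTextRecordReader) override, it follows the template-method pattern. A hypothetical sketch, with class names and field of my own invention; only `skip.header.line.count` is a real Hive table property:

```java
import java.util.Properties;

// Template-method hook: the base reader calls internalInit() during setup;
// the default is a no-op, subclasses may consume table properties.
class BaseHiveReader {
    protected int headerLinesToSkip = 0; // illustrative field, not Drill's

    protected void internalInit(Properties hiveTableProperties) {
        // default implementation does nothing
    }
}

// A text-format reader can override the hook to honor header skipping,
// analogous to what the documented skip header/footer functionality needs.
class TextHiveReader extends BaseHiveReader {
    @Override
    protected void internalInit(Properties hiveTableProperties) {
        headerLinesToSkip = Integer.parseInt(
            hiveTableProperties.getProperty("skip.header.line.count", "0"));
    }
}
```

The benefit of the hook is that the base class owns the setup sequencing while format-specific readers inject behavior at exactly one well-defined point.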
-
next
public int next()
Description copied from interface: RecordReader
Increments this record reader forward, writing via the provided output mutator into the output batch.
- Returns:
The number of additional records added to the output.
-
readHiveRecordAndInsertIntoRecordBatch
-
hasNextValue
Checks and reads the next value of an input split into valueHolder. Note that if the current mapredReader doesn't contain data to read from the InputSplit, this method will try to initialize a reader for the next InputSplit and use the new mapredReader.
- Parameters:
valueHolder - holder for next row value data
- Returns:
true if a next value is present and was read into valueHolder
- Throws:
IOException - exception which may be thrown when mapredReader fails to read the next value
ExecutionSetupException - exception which may be thrown when the next input split is present but reader initialization for it failed
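The split-advancing behavior of hasNextValue can be modeled with a small self-contained sketch. This is not Drill's implementation: `SplitChain`, the iterator-backed "splits", and the one-element array holder are illustrative assumptions that mimic the documented contract.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Models hasNextValue(): try the current split's reader; when it is
// exhausted, initialize a reader for the next split and retry.
class SplitChain {
    private final Iterator<List<String>> splits; // remaining input splits
    private Iterator<String> current;            // stands in for mapredReader

    SplitChain(List<List<String>> allSplits) {
        this.splits = allSplits.iterator();
        this.current = Collections.emptyIterator();
    }

    // Reads the next value into holder[0]; returns false once all splits
    // are drained (including empty splits skipped along the way).
    boolean hasNextValue(String[] holder) {
        while (!current.hasNext()) {
            if (!splits.hasNext()) {
                return false;                       // no more splits: done
            }
            current = splits.next().iterator();     // init reader for next split
        }
        holder[0] = current.next();
        return true;
    }
}
```

The key design point mirrored here is that the caller never sees split boundaries: empty or exhausted splits are skipped transparently inside the loop, and false is only returned when every split has been consumed.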
-
close
public void close()
-