public class HiveDefaultRecordReader extends AbstractRecordReader
In a nutshell, the reader is used in two stages:
1) The setup stage configures the mapredReader, partitionObjInspector, partitionDeserializer, a list of HiveValueWriters (one per column in the record
batch), and the partition vectors and values.
2) The reading stage uses the previously configured objects to get rows from InputSplits, represent each row as a struct of column values,
and write each column value of the row into Drill's value vectors using the HiveValueWriter for that specific column.
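The two stages above can be sketched in simplified form. This is a hedged illustration only: `SimpleReader`, its `BATCH_SIZE` constant, and the string-based "rows" are hypothetical stand-ins, not Drill's actual classes; the real reader configures a mapredReader, deserializer, and HiveValueWriters in setup() and writes into value vectors in next().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReaderLifecycle {
    /** Hypothetical stand-in for HiveDefaultRecordReader: set up once, then drain batches. */
    static class SimpleReader {
        private Iterator<String> rows;           // stands in for data read from InputSplits
        private static final int BATCH_SIZE = 2; // stands in for TARGET_RECORD_COUNT

        // Stage 1: setup configures everything needed for reading.
        void setup(List<String> splitData) {
            this.rows = splitData.iterator();
        }

        // Stage 2: next() writes up to BATCH_SIZE rows and returns the count;
        // the real reader writes into Drill's value vectors instead of a List.
        int next(List<String> outputBatch) {
            int count = 0;
            while (count < BATCH_SIZE && rows.hasNext()) {
                outputBatch.add(rows.next());
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        SimpleReader reader = new SimpleReader();
        reader.setup(Arrays.asList("r1", "r2", "r3"));
        List<String> batch = new java.util.ArrayList<>();
        int n;
        while ((n = reader.next(batch)) > 0) {
            System.out.println("read " + n + " row(s)");
        }
        System.out.println("total rows: " + batch.size());
    }
}
```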
Modifier and Type | Field and Description |
---|---|
protected boolean |
empty
At the moment of mapredReader instantiation we can check the inputSplits:
if no splits are present, then there are no records to read,
so mapredReader can finish its work early.
|
protected static org.slf4j.Logger |
logger |
protected org.apache.hadoop.mapred.RecordReader<Object,Object> |
mapredReader
Reader used to get data from InputSplits
|
protected VectorContainerWriter |
outputWriter
Manages all writes to value vectors received using OutputMutator
|
protected org.apache.hadoop.hive.serde2.Deserializer |
partitionDeserializer
Deserializer to be used for deserialization of row.
|
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter |
partitionToTableSchemaConverter
Converts values deserialized using partitionDeserializer from the partition schema to the table schema
|
static int |
TARGET_RECORD_COUNT
Maximum number of records that can be consumed by one next() method call
|
protected Object |
valueHolder
Helper object used together with mapredReader to get data from InputSplit.
|
DEFAULT_TEXT_COLS_TO_READ
ALLOCATOR_INITIAL_RESERVATION, ALLOCATOR_MAX_RESERVATION
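The `empty` field documented above enables an early exit when no input splits exist. A minimal sketch of that behavior, using assumed names (`EmptyCheck` and its plain-list "splits" are illustrative stand-ins, not Drill's types):

```java
import java.util.Collections;
import java.util.List;

public class EmptyCheck {
    private final boolean empty;

    EmptyCheck(List<String> inputSplits) {
        // Mirrors the documented check performed at mapredReader instantiation time.
        this.empty = inputSplits == null || inputSplits.isEmpty();
    }

    /** Returns the number of records produced; 0 immediately when there are no splits. */
    int next() {
        if (empty) {
            return 0; // no splits, so there is nothing to read
        }
        // ... the real reader would fill the record batch here ...
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(new EmptyCheck(Collections.emptyList()).next()); // prints 0
    }
}
```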
Constructor and Description |
---|
HiveDefaultRecordReader(HiveTableWithColumnCache table,
HivePartition partition,
Collection<org.apache.hadoop.mapred.InputSplit> inputSplits,
List<SchemaPath> projectedColumns,
FragmentContext context,
org.apache.hadoop.hive.conf.HiveConf hiveConf,
org.apache.hadoop.security.UserGroupInformation proxyUgi)
Reader constructor, called by the initializer.
|
Modifier and Type | Method and Description |
---|---|
void |
close() |
protected boolean |
hasNextValue(Object valueHolder)
Checks for and reads the next value of the input split into valueHolder.
|
protected void |
internalInit(Properties hiveTableProperties)
The default implementation does nothing; subclasses use it to apply skip header/footer functionality.
|
int |
next()
Increments this record reader forward, writing via the provided output
mutator into the output batch.
|
protected void |
readHiveRecordAndInsertIntoRecordBatch(Object rowValue) |
void |
setup(OperatorContext context,
OutputMutator output)
Configure the RecordReader with the provided schema and the record batch that should be written to.
|
allocate, getColumns, getDefaultColumnsToRead, hasNext, isSkipQuery, isStarQuery, setColumns, toString, transformColumns
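The `hasNextValue` method summarized above drains the current split's reader and, when it is exhausted, initializes a reader for the next split, returning false only once every split is consumed. A simplified sketch of that pattern with assumed stand-in types (plain lists and iterators instead of `org.apache.hadoop.mapred.RecordReader` and InputSplits):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SplitAdvancing {
    private final Iterator<List<Integer>> splits; // stand-in for the InputSplit collection
    private Iterator<Integer> currentReader;      // stand-in for mapredReader

    SplitAdvancing(List<List<Integer>> inputSplits) {
        this.splits = inputSplits.iterator();
        this.currentReader = splits.hasNext()
                ? splits.next().iterator()
                : java.util.Collections.<Integer>emptyIterator();
    }

    /** Reads the next value into holder[0]; advances to the next split as needed. */
    boolean hasNextValue(Integer[] holder) {
        while (true) {
            if (currentReader.hasNext()) {
                holder[0] = currentReader.next();
                return true;
            }
            if (!splits.hasNext()) {
                return false; // all splits exhausted
            }
            currentReader = splits.next().iterator(); // init reader for the next split
        }
    }

    public static void main(String[] args) {
        SplitAdvancing r = new SplitAdvancing(
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)));
        Integer[] holder = new Integer[1];
        while (r.hasNextValue(holder)) {
            System.out.println(holder[0]);
        }
    }
}
```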
protected static final org.slf4j.Logger logger
public static final int TARGET_RECORD_COUNT
protected VectorContainerWriter outputWriter
protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter
protected boolean empty
protected org.apache.hadoop.mapred.RecordReader<Object,Object> mapredReader
protected Object valueHolder
public HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi)
table - metadata about Hive table being read
partition - holder of metadata about table partitioning
inputSplits - input splits for reading data from distributed storage
projectedColumns - target columns for scan
context - fragmentContext of fragment
hiveConf - Hive configuration
proxyUgi - user/group info to be used for initialization

public void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException
RecordReader
context - operator context for the reader
output - the place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
Throws: ExecutionSetupException
protected void internalInit(Properties hiveTableProperties)
hiveTableProperties - hive table properties

public int next()
RecordReader
protected void readHiveRecordAndInsertIntoRecordBatch(Object rowValue)
protected boolean hasNextValue(Object valueHolder) throws IOException, ExecutionSetupException
valueHolder - holder for next row value data
Throws:
IOException - exception which may be thrown when mapredReader fails to read the next value
ExecutionSetupException - exception which may be thrown when the next input split is present but reader initialization for it fails

public void close()
Copyright © The Apache Software Foundation. All rights reserved.