Class HiveDefaultRecordReader
java.lang.Object
org.apache.drill.exec.store.AbstractRecordReader
org.apache.drill.exec.store.hive.readers.HiveDefaultRecordReader
- All Implemented Interfaces:
- AutoCloseable, RecordReader
- Direct Known Subclasses:
- HiveTextRecordReader
Reader which uses a complex writer underneath to fill value vectors with data read from Hive.
 At first glance the initialization code in the reader looks cumbersome, but its main aim is to prepare the list of key
 fields used in the next() and readHiveRecordAndInsertIntoRecordBatch(Object rowValue) methods.
 
 In a nutshell, the reader is used in two stages:
 1) The setup stage configures the mapredReader, partitionObjInspector, partitionDeserializer, a list of HiveValueWriters for each column in the record
 batch, and partition vectors and values.
 2) The reading stage uses the previously configured objects to get rows from InputSplits, represent each row as a Struct of column values,
 and write each column value of a row into Drill's value vectors using the HiveValueWriter for that specific column.
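The two-stage contract above (configure first, then drain rows in batches) can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical stand-ins, not Drill APIs; only the setup()/next() shape mirrors the real reader:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the reader: stage 1 (setup) wires the row
// source; stage 2 (next) writes rows into a batch and reports the count.
class TwoStageReaderSketch {
    static final int BATCH_SIZE = 2; // small analogue of TARGET_RECORD_COUNT

    private Iterator<String> rows;

    // Stage 1: configuration, analogous to setup(OperatorContext, OutputMutator).
    void setup(List<String> inputRows) {
        this.rows = inputRows.iterator();
    }

    // Stage 2: write up to BATCH_SIZE rows into the batch; 0 means end of data.
    int next(List<String> batch) {
        int count = 0;
        while (count < BATCH_SIZE && rows.hasNext()) {
            batch.add(rows.next());
            count++;
        }
        return count;
    }

    // Drives the reader the way a scan operator would: call next() until it
    // returns 0, accumulating the total row count.
    static int totalRowsRead(List<String> inputRows) {
        TwoStageReaderSketch reader = new TwoStageReaderSketch();
        reader.setup(inputRows);
        List<String> batch = new ArrayList<>();
        int total = 0;
        int n;
        while ((n = reader.next(batch)) > 0) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalRowsRead(List.of("r1", "r2", "r3"))); // prints 3
    }
}
```

The real reader writes into Drill value vectors via HiveValueWriters rather than a Java list, but the caller-side loop over next() is the same.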
- 
Field Summary
Fields:
- protected boolean empty - At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present, then there are no records to read, so mapredReader can finish work early.
- protected static final org.slf4j.Logger logger
- mapredReader - Reader used to get data from InputSplits.
- protected VectorContainerWriter outputWriter - Manages all writes to value vectors received using OutputMutator.
- protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer - Deserializer to be used for deserialization of a row.
- protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter - Converts values deserialized using partitionDeserializer.
- public static final int TARGET_RECORD_COUNT - Maximum number of records that can be consumed by one next() method call.
- protected Object valueHolder - Helper object used together with mapredReader to get data from an InputSplit.

Fields inherited from class org.apache.drill.exec.store.AbstractRecordReader: DEFAULT_TEXT_COLS_TO_READ
Fields inherited from interface org.apache.drill.exec.store.RecordReader: ALLOCATOR_INITIAL_RESERVATION, ALLOCATOR_MAX_RESERVATION
- 
Constructor Summary
Constructors:
- HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi) - Reader's constructor, called by the initializer.
- 
Method Summary
Methods:
- void close()
- protected boolean hasNextValue(Object valueHolder) - Checks and reads the next value of an input split into valueHolder.
- protected void internalInit(Properties hiveTableProperties) - Default implementation does nothing; used to apply skip header/footer functionality.
- int next() - Increments this record reader forward, writing via the provided output mutator into the output batch.
- protected void readHiveRecordAndInsertIntoRecordBatch(Object rowValue)
- void setup(OperatorContext context, OutputMutator output) - Configures the RecordReader with the provided schema and the record batch that should be written to.

Methods inherited from class org.apache.drill.exec.store.AbstractRecordReader: allocate, getColumns, getDefaultColumnsToRead, hasNext, isSkipQuery, isStarQuery, setColumns, toString, transformColumns
- 
Field Details- 
logger
protected static final org.slf4j.Logger logger
- 
TARGET_RECORD_COUNT
public static final int TARGET_RECORD_COUNT
Maximum number of records that can be consumed by one next() method call.
- See Also:
 
- 
outputWriter
protected VectorContainerWriter outputWriter
Manages all writes to value vectors received using OutputMutator.
- 
partitionDeserializer
protected org.apache.hadoop.hive.serde2.Deserializer partitionDeserializer
Deserializer to be used for deserialization of a row. Depending on partition presence, it may be the partition or the table deserializer.
- 
partitionToTableSchemaConverter
protected org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.Converter partitionToTableSchemaConverter
Converts values deserialized using partitionDeserializer.
- 
empty
protected boolean empty
At the moment of mapredReader instantiation we can check inputSplits; if splits aren't present, then there are no records to read, so mapredReader can finish work early.
- 
mapredReader
Reader used to get data from InputSplits.
- 
valueHolder
protected Object valueHolder
Helper object used together with mapredReader to get data from an InputSplit.
 
- 
- 
Constructor Details- 
HiveDefaultRecordReader
public HiveDefaultRecordReader(HiveTableWithColumnCache table, HivePartition partition, Collection<org.apache.hadoop.mapred.InputSplit> inputSplits, List<SchemaPath> projectedColumns, FragmentContext context, org.apache.hadoop.hive.conf.HiveConf hiveConf, org.apache.hadoop.security.UserGroupInformation proxyUgi)
Reader's constructor, called by the initializer.
- Parameters:
- table - metadata about the Hive table being read
- partition - holder of metadata about table partitioning
- inputSplits - input splits for reading data from distributed storage
- projectedColumns - target columns for the scan
- context - FragmentContext of the fragment
- hiveConf - Hive configuration
- proxyUgi - user/group info to be used for initialization
 
 
- 
- 
Method Details- 
setup
Description copied from interface: RecordReader
Configure the RecordReader with the provided schema and the record batch that should be written to.
- Parameters:
- context - operator context for the reader
- output - the place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
- Throws:
- ExecutionSetupException
 
- 
internalInit
Default implementation does nothing; used to apply skip header/footer functionality.
- Parameters:
- hiveTableProperties - Hive table properties
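The hook pattern described here can be sketched as follows. This is a hypothetical illustration, not Drill code; the base class's no-op default is overridden by a text-oriented subclass (as HiveTextRecordReader does for the real reader), and "skip.header.line.count" is the standard Hive table property name assumed for the example:

```java
import java.util.Properties;

// Hypothetical sketch of the internalInit hook: the base reader does
// nothing by default, while a subclass can read table properties to
// enable header skipping.
class InternalInitSketch {
    static class BaseReader {
        protected int headerLinesToSkip = 0;

        // Default implementation does nothing, mirroring HiveDefaultRecordReader.
        protected void internalInit(Properties tableProperties) {
        }
    }

    static class TextReader extends BaseReader {
        @Override
        protected void internalInit(Properties tableProperties) {
            // Standard Hive table property controlling header lines to skip.
            headerLinesToSkip = Integer.parseInt(
                    tableProperties.getProperty("skip.header.line.count", "0"));
        }
    }

    // Helper for the demo: run internalInit with the given property value.
    static int skippedLines(String headerCount) {
        Properties props = new Properties();
        props.setProperty("skip.header.line.count", headerCount);
        TextReader reader = new TextReader();
        reader.internalInit(props);
        return reader.headerLinesToSkip;
    }

    public static void main(String[] args) {
        System.out.println(skippedLines("1")); // prints 1
    }
}
```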
 
- 
next
public int next()
Description copied from interface: RecordReader
Increments this record reader forward, writing via the provided output mutator into the output batch.
- Returns:
- the number of additional records added to the output
 
- 
readHiveRecordAndInsertIntoRecordBatch
- 
hasNextValue
Checks and reads the next value of an input split into valueHolder. Note that if the current mapredReader doesn't contain data to read from the InputSplit, this method will try to initialize a reader for the next InputSplit and use the new mapredReader.
- Parameters:
- valueHolder - holder for the next row value data
- Returns:
- true if the next value is present and was read into valueHolder
- Throws:
- IOException - thrown when mapredReader fails to read the next value
- ExecutionSetupException - thrown when the next input split is present but reader initialization for it failed
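The split-advancing behavior described for hasNextValue can be sketched with plain iterators. The names below are hypothetical, not Drill APIs; the point is only the control flow of silently opening a reader over the next split when the current one is exhausted:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of hasNextValue's control flow: exhausting the
// current split's reader triggers initialization of a reader for the
// next split before reporting whether another value exists.
class SplitAdvancingSketch {
    private final Iterator<List<Integer>> splits;
    private Iterator<Integer> currentReader = Collections.emptyIterator();

    SplitAdvancingSketch(List<List<Integer>> splits) {
        this.splits = splits.iterator();
    }

    // Returns true if a value is available in the current or any later
    // split; false once every split is drained.
    boolean hasNextValue() {
        while (!currentReader.hasNext()) {
            if (!splits.hasNext()) {
                return false; // no further splits: end of data
            }
            currentReader = splits.next().iterator(); // init reader for next split
        }
        return true;
    }

    int nextValue() {
        return currentReader.next();
    }

    // Demo driver: sums all values across splits, including empty splits.
    static int sumAcrossSplits(List<List<Integer>> splits) {
        SplitAdvancingSketch reader = new SplitAdvancingSketch(splits);
        int sum = 0;
        while (reader.hasNextValue()) {
            sum += reader.nextValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAcrossSplits(
                List.of(List.of(1, 2), List.of(), List.of(3)))); // prints 6
    }
}
```

Empty splits in the middle are skipped transparently, which is why callers of the real hasNextValue never need to know split boundaries.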
 
- 
close
public void close()
 
-