Class ParquetRecordReader

All Implemented Interfaces:
AutoCloseable, RecordReader

public class ParquetRecordReader extends CommonParquetRecordReader
  • Constructor Details

    • ParquetRecordReader

      public ParquetRecordReader(FragmentContext fragmentContext, org.apache.hadoop.fs.Path path, int rowGroupIndex, long numRecordsToRead, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)
    • ParquetRecordReader

      public ParquetRecordReader(FragmentContext fragmentContext, org.apache.hadoop.fs.Path path, int rowGroupIndex, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)
    • ParquetRecordReader

      public ParquetRecordReader(FragmentContext fragmentContext, long numRecordsToRead, org.apache.hadoop.fs.Path path, int rowGroupIndex, org.apache.hadoop.fs.FileSystem fs, org.apache.parquet.compression.CompressionCodecFactory codecFactory, org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, ParquetReaderUtility.DateCorruptionStatus dateCorruptionStatus)
  • Method Details

    • getDateCorruptionStatus

      public ParquetReaderUtility.DateCorruptionStatus getDateCorruptionStatus()
      Status indicating whether the old non-standard data format appears in this file; see DRILL-4203.
      Returns:
      the date corruption status, which indicates whether the dates are corrupted and need to be corrected
    • getCodecFactory

      public org.apache.parquet.compression.CompressionCodecFactory getCodecFactory()
    • getHadoopPath

      public org.apache.hadoop.fs.Path getHadoopPath()
    • getFileSystem

      public org.apache.hadoop.fs.FileSystem getFileSystem()
    • getRowGroupIndex

      public int getRowGroupIndex()
    • getBatchSizesMgr

      public RecordBatchSizerManager getBatchSizesMgr()
    • getOperatorContext

      public OperatorContext getOperatorContext()
    • getFragmentContext

      public FragmentContext getFragmentContext()
    • useBulkReader

      public boolean useBulkReader()
      Returns:
      true if bulk processing is enabled for the Parquet reader; false otherwise
    • getReadState

      public ReadState getReadState()
    • setup

      public void setup(OperatorContext operatorContext, OutputMutator output) throws ExecutionSetupException
      Prepare the Parquet reader. First, determine the set of columns to read (the schema for this read). Then, create a state object to track the read across calls to the reader's next() method. Finally, create one of three readers to read batches, depending on whether this scan is for only fixed-width fields, contains at least one variable-width field, or is a "mock" scan consisting only of null fields (fields in the SELECT clause but not in the Parquet file).
      Parameters:
      operatorContext - operator context for the reader
      output - The place where output for a particular scan should be written. The record reader is responsible for mutating the set of schema values for that particular record.
      Throws:
      ExecutionSetupException
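      The three-way reader choice described for setup can be illustrated with a minimal sketch. Everything below is hypothetical: ColumnKind, ReaderKind, and chooseReader are illustrative stand-ins and are not part of Drill's API, and the order of the checks is an assumption rather than the documented implementation.

```java
import java.util.List;

public class ReaderSelectionSketch {
    /** Hypothetical classification of a projected column; not a Drill type. */
    public enum ColumnKind { FIXED_WIDTH, VARIABLE_WIDTH, MISSING }

    /** Hypothetical labels for the three kinds of batch reader. */
    public enum ReaderKind { FIXED_WIDTH_ONLY, VARIABLE_WIDTH, MOCK }

    /**
     * Picks a reader kind from the projected columns.
     * Assumption: a scan whose columns are all absent from the file
     * (including an empty projection) is treated as a "mock" scan.
     */
    public static ReaderKind chooseReader(List<ColumnKind> columns) {
        boolean allMissing = columns.stream().allMatch(c -> c == ColumnKind.MISSING);
        if (allMissing) {
            return ReaderKind.MOCK;
        }
        boolean anyVariable = columns.stream().anyMatch(c -> c == ColumnKind.VARIABLE_WIDTH);
        return anyVariable ? ReaderKind.VARIABLE_WIDTH : ReaderKind.FIXED_WIDTH_ONLY;
    }
}
```

      In this sketch a single variable-width column is enough to select the variable-width reader, which mirrors the "at least one variable-width field" wording above.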
    • allocate

      public void allocate(Map<String,ValueVector> vectorMap) throws OutOfMemoryException
      Specified by:
      allocate in interface RecordReader
      Overrides:
      allocate in class AbstractRecordReader
      Throws:
      OutOfMemoryException
    • next

      public int next()
      Read the next record batch from the file using the reader and read state created previously.
      Returns:
      The number of additional records added to the output.
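      A caller typically invokes next() repeatedly and closes the reader when done. The sketch below shows that loop against a hypothetical BatchReader interface and FakeReader class; neither is a Drill type, and treating a return value of 0 as "no more records" is an assumption made for illustration.

```java
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {
    /** Hypothetical stand-in for a batch-oriented reader; not Drill's RecordReader. */
    public interface BatchReader extends AutoCloseable {
        /** Returns the number of records added by this call (assumed 0 at end of data). */
        int next();

        @Override
        void close();
    }

    /** Fake reader that replays a fixed list of batch sizes, then returns 0. */
    public static class FakeReader implements BatchReader {
        private final Iterator<Integer> batchSizes;
        public boolean closed = false;

        public FakeReader(List<Integer> sizes) {
            this.batchSizes = sizes.iterator();
        }

        @Override
        public int next() {
            return batchSizes.hasNext() ? batchSizes.next() : 0;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads batches until next() returns 0, closes the reader, and returns the record total. */
    public static long drain(BatchReader reader) {
        long total = 0;
        try (BatchReader r = reader) {
            int n;
            while ((n = r.next()) > 0) {
                total += n;
            }
        }
        return total;
    }
}
```

      The try-with-resources block relies on the reader being AutoCloseable, as this class is, so close() runs even if next() throws.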
    • close

      public void close()
    • getDefaultColumnsToRead

      protected List<SchemaPath> getDefaultColumnsToRead()
      Overrides:
      getDefaultColumnsToRead in class AbstractRecordReader
    • toString

      public String toString()
      Overrides:
      toString in class AbstractRecordReader