Interface ManagedReader

All Known Implementing Classes:
AvroBatchReader, CompliantTextBatchReader, ExcelBatchReader, HttpdLogBatchReader, LogBatchReader, LTSVBatchReader, MSAccessBatchReader, PcapBatchReader, PcapngBatchReader, PdfBatchReader, SasBatchReader, SequenceFileBatchReader, ShpBatchReader, SpssBatchReader, SyslogBatchReader, XMLBatchReader

public interface ManagedReader
Extended version of a record reader which uses a size-aware batch mutator. Use this for all new readers. Replaces the original RecordReader interface.

This interface is used to create readers that work with the projection mechanism to provide services for handling projection, setting up the result set loader, handling schema smoothing, sharing vectors across batches, etc.

Note that this interface reads a batch of rows, not a single row. (The original RecordReader could be confusing in this aspect.)

The expected lifecycle is:

  • The reader factory creates the reader just before using it. (Unlike the old ScanBatch which created all readers at the start of the scan.)
  • Constructor: open the reader using the SchemaNegotiator to configure the scanner framework for this reader by specifying a schema (if known), desired row counts and other configuration options. Call SchemaNegotiator.build() to obtain a RowSetLoader to use to capture the rows that the reader reads.
  • next(): called for each batch. The batch is written using the result set loader obtained above. The scanner framework handles details of tracking version changes, handling overflow, limiting record counts, and so on. Return true to indicate that more batches may follow, false to indicate EOF. A final, non-empty batch may accompany the false return, and the first call to next() can return false if the data source has no rows.
  • close(): called to release resources. May be called before next() returns false.
  • If an error occurs, the reader can throw a RuntimeException from any method. A UserException is preferred to provide detailed information about the source of the problem.
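The lifecycle above can be sketched in plain Java with no Drill dependencies. This is a hedged illustration, not the framework's implementation: BatchReader mirrors only the two methods documented on this page, while Negotiator, BatchSink, ListReader and LifecycleDemo are hypothetical stand-ins for SchemaNegotiator, the loader it builds, a concrete reader and the scan operator. Rows are plain strings collected into lists, and for brevity the sketch's reader starts each batch itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical mirror of the two methods ManagedReader declares.
interface BatchReader {
  boolean next();
  void close();
}

// Hypothetical stand-in for the loader obtained via SchemaNegotiator.build():
// collects rows into batches and reports when the current batch is full.
class BatchSink {
  private final int maxRows;
  final List<List<String>> batches = new ArrayList<>();

  BatchSink(int maxRows) {
    this.maxRows = maxRows;
  }

  void startBatch() {
    batches.add(new ArrayList<>());
  }

  boolean isFull() {
    return current().size() >= maxRows;
  }

  void write(String row) {
    current().add(row);
  }

  private List<String> current() {
    return batches.get(batches.size() - 1);
  }
}

// Hypothetical stand-in for SchemaNegotiator: the reader sets options,
// then calls build() to get the sink it writes rows to.
class Negotiator {
  int batchSize = 1000; // desired rows per batch

  BatchSink build() {
    return new BatchSink(batchSize);
  }
}

// A reader over an in-memory list that follows the documented lifecycle.
class ListReader implements BatchReader {
  private final Iterator<String> source;
  private final BatchSink sink;
  boolean closed;

  ListReader(Negotiator negotiator, List<String> rows) {
    source = rows.iterator();  // "open" the data source
    negotiator.batchSize = 2;  // configure: two rows per batch
    sink = negotiator.build(); // obtain the sink for this reader's rows
  }

  @Override
  public boolean next() {
    sink.startBatch();
    while (source.hasNext() && !sink.isFull()) {
      sink.write(source.next());
    }
    // false signals EOF; the batch just written (if any) is the final one.
    return source.hasNext();
  }

  @Override
  public void close() {
    closed = true; // a real reader would close files and free buffers here
  }

  List<List<String>> batches() {
    return sink.batches;
  }
}

// Hypothetical driver: create the reader just before use, call next() until
// it returns false, and close the reader once it was constructed successfully.
class LifecycleDemo {
  static ListReader run(List<String> rows) {
    ListReader reader = new ListReader(new Negotiator(), rows);
    try {
      while (reader.next()) {
        // a real scan operator would send each batch downstream here
      }
    } finally {
      reader.close();
    }
    return reader;
  }
}
```

With three input rows and two rows per batch, the second call to next() writes the last row and returns false in the same call, so EOF arrives together with the final batch.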

  • Nested Class Summary

    Nested Classes
    Modifier and Type   Interface   Description
    static class                    Exception thrown from the constructor if the data source is empty and can produce no data or schema.
  • Method Summary

    Modifier and Type   Method    Description
    void                close()   Release resources.
    boolean             next()    Read the next batch.
  • Method Details

    • next

      boolean next()
      Read the next batch. Reading continues until either EOF or the mutator indicates that the batch is full. A non-empty batch is always valid. Returning true with an empty batch is also valid, and is helpful on the very first batch (returning schema only). An empty batch with a false return value indicates EOF, and the batch will be discarded. A non-empty batch with a false return value is a final, valid batch: EOF was reached and no more data is available.

      This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.

      Returns:
      true if more data may be available (and so next() should be called again); false to indicate that EOF was reached
      Throws:
      RuntimeException - (UserException preferred) if an error occurs that should fail the query.
    • close

      void close()
      Release resources. Called just after a failure, when the scanner is cancelled, or after next() returns EOF. Release all resources and close files. Guaranteed to be called if the constructor returns normally; will not be called if the constructor throws an exception.
      Throws:
      RuntimeException - (UserException preferred) if an error occurs that should fail the query.
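The next() return protocol and the close() guarantee described above can be illustrated with a small consumer loop. This is a sketch under stated assumptions: ProtocolReader, its currentBatch() accessor, ScriptedReader and ProtocolDriver are hypothetical names invented for the example (the real interface exposes batches through the result set loader, not an accessor). The loop forwards any batch returned with true, even an empty schema-only one; forwards a non-empty batch returned with false as the final batch; discards an empty batch returned with false; and closes the reader in a finally block.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader shape for this sketch: next() follows the documented
// return protocol and currentBatch() exposes the batch it just produced.
interface ProtocolReader {
  boolean next();
  List<String> currentBatch();
  void close();
}

// Scripted reader that replays fixed (batch, return value) pairs.
class ScriptedReader implements ProtocolReader {
  private final List<List<String>> batches;
  private final List<Boolean> results;
  private int index = -1;
  boolean closed;

  ScriptedReader(List<List<String>> batches, List<Boolean> results) {
    this.batches = batches;
    this.results = results;
  }

  public boolean next() {
    index++;
    return results.get(index);
  }

  public List<String> currentBatch() {
    return batches.get(index);
  }

  public void close() {
    closed = true;
  }
}

class ProtocolDriver {
  // Returns the batches that would be sent downstream.
  static List<List<String>> drain(ProtocolReader reader) {
    List<List<String>> output = new ArrayList<>();
    try {
      boolean more;
      do {
        more = reader.next();
        List<String> batch = reader.currentBatch();
        if (more || !batch.isEmpty()) {
          // true: valid batch (possibly empty, e.g. schema only);
          // false with rows: final valid batch delivered along with EOF.
          output.add(batch);
        }
        // false with an empty batch: EOF only, so the batch is discarded.
      } while (more);
    } finally {
      reader.close(); // always release resources once the reader exists
    }
    return output;
  }
}
```

Because EOF can travel with the last batch, the loop never needs to call next() once more just to learn that the source is exhausted.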