Interface ManagedReader
- All Known Implementing Classes:
AvroBatchReader, CompliantTextBatchReader, ExcelBatchReader, HttpdLogBatchReader, LogBatchReader, LTSVBatchReader, MSAccessBatchReader, PcapBatchReader, PcapngBatchReader, PdfBatchReader, SasBatchReader, SequenceFileBatchReader, ShpBatchReader, SpssBatchReader, SyslogBatchReader, XMLBatchReader
public interface ManagedReader
Extended version of a record reader which uses a size-aware batch mutator. Use this for all new readers. Replaces the original RecordReader interface.
This interface is used to create readers that work with the projection mechanism to provide services for handling projection, setting up the result set loader, handling schema smoothing, sharing vectors across batches, etc.
Note that this interface reads a batch of rows, not a single row. (The original RecordReader could be confusing in this aspect.)
The expected lifecycle is:
- The reader factory creates the reader just before using it. (Unlike the old ScanBatch, which created all readers at the start of the scan.)
- Constructor: open the reader using the SchemaNegotiator to configure the scanner framework for this reader by specifying a schema (if known), desired row counts and other configuration options. Call SchemaNegotiator.build() to obtain a RowSetLoader to use to capture the rows that the reader reads.
- next(): called for each batch. The batch is written using the result set loader obtained above. The scanner framework handles details of tracking version changes, handling overflow, limiting record counts, and so on. Return true to indicate a batch is available, false to indicate EOF. The first call to next() can return false if the data source has no rows.
- close(): called to release resources. May be called before next() returns false.
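The lifecycle above can be sketched with small stand-in types. This is an illustration only: SketchNegotiator, SketchRowWriter, SketchManagedReader, ListReader and SketchScan are hypothetical names invented for this example and are not Drill's SchemaNegotiator or RowSetLoader APIs. The sketch assumes the framework harvests whatever rows were written after each call to next().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the size-aware writer a reader obtains from the negotiator.
interface SketchRowWriter {
    boolean isFull();
    void addRow(String value);
}

// Hypothetical stand-in for the negotiator passed to the reader's constructor.
interface SketchNegotiator {
    void batchSize(int rows);
    SketchRowWriter build();
}

// Mirrors the two methods documented for ManagedReader.
interface SketchManagedReader {
    boolean next();
    void close();
}

// A toy reader over an in-memory list.
class ListReader implements SketchManagedReader {
    private final Iterator<String> source;
    private final SketchRowWriter writer;

    // Constructor: configure the framework, then build the writer.
    ListReader(List<String> data, SketchNegotiator negotiator) {
        negotiator.batchSize(2);
        this.writer = negotiator.build();
        this.source = data.iterator();
    }

    // Fill one batch; return false once the source is exhausted.
    public boolean next() {
        while (!writer.isFull()) {
            if (!source.hasNext()) {
                return false; // EOF, possibly with a final partial batch
            }
            writer.addRow(source.next());
        }
        return true;
    }

    public void close() {
        // Nothing to release for an in-memory list.
    }
}

// Toy "framework": creates the reader, drives next(), always calls close().
class SketchScan {
    static List<List<String>> run(List<String> data) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int[] limit = {Integer.MAX_VALUE};
        SketchRowWriter writer = new SketchRowWriter() {
            public boolean isFull() { return current.size() >= limit[0]; }
            public void addRow(String value) { current.add(value); }
        };
        SketchNegotiator negotiator = new SketchNegotiator() {
            public void batchSize(int rows) { limit[0] = rows; }
            public SketchRowWriter build() { return writer; }
        };
        SketchManagedReader reader = new ListReader(data, negotiator);
        try {
            boolean more = true;
            while (more) {
                more = reader.next();
                if (!current.isEmpty()) {      // keep only non-empty batches
                    batches.add(new ArrayList<>(current));
                    current.clear();
                }
            }
        } finally {
            reader.close();
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c"))); // prints [[a, b], [c]]
    }
}
```

With an empty input list, the first call to next() returns false and no batch is produced, matching the note that the first call can return false if the data source has no rows.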
If an error occurs, the reader can throw a RuntimeException
from any method. A UserException is preferred to provide
detailed information about the source of the problem.
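As a rough illustration of this error convention, a reader can catch a checked I/O failure and rethrow it as an unchecked exception that names the data source. SketchReadException and LineSource below are hypothetical stand-ins invented for this sketch; a real reader would normally build a UserException instead, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical unchecked exception carrying context about the failing source.
class SketchReadException extends RuntimeException {
    SketchReadException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper that a reader's next() might delegate to.
class LineSource {
    private final String name;
    private final BufferedReader in;

    LineSource(String name, BufferedReader in) {
        this.name = name;
        this.in = in;
    }

    // Returns the next line, or null at end of input.
    // Converts the checked IOException into an unchecked, descriptive error.
    String readLineOrFail() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new SketchReadException("Failed to read from " + name, e);
        }
    }
}
```

Keeping the original exception as the cause preserves the low-level detail, while the message tells the user which source failed.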
-
Nested Class Summary
Modifier and Type: static class
Description: Exception thrown from the constructor if the data source is empty and can produce no data or schema.
-
Method Summary
-
Method Details
-
next
boolean next()
Read the next batch. Reading continues until either EOF, or until the mutator indicates that the batch is full. The batch is considered valid if it is non-empty. Returning true with an empty batch is valid, and is helpful on the very first batch (returning schema only). An empty batch with a false return code indicates EOF and the batch will be discarded. A non-empty batch along with a false return result indicates a final, valid batch, but that EOF was reached and no more data is available.
This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.
- Returns:
- true if more data may be available (and so next() should be called again), false to indicate that EOF was reached
- Throws:
RuntimeException
- (UserException preferred) if an error occurs that should fail the query.
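The four combinations of return value and batch contents described for next() can be written out as a small decision function. This is only a reading aid: NextOutcome and NextProtocolSketch are hypothetical names made up for this sketch, and the real scanner framework's internal handling is not shown.

```java
// Hypothetical labels for the four cases described for next().
enum NextOutcome {
    ROWS_MORE_TO_COME,     // true  + non-empty: valid batch, call next() again
    EMPTY_MORE_TO_COME,    // true  + empty: allowed, e.g. schema-only first batch
    FINAL_ROWS_THEN_EOF,   // false + non-empty: final valid batch, then EOF
    EOF_DISCARD_BATCH      // false + empty: EOF, the batch is discarded
}

final class NextProtocolSketch {
    private NextProtocolSketch() { }

    // Classifies one call to next() from its return value and the rows written.
    static NextOutcome classify(boolean nextReturned, int rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("rowCount must be >= 0");
        }
        boolean empty = rowCount == 0;
        if (nextReturned) {
            return empty ? NextOutcome.EMPTY_MORE_TO_COME : NextOutcome.ROWS_MORE_TO_COME;
        }
        return empty ? NextOutcome.EOF_DISCARD_BATCH : NextOutcome.FINAL_ROWS_THEN_EOF;
    }
}
```

The FINAL_ROWS_THEN_EOF case is the one that saves an extra allocation: the reader reports EOF together with its last batch instead of needing one more call that yields nothing.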
-
close
void close()
Release resources. Called just after a failure, when the scanner is cancelled, or after next() returns EOF. Release all resources and close files. Guaranteed to be called if open() returns normally; will not be called if open() throws an exception.
- Throws:
RuntimeException
- (UserException preferred) if an error occurs that should fail the query.
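Because close() may arrive before next() has reported EOF (for example on cancellation), a reader should not assume it has read everything when it releases resources. The sketch below uses hypothetical stand-in classes (TrackedResource, EarlyCloseReader, EarlyCloseDemo). Making close() safe to call more than once is a defensive choice assumed for this example, not something the interface description requires.

```java
// Hypothetical resource that records whether it has been released.
class TrackedResource {
    boolean open = true;

    void release() {
        open = false;
    }
}

// Toy reader that pretends to produce a fixed number of batches.
class EarlyCloseReader {
    private TrackedResource resource;
    private int remainingBatches;

    EarlyCloseReader(TrackedResource resource, int batches) {
        this.resource = resource;
        this.remainingBatches = batches;
    }

    boolean next() {
        if (remainingBatches == 0) {
            return false; // EOF with an empty batch
        }
        remainingBatches--;
        return true;
    }

    // Releases the resource regardless of how far reading got.
    void close() {
        if (resource != null) {
            resource.release();
            resource = null; // a second close() becomes a no-op
        }
    }
}

class EarlyCloseDemo {
    // Reads at most maxBatches batches, then closes, as a cancelled scan might.
    static int scan(EarlyCloseReader reader, int maxBatches) {
        int count = 0;
        try {
            while (count < maxBatches && reader.next()) {
                count++;
            }
        } finally {
            reader.close(); // runs on EOF, early stop, or exception
        }
        return count;
    }
}
```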
-