Interface ManagedReader<T extends SchemaNegotiator>
- All Known Implementing Classes:
EnumerableRecordReader, ExtendedMockBatchReader, GoogleSheetsBatchReader, HDF5BatchReader, HttpBatchReader, HttpCSVBatchReader, HttpXMLBatchReader, IcebergRecordReader, ImageBatchReader, JdbcBatchReader, JsonBatchReader, JsonStreamBatchReader, KafkaRecordReader, PhoenixBatchReader, SplunkBatchReader
public interface ManagedReader<T extends SchemaNegotiator>
Extended version of a record reader which uses a size-aware batch mutator. Use this for all new readers. Replaces the original RecordReader interface.
This interface is used to create readers that work with the projection mechanism to provide services for handling projection, setting up the result set loader, handling schema smoothing, sharing vectors across batches, etc.
Note that this interface reads a batch of rows, not a single row. (The original RecordReader could be confusing in this aspect.)
The expected lifecycle is:
- Constructor: allocate no resources. Obtain a reference to a reader-specific schema and projection manager.
- open(SchemaNegotiator): Use the provided SchemaNegotiator to configure the scanner framework for this reader by specifying a schema (if known), desired row counts and other configuration options. Call SchemaNegotiator.build() to obtain a RowSetLoader to use to capture the rows that the reader reads.
- next(): called for each batch. The batch is written using the result set loader obtained above. The scanner framework handles details of tracking version changes, handling overflow, limiting record counts, and so on. Return true to indicate a batch is available, false to indicate EOF. The first call to next() can return false if the data source has no rows.
- close(): called to release resources. May be called before next() returns false.
If an error occurs, the reader can throw a RuntimeException from any method. A UserException is preferred to provide detailed information about the source of the problem.
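The lifecycle above can be sketched with simplified stand-in types. Everything named here (SketchLoader, SketchNegotiator, ListReader, LifecycleSketch) is hypothetical and only mimics the shape of the contract; it is not the framework's API, and the real scanner does considerably more (projection, overflow handling, schema tracking).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the framework types; these are NOT the real API.
interface SketchLoader {            // plays the role of the loader obtained in open()
  void writeRow(String value);
  boolean isFull();
}

interface SketchNegotiator {        // plays the role of SchemaNegotiator
  SketchLoader build();
}

/** Reader over an in-memory list of values. */
class ListReader {
  private final List<String> source;
  private SketchLoader loader;
  private int position;
  boolean closed;

  ListReader(List<String> source) {
    this.source = source;           // constructor: keep references, allocate nothing
  }

  boolean open(SketchNegotiator negotiator) {
    loader = negotiator.build();    // obtain the loader used by next()
    return true;
  }

  boolean next() {
    // Write rows until the source is exhausted or the batch is full.
    while (position < source.size() && !loader.isFull()) {
      loader.writeRow(source.get(position++));
    }
    return position < source.size();  // false once EOF is reached
  }

  void close() {
    closed = true;                  // release resources here
  }
}

class LifecycleSketch {
  /** Drives one reader through open, next and close; returns the batches produced. */
  static List<List<String>> scan(ListReader reader, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    SketchLoader loader = new SketchLoader() {
      @Override public void writeRow(String value) { current.add(value); }
      @Override public boolean isFull() { return current.size() >= batchSize; }
    };
    boolean more = reader.open(() -> loader);  // if open() throws, close() is not called
    try {
      while (more) {
        more = reader.next();
        if (!current.isEmpty()) {              // keep non-empty batches, even at EOF
          batches.add(new ArrayList<>(current));
          current.clear();
        }
      }
    } finally {
      reader.close();                          // open() returned normally, so close() runs
    }
    return batches;
  }

  public static void main(String[] args) {
    ListReader reader = new ListReader(List.of("a", "b", "c"));
    System.out.println(scan(reader, 2));   // prints [[a, b], [c]]
    System.out.println(reader.closed);     // prints true
  }
}
```

The try/finally placement is the design point: close() is reached on EOF and on a failure inside next(), but not when open() itself throws.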
Method Details
-
open
Set up the record reader. Called just before the first call to next(). Allocate resources here, not in the constructor. Example: open files, allocate buffers, etc.
- Parameters:
negotiator - mechanism to negotiate select and table schemas, then create the row set reader used to load data into value vectors
- Returns:
true if the reader is open and ready to read (possibly no) rows; false for a "soft" failure in which no schema or data is available, but the scanner should not fail and should instead move on to another reader
- Throws:
RuntimeException - for "hard" errors that should terminate the query. A UserException is preferred because it can explain the problem better than the scan operator can by guessing at the cause
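As an illustration of the two failure modes of open(), the sketch below uses hypothetical types (SourceReader, OpenOutcomeSketch) that are not part of the framework. The negotiator argument is omitted for brevity, and IllegalStateException stands in for UserException, which is an assumption made to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not the framework's API.
class SourceReader {
  enum State { PRESENT, MISSING, CORRUPT }

  final String name;
  private final State state;
  boolean closed;

  SourceReader(String name, State state) {
    this.name = name;
    this.state = state;
  }

  // The real open() receives a schema negotiator; it is omitted to keep the sketch small.
  boolean open() {
    switch (state) {
      case MISSING:
        return false;   // "soft" failure: nothing to read, the scanner moves on
      case CORRUPT:
        // "hard" failure; IllegalStateException stands in for UserException here
        throw new IllegalStateException("Cannot read source: " + name);
      default:
        return true;    // open and ready to read (possibly no) rows
    }
  }

  void close() {
    closed = true;
  }
}

class OpenOutcomeSketch {
  /** Returns the names of readers that opened successfully, skipping soft failures. */
  static List<String> openAll(List<SourceReader> readers) {
    List<String> ready = new ArrayList<>();
    for (SourceReader reader : readers) {
      boolean ok = reader.open();   // a thrown exception ends the whole scan
      if (ok) {
        ready.add(reader.name);
      }
      reader.close();               // open() returned normally, so close() follows
    }
    return ready;
  }
}
```

For example, passing readers in the states PRESENT, MISSING, PRESENT yields the first and third names, while a single CORRUPT reader causes the exception to propagate to the caller.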
-
next
boolean next()
Read the next batch. Reading continues until either EOF, or until the mutator indicates that the batch is full. The batch is considered valid if it is non-empty. Returning true with an empty batch is valid, and is helpful on the very first batch (returning schema only). An empty batch with a false return code indicates EOF and the batch will be discarded. A non-empty batch along with a false return result indicates a final, valid batch, but that EOF was reached and no more data is available.
This somewhat complex protocol avoids the need to allocate a final batch just to find out that no more data is available; it allows EOF to be returned along with the final batch.
- Returns:
true if more data may be available (and so next() should be called again); false to indicate that EOF was reached
- Throws:
RuntimeException - (UserException preferred) if an error occurs that should fail the query.
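The four combinations of return value and batch size described for next() can be summarized as a small decision function. The class and enum names below are hypothetical and exist only for illustration; the sketch shows how a caller might interpret the result, not how the scanner framework is actually written.

```java
// Hypothetical helper; the names below are illustrative and not part of the framework.
class NextProtocolSketch {
  enum Action { DELIVER_AND_CONTINUE, DELIVER_AND_STOP, DISCARD_AND_STOP }

  /**
   * @param more     the value returned by next()
   * @param rowCount number of rows written to the batch during that call
   */
  static Action interpret(boolean more, int rowCount) {
    if (more) {
      // Non-empty, or empty (schema only): the batch is valid and next() is called again.
      return Action.DELIVER_AND_CONTINUE;
    }
    // EOF: a non-empty batch is the final valid batch; an empty one is discarded.
    return rowCount > 0 ? Action.DELIVER_AND_STOP : Action.DISCARD_AND_STOP;
  }

  public static void main(String[] args) {
    System.out.println(interpret(true, 0));    // prints DELIVER_AND_CONTINUE
    System.out.println(interpret(true, 100));  // prints DELIVER_AND_CONTINUE
    System.out.println(interpret(false, 100)); // prints DELIVER_AND_STOP
    System.out.println(interpret(false, 0));   // prints DISCARD_AND_STOP
  }
}
```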
-
close
void close()
Release resources. Called just after a failure, when the scanner is cancelled, or after next() returns EOF. Release all resources and close files. Guaranteed to be called if open() returns normally; will not be called if open() throws an exception.
- Throws:
RuntimeException - (UserException preferred) if an error occurs that should fail the query.
-