Class FileScanLifecycle

java.lang.Object
org.apache.drill.exec.physical.impl.scan.v3.lifecycle.ScanLifecycle
org.apache.drill.exec.physical.impl.scan.v3.file.FileScanLifecycle

public class FileScanLifecycle extends ScanLifecycle
The file scan framework extends the scan framework with support for reading DFS splits (a file and a block) and for the file-related implicit and partition columns. The file scan builder gathers the file-related options for the scan as a whole, including the list of splits. The associated FileSchemaNegotiator passes file information to each reader.

Only a single storage plugin uses the file scan framework: the FileSystemPlugin, via the EasyFormatPlugin. To keep client code as simple as possible, the Drill file system and the list of files are passed through this framework to the FileReaderFactory, and then to the FileSchemaNegotiator, which presents them to the reader. This approach avoids the need for each format to handle this common boilerplate code.

The FileScanOptions holds the list of splits to scan. The FileReaderFactory iterates over those splits and creates each reader just in time to process its split.
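
The just-in-time pattern described above can be illustrated with a minimal, self-contained sketch. It is not Drill's implementation: LazyReaderSketch, Split, SplitReader, and ReaderFactory are hypothetical stand-ins chosen for illustration, and the real FileReaderFactory and FileScanOptions have different signatures. The only point shown is that a reader is constructed when the scan asks for it, not when the split list is supplied.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Drill's classes.
final class LazyReaderSketch {

  /** A split: a file path plus a block, given here as an offset and a length. */
  static final class Split {
    final String path;
    final long offset;
    final long length;

    Split(String path, long offset, long length) {
      this.path = path;
      this.offset = offset;
      this.length = length;
    }
  }

  /** Minimal reader that only remembers which split it was created for. */
  static final class SplitReader {
    final Split split;

    SplitReader(Split split) {
      this.split = split;
    }
  }

  /** Walks the split list and builds one reader per call to next(). */
  static final class ReaderFactory {
    private final Iterator<Split> splits;
    private int created;

    ReaderFactory(List<Split> splits) {
      this.splits = splits.iterator();
    }

    boolean hasNext() {
      return splits.hasNext();
    }

    /** Creates the reader for the next split only when it is requested. */
    SplitReader next() {
      created++;
      return new SplitReader(splits.next());
    }

    int createdCount() {
      return created;
    }
  }

  /** Drains the factory and returns the paths in the order they were read. */
  static List<String> scanAll(ReaderFactory factory) {
    List<String> visited = new ArrayList<>();
    while (factory.hasNext()) {
      SplitReader reader = factory.next();
      visited.add(reader.split.path);
    }
    return visited;
  }
}
```

In this sketch, no SplitReader exists until next() is called, so at most one reader needs to be live at a time, however long the split list is.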

Implicit columns are defined here, at the beginning of the scan, as part of the scan schema mechanism. Each consists of a column "marker" that identifies the column's purpose. Then, for each file, the implicit column is resolved to a value specific to that file. A StaticBatchBuilder then fills in the needed column values for each batch that the reader produces.
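
The three steps above (define a marker once per scan, resolve it once per file, repeat the value once per batch row) can be sketched as follows. This is an illustrative model only: ImplicitColumnSketch, Marker, resolveForFile, and fillBatch are hypothetical names, and the sketch uses plain Java collections, not Drill's vectors or its StaticBatchBuilder API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; these are not Drill's classes.
final class ImplicitColumnSketch {

  /** Marker that records what an implicit column means, independent of any file. */
  enum Marker {
    FULL_PATH, FILE_NAME, SUFFIX;

    /** Resolves this marker to the value for one specific file. */
    String resolve(String filePath) {
      String name = filePath.substring(filePath.lastIndexOf('/') + 1);
      switch (this) {
        case FULL_PATH:
          return filePath;
        case FILE_NAME:
          return name;
        case SUFFIX: {
          int dot = name.lastIndexOf('.');
          return dot < 0 ? "" : name.substring(dot + 1);
        }
        default:
          throw new IllegalStateException("Unknown marker: " + this);
      }
    }
  }

  /** Per-file step: turns each (column name, marker) pair into a concrete value. */
  static Map<String, String> resolveForFile(Map<String, Marker> columns, String filePath) {
    Map<String, String> values = new LinkedHashMap<>();
    for (Map.Entry<String, Marker> column : columns.entrySet()) {
      values.put(column.getKey(), column.getValue().resolve(filePath));
    }
    return values;
  }

  /** Per-batch step: repeats each resolved value once per row in the batch. */
  static Map<String, List<String>> fillBatch(Map<String, String> values, int rowCount) {
    Map<String, List<String>> batch = new LinkedHashMap<>();
    for (Map.Entry<String, String> value : values.entrySet()) {
      batch.put(value.getKey(), new ArrayList<>(Collections.nCopies(rowCount, value.getValue())));
    }
    return batch;
  }
}
```

Because the resolved values are constant for a given file, the per-batch step in this sketch does no path parsing; it only copies the values computed once when the file was opened.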