Class EasyFormatPlugin<T extends FormatPluginConfig>
java.lang.Object
org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin<T>
- Type Parameters:
T - the format plugin config for this reader
- All Implemented Interfaces:
FormatPlugin
- Direct Known Subclasses:
AvroFormatPlugin, BasePcapFormatPlugin, ExcelFormatPlugin, HDF5FormatPlugin, HttpdLogFormatPlugin, ImageFormatPlugin, JSONFormatPlugin, LogFormatPlugin, LTSVFormatPlugin, MSAccessFormatPlugin, PdfFormatPlugin, SasFormatPlugin, SequenceFileFormatPlugin, ShpFormatPlugin, SpssFormatPlugin, SyslogFormatPlugin, TextFormatPlugin, XMLFormatPlugin
public abstract class EasyFormatPlugin<T extends FormatPluginConfig>
extends Object
implements FormatPlugin
Base class for file readers.
Provides a bridge between the legacy RecordReader-style readers and the newer ManagedReader style. Over time, the class should be split, or a cleaner way provided to handle the differences between the two styles.
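For orientation, a minimal subclass might look like the sketch below. The names MyFormatPlugin, MyFormatConfig, and MyReaderFactory are hypothetical, and the EasyFormatConfig builder methods shown are assumptions based on recent Drill versions; check them against the release in use.

import org.apache.drill.common.logical.StoragePluginConfig;
import org.apache.drill.exec.physical.impl.scan.v3.file.FileScanLifecycleBuilder;
import org.apache.drill.exec.server.DrillbitContext;
import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.EasyFormatConfig;
import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin.ScanFrameworkVersion;
import org.apache.drill.exec.store.dfs.easy.EasySubScan;
import org.apache.hadoop.conf.Configuration;

// Hypothetical plugin; MyFormatConfig is a FormatPluginConfig implementation.
public class MyFormatPlugin extends EasyFormatPlugin<MyFormatConfig> {

  public static final String PLUGIN_NAME = "myformat";

  public MyFormatPlugin(String name, DrillbitContext context,
      Configuration fsConf, StoragePluginConfig storageConfig,
      MyFormatConfig formatConfig) {
    super(name, easyConfig(fsConf), context, storageConfig, formatConfig);
  }

  // Gather the static, programmer-defined options into an EasyFormatConfig.
  // Builder method names are assumptions based on recent Drill versions.
  private static EasyFormatConfig easyConfig(Configuration fsConf) {
    return EasyFormatConfig.builder()
        .readable(true)          // this plugin supports reading
        .writable(false)         // no writer (CTAS) support
        .blockSplittable(false)  // split only on file boundaries
        .compressible(true)      // allow e.g. .myext.gz
        .extensions("myext")     // file extension claimed by this plugin
        .fsConf(fsConf)
        .defaultName(PLUGIN_NAME)
        .scanVersion(ScanFrameworkVersion.EVF_V2)
        .build();
  }

  @Override
  protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) {
    builder.readerFactory(new MyReaderFactory(this, scan)); // see configureScan below
  }
}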
-
Nested Class Summary
Nested Classes:
- static class EasyFormatPlugin.EasyFormatConfig - Defines the static, programmer-defined options for this plugin.
- static class EasyFormatPlugin.EasyFormatConfigBuilder
- static enum EasyFormatPlugin.ScanFrameworkVersion
-
Field Summary
Fields:
- protected final T formatConfig
-
Constructor Summary
Constructors:
- protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName) - Legacy constructor.
- protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig) - Revised constructor in which settings are gathered into a configuration object.
-
Method Summary
Methods:
- protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) - Configure an EVF (v2) scan, which must at least include the factory to create readers.
- EasyFormatPlugin.EasyFormatConfig easyConfig()
- protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) - Create the plugin-specific framework that manages the scan.
- getConfig()
- DrillbitContext getContext()
- org.apache.hadoop.conf.Configuration getFsConf()
- AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns)
- AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager)
- FormatMatcher getMatcher()
- String getName()
- getOptimizerRules()
- protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan)
- getReaderOperatorType()
- RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) - Return a record reader for the specific file format, when using the original ScanBatch scanner.
- RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer)
- protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
- StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
- StoragePluginConfig getStorageConfig()
- AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
- CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer)
- getWriterOperatorType()
- protected void initScanBuilder(FileScanFramework.FileScanBuilder builder, EasySubScan scan) - Initialize the scan framework builder with standard options.
- boolean isBlockSplittable() - Whether the format can be split on block boundaries within a file.
- boolean isCompressible() - Indicates whether this format may also appear inside a compression container (for example, csv.gz versus csv).
- boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
- ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) - For EVF V1, to be removed.
- DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
- protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options) - Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch.
- boolean supportsAutoPartitioning() - Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
- boolean supportsFileImplicitColumns() - Whether this format plugin supports implicit file columns.
- boolean supportsLimitPushdown() - Does this plugin support pushing the limit down to the batch reader?
- boolean supportsPushDown() - Does this plugin support projection push down?
- boolean supportsRead()
- boolean supportsStatistics()
- boolean supportsWrite()
- void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath)
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.drill.exec.store.dfs.FormatPlugin:
getGroupScan, getGroupScan, getOptimizerRules
-
Field Details
-
formatConfig
protected final T formatConfig
-
-
Constructor Details
-
EasyFormatPlugin
protected EasyFormatPlugin(String name, DrillbitContext context, org.apache.hadoop.conf.Configuration fsConf, StoragePluginConfig storageConfig, T formatConfig, boolean readable, boolean writable, boolean blockSplittable, boolean compressible, List<String> extensions, String defaultName)
Legacy constructor.
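For illustration, a subclass constructor delegating to this legacy form might look as follows, with the positional boolean arguments labeled. The plugin and config class names are hypothetical:

// Hypothetical subclass constructor using the legacy form.
public MyFormatPlugin(String name, DrillbitContext context,
    Configuration fsConf, StoragePluginConfig storageConfig,
    MyFormatConfig formatConfig) {
  super(name, context, fsConf, storageConfig, formatConfig,
      true,                                // readable
      false,                               // writable
      false,                               // blockSplittable
      true,                                // compressible
      Collections.singletonList("myext"),  // extensions
      "myformat");                         // defaultName
}
-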
EasyFormatPlugin
protected EasyFormatPlugin(String name, EasyFormatPlugin.EasyFormatConfig config, DrillbitContext context, StoragePluginConfig storageConfig, T formatConfig)
Revised constructor in which settings are gathered into a configuration object. See the class-level sketch above for a typical invocation.
- Parameters:
name - name of the plugin
config - configuration options for this plugin which determine developer-defined runtime behavior
context - the global server-wide Drillbit context
storageConfig - the configuration for the storage plugin that owns this format plugin
formatConfig - the Jackson-serialized format configuration as created by the user in the Drill web console. Holds user-defined options.
-
-
Method Details
-
getFsConf
public org.apache.hadoop.conf.Configuration getFsConf()
- Specified by:
getFsConf in interface FormatPlugin
-
getContext
public DrillbitContext getContext()
- Specified by:
getContext in interface FormatPlugin
-
easyConfig
public EasyFormatPlugin.EasyFormatConfig easyConfig()
-
getName
public String getName()
- Specified by:
getName in interface FormatPlugin
-
supportsLimitPushdown
public boolean supportsLimitPushdown()
Does this plugin support pushing the limit down to the batch reader? If so, the reader itself should contain logic to stop reading the file as soon as the limit has been reached. This makes the most sense for file formats with a consistent schema that is identified at the first row, such as CSV: if the user wants only 100 rows, there is no reason to read the entire file.
-
supportsPushDown
public boolean supportsPushDown()
Does this plugin support projection push down? That is, can the reader itself handle the tasks of projecting table columns, creating null columns for missing table columns, and so on?
- Returns:
true if the plugin supports projection push-down, false if Drill should do the task by adding a project operator
-
supportsFileImplicitColumns
public boolean supportsFileImplicitColumns()
Whether this format plugin supports implicit file columns.
- Returns:
true if the plugin supports implicit file columns, false otherwise
-
isBlockSplittable
public boolean isBlockSplittable()
Indicates whether the format can be split on block boundaries within a file. If not, the simple format engine will split only on file boundaries.
- Returns:
true if the format is block-splittable
-
isCompressible
public boolean isCompressible()
Indicates whether this format may also appear inside a compression container (for example, csv.gz versus csv). If the format uses its own internal compression scheme, as Parquet does, this should return false.
- Returns:
true if the format may appear in a compression container
-
getRecordReader
public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, List<SchemaPath> columns, String userName) throws ExecutionSetupException
Return a record reader for the specific file format, when using the original ScanBatch scanner.
- Parameters:
context - fragment context
dfs - Drill file system
fileWork - metadata about the file to be scanned
columns - list of projected columns (or may just contain the wildcard)
userName - the name of the user running the query
- Returns:
a record reader for this format
- Throws:
ExecutionSetupException - for many reasons
-
getReaderBatch
protected CloseableRecordBatch getReaderBatch(FragmentContext context, EasySubScan scan) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
scanVersion
protected EasyFormatPlugin.ScanFrameworkVersion scanVersion(OptionSet options)
Choose whether to use the enhanced scan based on the row set and scan framework, or the "traditional" ad-hoc structure based on ScanBatch. Normally set as a config option. Override this method to make the choice based on a system/session option.
- Returns:
the scan framework version to use
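For example, the choice could be keyed off a session option, as in this sketch. The option name is hypothetical, and the constant names assume those defined by EasyFormatPlugin.ScanFrameworkVersion in recent Drill versions:

@Override
protected ScanFrameworkVersion scanVersion(OptionSet options) {
  // Hypothetical boolean session option; EVF_V2 and CLASSIC are assumed constants.
  return options.getBoolean("exec.storage.myformat.enable_v2")
      ? ScanFrameworkVersion.EVF_V2    // enhanced, EVF-based scan
      : ScanFrameworkVersion.CLASSIC;  // traditional ScanBatch-based scan
}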
-
initScanBuilder
protected void initScanBuilder(FileScanFramework.FileScanBuilder builder, EasySubScan scan)
Initialize the scan framework builder with standard options. Call this from the plugin-specific frameworkBuilder(EasySubScan, OptionSet) method; the plugin can then customize or revise options as needed. For EVF V1, to be removed. See the sketch under frameworkBuilder(EasySubScan, OptionSet) below.
- Parameters:
builder - the scan framework builder you create in the frameworkBuilder(EasySubScan, OptionSet) method
scan - the physical scan operator definition passed to the frameworkBuilder(EasySubScan, OptionSet) method
-
newBatchReader
public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionSet options) throws ExecutionSetupException
For EVF V1, to be removed.
- Throws:
ExecutionSetupException
-
frameworkBuilder
protected FileScanFramework.FileScanBuilder frameworkBuilder(EasySubScan scan, OptionSet options) throws ExecutionSetupException
Create the plugin-specific framework that manages the scan. The framework creates batch readers one by one for each file or block. It defines semantic rules for projection. It handles "early" or "late" schema readers. A typical framework builds on standardized frameworks for files in general or text files in particular. For EVF V1, to be removed.
- Parameters:
scan - the physical operation definition for the scan operation. Contains one or more files to read. (The Easy format plugin works only for files.)
- Returns:
the scan framework which orchestrates the scan operation across potentially many files
- Throws:
ExecutionSetupException - for all setup failures
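Continuing the hypothetical MyFormatPlugin, a typical V1 override creates the file scan builder, registers a reader factory, applies the standard options via initScanBuilder(FileScanFramework.FileScanBuilder, EasySubScan), and then customizes the result. MyBatchReader is hypothetical, and the nullType builder method is an assumption based on the EVF V1 API:

@Override
protected FileScanFramework.FileScanBuilder frameworkBuilder(
    EasySubScan scan, OptionSet options) throws ExecutionSetupException {
  FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
  builder.setReaderFactory(new FileScanFramework.FileReaderFactory() {
    @Override
    public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newReader() {
      return new MyBatchReader(formatConfig); // hypothetical batch reader
    }
  });
  initScanBuilder(builder, scan); // apply the standard Easy-plugin options
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR)); // type for missing columns
  return builder;
}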
-
configureScan
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan)
Configure an EVF (v2) scan, which must at least include the factory to create readers.
- Parameters:
builder - the builder with default options already set, and which allows the plugin implementation to set others
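A minimal V2 override registers the reader factory and, optionally, other settings. MyBatchReader is hypothetical, and the FileReaderFactory and builder method names assume the EVF V2 (scan v3) API:

@Override
protected void configureScan(FileScanLifecycleBuilder builder, EasySubScan scan) {
  builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR)); // assumed builder method
  builder.readerFactory(new FileReaderFactory() {
    @Override
    public ManagedReader newReader(FileSchemaNegotiator negotiator) {
      return new MyBatchReader(negotiator, formatConfig); // hypothetical batch reader
    }
  });
}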
-
isStatisticsRecordWriter
public boolean isStatisticsRecordWriter(FragmentContext context, EasyWriter writer)
-
getRecordWriter
public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getStatisticsRecordWriter
public StatisticsRecordWriter getStatisticsRecordWriter(FragmentContext context, EasyWriter writer) throws IOException
- Throws:
IOException
-
getWriterBatch
public CloseableRecordBatch getWriterBatch(FragmentContext context, RecordBatch incoming, EasyWriter writer) throws ExecutionSetupException
- Throws:
ExecutionSetupException
-
getScanStats
protected ScanStats getScanStats(PlannerSettings settings, EasyGroupScan scan)
-
getWriter
public AbstractWriter getWriter(PhysicalOperator child, String location, List<String> partitionColumns)
- Specified by:
getWriter in interface FormatPlugin
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getGroupScan
public AbstractGroupScan getGroupScan(String userName, FileSelection selection, List<SchemaPath> columns, MetadataProviderManager metadataProviderManager) throws IOException
- Specified by:
getGroupScan in interface FormatPlugin
- Throws:
IOException
-
getConfig
- Specified by:
getConfig in interface FormatPlugin
-
getStorageConfig
public StoragePluginConfig getStorageConfig()
- Specified by:
getStorageConfig in interface FormatPlugin
-
supportsRead
public boolean supportsRead()
- Specified by:
supportsRead in interface FormatPlugin
-
supportsWrite
public boolean supportsWrite()
- Specified by:
supportsWrite in interface FormatPlugin
-
supportsAutoPartitioning
public boolean supportsAutoPartitioning()
Description copied from interface: FormatPlugin
Indicates whether this FormatPlugin supports auto-partitioning for CTAS statements.
- Specified by:
supportsAutoPartitioning in interface FormatPlugin
- Returns:
true if auto-partitioning is supported
-
getMatcher
public FormatMatcher getMatcher()
- Specified by:
getMatcher in interface FormatPlugin
-
getOptimizerRules
- Specified by:
getOptimizerRules in interface FormatPlugin
-
getReaderOperatorType
public String getReaderOperatorType()
-
getWriterOperatorType
public String getWriterOperatorType()
-
supportsStatistics
public boolean supportsStatistics()
- Specified by:
supportsStatistics in interface FormatPlugin
-
readStatistics
public DrillStatsTable.TableStatistics readStatistics(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
readStatistics in interface FormatPlugin
- Throws:
IOException
-
writeStatistics
public void writeStatistics(DrillStatsTable.TableStatistics statistics, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path statsTablePath) throws IOException
- Specified by:
writeStatistics in interface FormatPlugin
- Throws:
IOException
-