public interface PartitionExplorer
select * from dfs.my_workspace.data_directory where dir0 = '2014_01';

This assumes that below data_directory there are sub-directories with years and month numbers as folder names, and data stored below them. This works in cases where the partition column is known, but the current implementation does not allow the partition information itself to be queried. An example of such behavior would be a query that should always return the latest month of data, without having to be updated periodically. While it is possible to write a query like the one below, it is very expensive, because it is currently materialized as a full table scan, followed by an aggregation on the dir0 partition column, and finally a filter.
select * from dfs.my_workspace.data_directory where dir0 in (select MAX(dir0) from dfs.my_workspace.data_directory);

This interface allows the definition of a UDF to perform the sub-query on the list of partitions. This UDF can be used at planning time to prune out all of the unnecessary reads of the previous example.
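
To make the idea concrete, here is a minimal, self-contained Java sketch of the reduction such a UDF might perform over the partition list. It is not Drill's actual UDF code: the nested `PartitionExplorer` and `PartitionNotFoundException` types are local stand-ins that only mirror the method signature documented on this page, `latestPartition` and `cannedExplorer` are hypothetical names, and passing empty column and value lists to ask for the top-level partitions is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; the nested types are stand-ins, not Drill classes.
class LatestPartitionSketch {

    // Stand-in for the exception named in the documented signature.
    static class PartitionNotFoundException extends Exception {
        PartitionNotFoundException(String message) { super(message); }
    }

    // Stand-in mirroring the single method documented for PartitionExplorer.
    interface PartitionExplorer {
        Iterable<String> getSubPartitions(String schema, String table,
                List<String> partitionColumns, List<String> partitionValues)
                throws PartitionNotFoundException;
    }

    // Returns the lexicographically greatest top-level partition name,
    // or null when the table has no partitions.
    // Assumption: empty column/value lists select the table's top level.
    static String latestPartition(PartitionExplorer explorer, String schema, String table)
            throws PartitionNotFoundException {
        String max = null;
        for (String partition : explorer.getSubPartitions(schema, table,
                Collections.<String>emptyList(), Collections.<String>emptyList())) {
            if (max == null || partition.compareTo(max) > 0) {
                max = partition;
            }
        }
        return max;
    }

    // Canned explorer that pretends the table has three monthly sub-directories.
    static PartitionExplorer cannedExplorer() {
        return (schema, table, columns, values) ->
                Arrays.asList("2014_01", "2014_03", "2014_02");
    }

    public static void main(String[] args) throws Exception {
        String latest = latestPartition(cannedExplorer(), "dfs.my_workspace", "data_directory");
        System.out.println(latest); // prints 2014_03
    }
}
```

Because the reduction touches only partition names, it needs no table scan; the result can then be compared against dir0 as a constant.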
select * from dfs.my_workspace.data_directory where dir0 = maxdir('dfs.my_workspace', 'data_directory');

Look at DirectoryExplorers for examples of UDFs that use this interface to query against partition information.

Modifier and Type | Method and Description |
---|---|
Iterable<String> | getSubPartitions(String schema, String table, List<String> partitionColumns, List<String> partitionValues) For the schema provided, get a list of sub-partitions of a particular table and the partitions specified by partition columns and values. |
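
As a rough illustration of the contract summarized above, the following self-contained Java sketch implements getSubPartitions over an in-memory directory tree. Everything beyond the method signature is an assumption made for the example: the nested types are local stand-ins rather than Drill classes, `InMemoryExplorer`, `add`, and `sample` are hypothetical names, partition columns are assumed to arrive in order (dir0, dir1, ...), a leaf partition is assumed to return an empty list, and the year/month layout is invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; the nested types are stand-ins, not Drill classes.
class InMemoryExplorerSketch {

    static class PartitionNotFoundException extends Exception {
        PartitionNotFoundException(String message) { super(message); }
    }

    interface PartitionExplorer {
        Iterable<String> getSubPartitions(String schema, String table,
                List<String> partitionColumns, List<String> partitionValues)
                throws PartitionNotFoundException;
    }

    static class InMemoryExplorer implements PartitionExplorer {
        // Key: partition path such as "" (table root) or "2014"; value: child names.
        private final Map<String, List<String>> children = new HashMap<>();

        InMemoryExplorer add(String path, String... childNames) {
            children.put(path, Arrays.asList(childNames));
            return this;
        }

        @Override
        public Iterable<String> getSubPartitions(String schema, String table,
                List<String> partitionColumns, List<String> partitionValues)
                throws PartitionNotFoundException {
            if (partitionColumns.size() != partitionValues.size()) {
                throw new IllegalArgumentException("each partition column needs exactly one value");
            }
            // Assumption: columns arrive in order (dir0, dir1, ...), so the values spell a path.
            String path = String.join("/", partitionValues);
            List<String> subPartitions = children.get(path);
            if (subPartitions == null) {
                throw new PartitionNotFoundException(
                        "no partition '" + path + "' in " + schema + "." + table);
            }
            return subPartitions; // empty for a leaf partition (assumed convention)
        }
    }

    // Invented layout: years at the top level, months below them.
    static InMemoryExplorer sample() {
        return new InMemoryExplorer()
                .add("", "2013", "2014")
                .add("2013", "12")
                .add("2013/12")
                .add("2014", "01", "02")
                .add("2014/01")
                .add("2014/02");
    }

    public static void main(String[] args) throws Exception {
        PartitionExplorer explorer = sample();
        System.out.println(explorer.getSubPartitions("dfs.my_workspace", "data_directory",
                Arrays.asList("dir0"), Arrays.asList("2014"))); // prints [01, 02]
        try {
            explorer.getSubPartitions("dfs.my_workspace", "data_directory",
                    Arrays.asList("dir0"), Arrays.asList("1999"));
        } catch (PartitionNotFoundException e) {
            System.out.println("not found: " + e.getMessage());
        }
    }
}
```

The two lists are matched by position, so each entry in partitionValues narrows the lookup by one directory level in this sketch; a real storage plugin may assign different meaning to them.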
Iterable<String> getSubPartitions(String schema, String table, List<String> partitionColumns, List<String> partitionValues) throws PartitionNotFoundException
SchemaPartitionExplorer.

Parameters:
schema - schema path, can be complete or relative to the default schema
partitionColumns - a list of partitions to match
partitionValues - list of values of each partition (corresponding to the partition column list)

Throws:
PartitionNotFoundException - when the partition does not exist in the given workspace

Copyright © 1970 The Apache Software Foundation. All rights reserved.