Class ColumnsArrayParser

java.lang.Object
org.apache.drill.exec.physical.impl.scan.columns.ColumnsArrayParser
All Implemented Interfaces:
ScanLevelProjection.ScanProjectionParser

public class ColumnsArrayParser extends Object implements ScanLevelProjection.ScanProjectionParser
Parses the `columns` array. Doing so is surprisingly complex.
  • Depending on what is known about the input file, the `columns` array may be required or optional.
  • If the columns array is required, then the wildcard (`*`) expands to `columns`.
  • If the `columns` array appears in the projection, then no other table columns can appear.
  • Both `columns` and the wildcard can appear for queries such as:
     select * from dfs.`multilevel/csv`
     where columns[1] < 1000
  • The query can select specific elements such as `columns`[2]. In this case, only array elements can appear, not the unindexed `columns` column.
  • It is possible for `columns` to appear twice. In this case, the project operator will make a copy.

To handle these cases, the general rule is: allow any number of wildcard or `columns` appearances in the input projection, but collapse them all down to a single occurrence of `columns` in the output projection. (Upstream code will prevent `columns` from appearing twice in its non-indexed form.)
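The collapse rule can be illustrated with a small sketch. It is not Drill's implementation: the class `ColumnsCollapseSketch`, its nested `Collapsed` type, and the string-based projection items (`"*"`, `"columns"`, `"columns[2]"`) are all hypothetical, and it assumes the input requires the `columns` array, so the wildcard expands to `columns`. Any number of wildcard or `columns` appearances yields exactly one `columns` in the output, and indexed references are recorded separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Drill code: projection items are plain strings
// such as "*", "columns" or "columns[2]".
final class ColumnsCollapseSketch {

  // Result of collapsing one input projection (hypothetical type).
  static final class Collapsed {
    final List<String> output = new ArrayList<>();  // at most one "columns"
    final Set<Integer> indexes = new TreeSet<>();   // requested columns[i], sorted
    final List<String> others = new ArrayList<>();  // non-`columns` items, left for later checks
  }

  // Assumes the input requires the `columns` array, so "*" expands to `columns`.
  static Collapsed collapse(List<String> projection) {
    Collapsed result = new Collapsed();
    boolean sawColumns = false;
    String prefix = "columns[";
    for (String col : projection) {
      String lower = col.toLowerCase(Locale.ROOT);
      if (lower.equals("*") || lower.equals("columns")) {
        // Wildcard and bare `columns` both map to the single output `columns`.
        sawColumns = true;
      } else if (lower.startsWith(prefix) && lower.endsWith("]")) {
        // Indexed element: record the index; still one output `columns`.
        // Integer.parseInt throws NumberFormatException for a non-numeric index.
        sawColumns = true;
        result.indexes.add(Integer.parseInt(lower.substring(prefix.length(), lower.length() - 1)));
      } else {
        result.others.add(col);
      }
    }
    if (sawColumns) {
      result.output.add("columns");  // however many appearances, emit one
    }
    return result;
  }
}
```

For the `multilevel/csv` query above, the projection `*` plus `columns[1]` collapses to a single `columns` output with index 1 recorded.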

It falls to this parser to detect a not-uncommon user error, such as the following query:


 SELECT max(columns[1]) AS col1
 FROM cp.`textinput/input1.csv`
 WHERE col1 IS NOT NULL
 
In standard SQL, column aliases are not allowed in the WHERE clause, so `col1` is treated as a table column rather than as the alias. Drill therefore pushes two columns down to the scan operator: `columns`[1] and `col1`. This parser detects the "extra" column and must provide a message that helps the user identify the likely original problem.
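The detection step might look roughly like the sketch below. Everything in it is hypothetical: the class `ExtraColumnCheckSketch`, the string-based projection items, and the wording of the message are illustrations, not Drill's API or its actual error text. As in the description, once `columns` (or the wildcard, assuming `columns` is required) appears, any other table column is reported, with a hint pointing at an alias used in the WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not Drill code: reports table columns that appear
// alongside the `columns` array, with a hint about the likely cause.
final class ExtraColumnCheckSketch {

  // Treats "*", "columns" and "columns[i]" as references to the array
  // (assumes the input requires `columns`, so "*" expands to it).
  static boolean isColumnsRef(String col) {
    String lower = col.toLowerCase(Locale.ROOT);
    return lower.equals("*") || lower.equals("columns")
        || (lower.startsWith("columns[") && lower.endsWith("]"));
  }

  // Returns an error message if the projection mixes `columns` with other
  // table columns; returns empty otherwise.
  static Optional<String> check(List<String> projection) {
    boolean usesColumns = false;
    List<String> extras = new ArrayList<>();
    for (String col : projection) {
      if (isColumnsRef(col)) {
        usesColumns = true;
      } else {
        extras.add(col);
      }
    }
    if (!usesColumns || extras.isEmpty()) {
      return Optional.empty();  // no `columns`, or nothing extra: nothing to report
    }
    return Optional.of("Column(s) " + extras + " cannot be selected together with"
        + " the `columns` array. Did you use a SELECT-list alias in the WHERE clause?");
  }
}
```

For the example query, the pushed-down projection `columns[1]`, `col1` produces a message naming `col1`, while a projection of only `columns` elements passes.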