Sequence Files

Hadoop Sequence files (https://wiki.apache.org/hadoop/SequenceFile) are flat files storing binary key, value pairs. Drill projects sequence files as table with two columns - ‘binary_key’, ‘binary_value’ of type VARBINARY.

Storage Plugin Format for Sequence Files

. . .
"sequencefile": {
  "type": "sequencefile",
  "extensions": [
    "seq"
  ]
},
. . .

Querying a Sequence File

SELECT *
FROM dfs.tmp.`simple.seq`
LIMIT 1;
|--------------|---------------|
|  binary_key  | binary_value  |
|--------------|---------------|
| [B@70828f46  | [B@b8c765f    |
|--------------|---------------|

simple.seq contains byte serialized strings as keys and values, we can convert them to strings.

SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8')
FROM dfs.tmp.`simple.seq`
LIMIT 1
;
|-----------|-------------|
|  EXPR$0   |   EXPR$1    |
|-----------|-------------|
| key0      |   value0    |
|-----------|-------------|