Querying Sequence Files

Feb 9, 2018

Sequence files are flat files that store binary key value pairs. Drill projects sequence files as a table with two columns 'binary_key', 'binary_value'.

Querying a Sequence File

Start the Drill shell and enter your query.

    SELECT * FROM dfs.tmp.`simple.seq` LIMIT 1;
    +--------------+---------------+
    |  binary_key  | binary_value  |
    +--------------+---------------+
    | [B@70828f46  | [B@b8c765f    |
    +--------------+---------------+

Since simple.seq contains byte serialized strings as keys and values, you can convert them to strings.

    SELECT CONVERT_FROM(binary_key, 'UTF8'), CONVERT_FROM(binary_value, 'UTF8') FROM dfs.tmp.`simple.seq` LIMIT 1;
    +-----------+-------------+
    |  EXPR$0   |   EXPR$1    |
    +-----------+-------------+
    | key0      |   value0    |
    +-----------+-------------+