Hive-to-Drill Data Type Mapping
Using Drill you can read tables created in Hive that use data types compatible with Drill. Drill currently does not support writing Hive tables. The map of SQL types and Hive types shows that several Hive types need to be cast to the supported SQL type in a Drill query:
- TINYINT and SMALLINT Cast these types to INTEGER.
- BINARY Cast the Hive BINARY type to VARCHAR.
Note
In the 1.7 release, Drill automatically converts the Hive CHAR data type to VARCHAR. You no longer need to cast the Hive CHAR data type to VARCHAR when querying Hive tables.
Map of SQL and Hive Types
Supported SQL Type | Hive Type | Description |
---|---|---|
BIGINT | BIGINT | 8-byte signed integer |
BOOLEAN | BOOLEAN | TRUE (1) or FALSE (0) |
VARCHAR | CHAR | Character string, fixed-length max 255 |
DATE | DATE | Years months and days in the form in the form YYYY-MM-DD |
DECIMAL* | DECIMAL | 38-digit precision |
FLOAT | FLOAT | 4-byte single precision floating point number |
DOUBLE | DOUBLE | 8-byte double precision floating point number |
INTEGER | INT, TINYINT, SMALLINT | 1-, 2-, or 4-byte signed integer |
INTERVAL | N/A | A day-time or year-month interval |
TIME | N/A | Hours minutes seconds 24-hour basis |
N/A | TIMESTAMP | Conventional UNIX Epoch timestamp. |
TIMESTAMP | TIMESTAMP | JDBC timestamp in yyyy-mm-dd hh:mm:ss format |
None | STRING | Binary string (16) |
VARCHAR | VARCHAR | Character string variable length |
VARBINARY | BINARY | Binary string |
* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the planner.enable_decimal_data_type
option to true
.
Unsupported Types
Drill does not support the following Hive types:
- LIST
- MAP
- STRUCT
- TIMESTAMP (Unix Epoch format)
- UNION
Currently, the Apache Hive version used by Drill does not support the Hive timestamp in Unix Epoch format. The workaround is to use the JDBC format for the timestamp, which Hive accepts and Drill uses. The type mapping example shows how to use the workaround as follows.
- The timestamp value appears in the example CSV file in JDBC format: 2015-03-25 01:23:15.
- Workaround: The Hive table defines column i in the CREATE EXTERNAL TABLE command as a timestamp column.
- The Drill extract function verifies that Drill interprets the timestamp correctly.
Type Mapping Example
This example demonstrates the mapping of Hive data types to Drill data types. Using a CSV that has the following contents, you create a Hive table having values of different supported types:
8223372036854775807,true,3.5,-1231.4,3.14,42,"SomeText",2015-03-25,2015-03-25 01:23:15
Example Assumptions
The example makes the following assumptions:
- The CSV resides in the following location in the Drill sandbox:
/mapr/demo.mapr.com/data/
- You enabled the DECIMAL data type in Drill.
Define an External Table in Hive
In Hive, you define an external table using the following query:
hive> CREATE EXTERNAL TABLE types_demo (
a bigint,
b boolean,
c DECIMAL(3, 2),
d double,
e float,
f INT,
g VARCHAR(64),
h date,
i timestamp
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/mapr/demo.mapr.com/data/mytypes.csv';
* In this release, Drill disables the DECIMAL data type, including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. To enable the DECIMAL type, set the planner.enable_decimal_data_type
option to true
.
You check that Hive mapped the data from the CSV to the typed values as as expected:
hive> SELECT * FROM types_demo;
OK
8223372036854775807 true 3.5 -1231.4 3.14 42 "SomeText" 2015-03-25 2015-03-25 01:23:15
Time taken: 0.524 seconds, Fetched: 1 row(s)
Connect Drill to Hive and Query the Data
Note
Drill 1.8 implements the IF EXISTS parameter for the DROP TABLE and DROP VIEW commands, making IF a reserved word in Drill. As a result, you must include backticks around the Hive \``IF`` conditional function when you use it in a query on Hive tables. Alternatively, you can use the CASE statement instead of the IF function.
In Drill, you use the Hive storage plugin. Using the Hive storage plugin connects Drill to the Hive metastore containing the data.
0: jdbc:drill:> USE hive;
|------|----------------------------------|
| ok | summary |
|------|----------------------------------|
| true | Default schema changed to 'hive' |
|------|----------------------------------|
1 row selected (0.067 seconds)
The data in the Hive table shows the expected values.
0: jdbc:drill:> SELECT * FROM hive.`types_demo`;
|---------------------|------|------|---------|------|----|------------|------------|-----------------------|
| a | b | c | d | e | f | g | h | i |
|---------------------|------|------|---------|------|----|------------|------------|-----------------------|
| 8223372036854775807 | true | 3.50 | -1231.4 | 3.14 | 42 | "SomeText" | 2015-03-25 | 2015-03-25 01:23:15.0 |
|---------------------|------|------|---------|------|----|------------|------------|-----------------------|
1 row selected (1.262 seconds)
To validate that Drill interprets the timestamp in column i correctly, use the extract function to extract part of the date:
0: jdbc:drill:> select extract(year from i) from hive.`types_demo`;
|--------|
| EXPR$0 |
|--------|
| 2015 |
|--------|
1 row selected (0.387 seconds)