Supported Data Types

Drill reads from and writes to data sources having a wide variety of types.

SQL Data Type Description Example
BIGINT 8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 9223372036854775807
BINARY Variable-length byte string B@e6d9eb7
BOOLEAN True or false true
DATE Years, months, and days in YYYY-MM-DD format since 4713 BC 2015-12-30
DECIMAL(p,s), or DEC(p,s), NUMERIC(p,s)* 38-digit precision number, precision is p, and scale is s DECIMAL(6,2) is 1234.56, 4 digits before and 2 digits after the decimal point
FLOAT 4-byte floating point number 0.456
DOUBLE, DOUBLE PRECISION 8-byte floating point number, precision-scalable 0.456
INTEGER or INT 4-byte signed integer in the range -2,147,483,648 to 2,147,483,647 2147483646
INTERVAL** A day-time or year-month interval '1 10:20:30.123' (day-time) or '1-2' year to month (year-month)
SMALLINT*** 2-byte signed integer in the range -32,768 to 32,767 32000
TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss 22:55:55.23
TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS 2015-12-30 22:55:55.23
CHARACTER VARYING, CHARACTER, CHAR,**** or VARCHAR UTF8-encoded variable-length string. The default limit is 1 character. The maximum character limit is 2,147,483,647. CHAR(30) casts data to a 30-character string maximum.

* In this release, Drill disables the DECIMAL data type (an alpha feature), including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. The NUMERIC data type is an alias for the DECIMAL data type.
** Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.
*** SMALLINT is not currently supported.
**** Currently, Drill supports only variable-length strings.

Enabling the DECIMAL Type

To enable the DECIMAL type, set the planner.enable_decimal_data_type option to true. The DECIMAL type is released as an alpha feature and not recommended for production use.

 ALTER SYSTEM SET `planner.enable_decimal_data_type` = true;

+-------+--------------------------------------------+
|  ok   |                  summary                   |
+-------+--------------------------------------------+
| true  | planner.enable_decimal_data_type updated.  |
+-------+--------------------------------------------+
1 row selected (0.08 seconds)

Disabling the DECIMAL Type

By default, Drill disables the DECIMAL data type (an alpha feature), including casting to DECIMAL and reading DECIMAL types from Parquet and Hive. If you enabled the DECIMAL type by setting planner.enable_decimal_data_type = true, set the option to false to disable DECIMAL.

When the DECIMAL type is disabled, you might see DECIMAL in the query plan, but Drill internally converts DECIMAL to NUMERIC.

Composite Types

Drill supports the following composite types:

  • Map
  • Array

A map is a set of name/value pairs. A value in a map can be a scalar type, such as string or int, or a complex type, such as an array or another map. An array is a repeated list of values. A value in an array can be a scalar type, such as string or int, or an array can be a complex type, such as a map or another array.

Drill uses map and array data types internally for reading complex and nested data structures from data sources. In this release of Drill, you cannot reference a composite type by name in a query, but Drill supports array values coming from data sources. For example, you can use the index syntax to query data and get the value of an array element:

a[1]

You can refer to the value for a key in a map using dot notation:

t.m.k

The section “Query Complex Data” shows how to use composite types to access nested arrays. "Handling Different Data Types" includes examples of JSON maps and arrays. Drill provides functions for handling array and map types:

ANY Type

The ANY type is a key technological advance in Drill that enables it to address late typing problems. Drill uses the ANY type internally and you might see references to ANY in the output of the DESCRIBE or other commands. You cannot cast a value to the ANY type in this release.

Using the ANY type, the parser postpones the problem of resolving the type of some value until the query is actually running. At that point, Drill has an empirical schema available for each record batch to use for final code generation and optimization. If the empirical schema changes due to changes in the data processing, Drill regenerates the code as necessary.

Casting and Converting Data Types

In Drill, you cast or convert data to the required type for moving data from one data source to another. You do not assign a data type to every column name in a CREATE TABLE statement to define the table as you do in database software. Instead, you use the CREATE TABLE AS (CTAS) statement with one or more of the following functions to define the table:

In some cases, Drill converts schema-less data to correctly-typed data implicitly. In this case, you do not need to cast the data. The file format of the data and the nature of your query determines the requirement for casting or converting. Differences in casting depend on the data source. The following list describes how Drill treats data types from various data sources:

  • HBase
    Does not implicitly cast input to SQL types. Convert data to appropriate types as as described in the section "Querying HBase". Use CONVERT_TO or CONVERT_FROM data types.
  • Hive
    Implicitly casts Hive types to SQL types as shown in the Hive type mapping example
  • JSON
    Implicitly casts JSON data to its corresponding SQL types or to VARCHAR if Drill is in all text mode.
  • MapR-DB
    Implicitly casts MapR-DB data to SQL types when you use the maprdb format for reading MapR-DB data. The dfs storage plugin defines the format when you install Drill from the mapr-drill package on a MapR node.
  • Parquet
    Implicitly casts Parquet data to the SQL types shown in SQL Data Types to Parquet.
  • Text: CSV, TSV, and other text
    Implicitly casts all textual data to VARCHAR.

Implicit Casting Precedence of Data Types

The following list includes data types Drill uses in descending order of precedence. Casting precedence shown in the following table applies to the implicit casting that Drill performs. For example, Drill might implicitly cast data when a query includes a function or filter on mismatched data types:

SELECT myBigInt FROM mytable WHERE myBigInt = 2.5;

As shown in the table, Drill can cast a NULL value, which has the lowest precedence, to any other type; you can cast a SMALLINT (not supported in this release) value to INT. Drill might deviate from these precedence rules for performance reasons. Under certain circumstances, such as queries involving SUBSTR and CONCAT functions, Drill reverses the order of precedence and allows a cast to VARCHAR from a type of higher precedence than VARCHAR, such as BIGINT. The INTERVALDAY and INTERVALYEAR types are internal types.

Casting Precedence

Precedence Data Type Precedence Data Type
1 INTERVALYEAR (highest) 12 UINT2
2 INTERVALDAY 13 SMALLINT*
3 TIMESTAMP 14 UINT1
4 DATE 15 VAR16CHAR
5 TIME 16 FIXED16CHAR
6 DOUBLE 17 VARCHAR
7 DECIMAL 18 CHAR
8 UINT8 19 VARBINARY
9 BIGINT 20 FIXEDBINARY
10 UINT4 21 NULL (lowest)
11 INT

* Not supported in this release.

Explicit Casting

In a textual file, such as CSV, Drill interprets every field as a VARCHAR, as previously mentioned. To handle textual data, you can use the following functions to cast and convert compatible data types:

  • CAST
    Casts data from one data type to another.
  • CONVERT_TO and CONVERT_FROM functions Converts data, including binary data, from one data type to another using "CONVERT_TO and CONVERT_FROM data types"
  • TO_CHAR
    Converts a TIMESTAMP, INTERVALDAY/INTERVALYEAR, INTEGER, DOUBLE, or DECIMAL to a string.
  • TO_DATE
    Converts a string to DATE.
  • TO_NUMBER
    Converts a string to a DECIMAL.
  • TO_TIMESTAMP
    Converts a string to TIMESTAMP.

If the SELECT statement includes a WHERE clause that compares a column of an unknown data type, cast both the value of the column and the comparison value in the WHERE clause.

Explicit Type Casting Maps

The following tables show data types that Drill can cast to/from other data types. Not all types are available for explicit casting in the current release.

Numerical and Character Data Types

To SMALLINT INT BIGINT DECIMAL FLOAT CHAR FIXEDBINARY VARCHAR VARBINARY
From:
SMALLINT* yes yes yes yes yes yes yes yes yes
INT yes yes yes yes yes yes yes yes yes
BIGINT yes yes yes yes yes yes yes yes yes
DECIMAL yes yes yes yes yes yes yes yes yes
DOUBLE yes yes yes yes yes yes no yes no
FLOAT yes yes yes yes yes yes no yes no
CHAR yes yes yes yes yes no yes yes yes
FIXEDBINARY** yes yes yes yes yes no no yes yes
VARCHAR*** yes yes yes yes yes yes yes no yes
VARBINARY** yes yes yes yes yes no yes yes no

* Not supported in this release.
** Used to cast binary UTF-8 data coming to/from sources such as HBase.
*** You cannot convert a character string having a decimal point to an INT or BIGINT.

Note

The CAST function does not support all representations of FIXEDBINARY and VARBINARY. Only the UTF-8 format is supported.

If your FIXEDBINARY or VARBINARY data is in a format other than UTF-8, or big-endian encoded, use the CONVERT_TO/FROM functions instead of CAST.

Date and Time Data Types

To: DATE TIME TIMESTAMP INTERVALYEAR INTERVALDAY
From:
CHAR Yes Yes Yes Yes Yes
FIXEDBINARY* No No No No No
VARCHAR Yes Yes Yes Yes Yes
VARBINARY* No No Yes No No
DATE No No Yes No No
TIME No Yes Yes No No
TIMESTAMP Yes Yes Yes No No
INTERVALYEAR Yes No Yes No Yes
INTERVALDAY Yes No Yes Yes No

* Used to cast binary UTF-8 data coming to/from sources such as HBase. The CAST function does not support all representations of FIXEDBINARY and VARBINARY. Only the UTF-8 format is supported.

Data Types for CONVERT_TO and CONVERT_FROM Functions

The CONVERT_TO function converts data to bytes from the input type. The CONVERT_FROM function converts data from bytes to the input type. For example, the following CONVERT_TO function converts an integer to bytes using big endian encoding:

CONVERT_TO(mycolumn, 'INT_BE')

The following table lists the data types for use with the CONVERT_TO and CONVERT_FROM functions:

Type Input Type Output Type
JSON bytes varchar
BOOLEAN_BYTE bytes(1) BOOLEAN
TINYINT_BE bytes(1) TINYINT
TINYINT bytes(1) TINYINT
SMALLINT_BE bytes(2) SMALLINT
SMALLINT bytes(2) SMALLINT
INT_BE bytes(4) INT
INT bytes(4) INT
BIGINT_BE bytes(8) BIGINT
BIGINT bytes(8) BIGINT
FLOAT bytes(4) FLOAT (float4)
DOUBLE bytes(8) DOUBLE (float8)
INT_HADOOPV bytes(1-9) INT
BIGINT_HADOOPV BYTES(1-9) BIGINT
DATE_EPOCH_BE bytes(8) DATE
DATE_EPOCH bytes(8) DATE
TIME_EPOCH_BE bytes(8) TIME
TIME_EPOCH bytes(8) TIME
TIMESTAMP_EPOCH bytes(8) TIMESTAMP
TIMESTAMP_IMPALA* bytes(12) TIMESTAMP
UTF8 bytes VARCHAR
UTF16 bytes VAR16CHAR
UINT8 bytes(8) UINT8
UINT8_BE bytes(8) UINT8

* In Drill 1.2 and later, use the TIMESTAMP_IMPALA type with the CONVERT_FROM function to decode a timestamp from Hive or Impala, as shown in the section, "About INT96 Support".

This table includes types such as INT, for converting little endian-encoded data and types such as INT_BE for converting big endian-encoded data to Drill internal types. You need to convert binary representations, such as data in HBase, to a Drill internal format as you query the data. If you are unsure that the size of the source and destination INT or BIGINT you are converting is the same, use CAST to convert these data types to/from binary.

*_HADOOPV in the data type name denotes the variable length integer as defined by Hadoop libraries. Use a *_HADOOPV type if user data is encoded in this format by a Hadoop tool outside MapR.