String Manipulation

You can use the following string functions in Drill queries:

Function Return Type
BYTE_SUBSTR BINARY or VARCHAR
CHAR_LENGTH INTEGER
CONCAT VARCHAR
ILIKE BOOLEAN
INITCAP VARCHAR
ISNUMERIC INTEGER
LENGTH INTEGER
LIKE BOOLEAN
LOWER VARCHAR
LPAD VARCHAR
LTRIM VARCHAR
POSITION INTEGER
REGEXP_MATCHES BOOLEAN
REGEXP_REPLACE VARCHAR
REPEAT VARCHAR
REPLACE VARCHAR
REVERSE VARCHAR
RPAD VARCHAR
RTRIM VARCHAR
SPLIT INTEGER
SPLIT_PART VARCHAR
STRPOS INTEGER
SUBSTR VARCHAR
TOASCII VARCHAR
TRIM VARCHAR
UPPER VARCHAR

Note

As is the case for indexes througout SQL, indexes into character strings are 1-based.

BYTE_SUBSTR

Returns in binary format a substring of the input string.

BYTE_SUBSTR Syntax

BYTE_SUBSTR( string-expression, start  [, length [(string-expression)]] )

string-expression is the entire string, a column name having string values for example. start is a start position in the string. 1 is the first position. length is the number of characters to the right of the start position to include in the output expressed in either of the following ways:

  • As an integer. For example, 19 includes 19 characters to the right of the start position in the output.
  • AS length(string-expression). For example, length(my_string) includes the number of characters in my_string minus the number of the start position.

BYTE_SUBSTR Usage Notes

Combine the use of BYTE_SUBSTR and CONVERT_FROM to separate parts of a HBase composite key for example.

BYTE_SUBSTR Examples

A composite HBase row key consists of strings followed by a reverse timestamp (long). For example: AMZN_9223370655563575807. Use BYTE_SUBSTR and CONVERT_FROM to separate parts of a HBase composite key.

SELECT CONVERT_FROM(BYTE_SUBSTR(row_key,6,19),'UTF8') FROM root.`mydata` LIMIT 1;
|---------------------|
| EXPR$0              |
|---------------------|
| 9223370655563575807 |
|---------------------|
1 rows selected (0.271 seconds)

SELECT CONVERT_FROM(BYTE_SUBSTR(row_key,6,length(row_key)),'UTF8') FROM root.`mydata` LIMIT 1;
|---------------------|
|       EXPR$0        |
|---------------------|
| 9223370655563575807 |
|---------------------|
1 rows selected (0.271 seconds)

CHAR_LENGTH

Returns the number of characters in the input string.

CHAR_LENGTH Syntax

CHAR_LENGTH(string)

CHAR_LENGTH Usage Notes

You can use the alias CHARACTER_LENGTH.

CHAR_LENGTH Example

SELECT CHAR_LENGTH('Drill rocks') FROM (VALUES(1));

|------------|
|   EXPR$0   |
|------------|
| 11         |
|------------|
1 row selected (0.127 seconds)

CONCAT

Concatenates arguments.

CONCAT Syntax

CONCAT(string [, string [, ...] )

CONCAT Example

SELECT CONCAT('Drill', ' ', 1.0, ' ', 'release') FROM (VALUES(1));

|-------------------|
| EXPR$0            |
|-------------------|
| Drill 1.0 release |
|-------------------|
1 row selected (0.134 seconds)

Alternatively, you can use the string concatenation operation to concatenate strings.

ILIKE

Performs a case-insensitive comparison of the input string with a pattern and returns true in the case of a match.

ILIKE Syntax

Note that it is necessary to quote ILIKE function calls using backticks because ILIKE is also a SQL keyword.

`ILIKE`(string, pattern)

ILIKE Examples

SELECT `ILIKE`('abcde', 'ABC%') FROM (VALUES(1));
|--------|
| EXPR$0 |
|--------|
| true   |
|--------|
1 row selected (0.185 seconds)
SELECT `ILIKE`(last_name, 'Spence') FROM cp.`employee.json` limit 3;
|--------|
| EXPR$0 |
|--------|
| false  |
| false  |
| true   |
|--------|
3 rows selected (0.17 seconds)

INITCAP

Returns the string using initial caps.

INITCAP Syntax

INITCAP(string)

INITCAP Examples

SELECT INITCAP('apache drill release 1.0') FROM (VALUES(1));

|--------------------------|
| EXPR$0                   |
|--------------------------|
| Apache Drill Release 1.0 |
|--------------------------|
1 row selected (0.106 seconds)

ISNUMERIC

Returns a 1 if the string argument matches the regular expression [-+]?\d+(\.\d+)? otherwise a 0.

ISNUMERIC Syntax

ISNUMERIC(string)

ISNUMERIC Examples

SELECT ISNUMERIC('3.1415926');
|--------|
| EXPR$0 |
|--------|
| 1      |
|--------|
1 row selected (0.185 seconds)

LENGTH

Returns the number of characters in the string.

LENGTH Syntax

LENGTH( string [, encoding] )

LENGTH Example

SELECT LENGTH('apache drill release 1.0') FROM (VALUES(1));

|------------|
|   EXPR$0   |
|------------|
| 24         |
|------------|
1 row selected (0.127 seconds)

SELECT LENGTH(row_key, 'UTF8') FROM root.`students`;

|------------|
|   EXPR$0   |
|------------|
| 8          |
| 8          |
| 8          |
| 8          |
|------------|
4 rows selected (0.259 seconds)

LIKE

Performs a case-sensitive comparison of the input string with a pattern and returns true in the case of a match.

LIKE Syntax

LIKE(string, pattern)

LIKE Examples

SELECT LIKE('abcde', 'ABC%') FROM (VALUES(1));
|--------|
| EXPR$0 |
|--------|
| false  |
|--------|
1 row selected (0.185 seconds)
SELECT LIKE(last_name, 'Spence') FROM cp.`employee.json` limit 3;
|--------|
| EXPR$0 |
|--------|
| false  |
| false  |
| true   |
|--------|
3 rows selected (0.17 seconds)

Alternatively, you can use the like operator to compare a string with a pattern.

LOWER

Converts the characters in the input string to lowercase.

LOWER Syntax

LOWER (string)

LOWER Example

SELECT LOWER('Apache Drill') FROM (VALUES(1));

|---------------|
|    EXPR$0     |
|---------------|
| apache drill  |
|---------------|
1 row selected (0.103 seconds)

LPAD

Pads the string to the length specified by prepending the fill or a space. Truncates the string if it is longer than the specified length. .

LPAD Syntax

LPAD (string, length [, fill text])

LPAD Example

SELECT LPAD('Release 1.0', 27, 'of Apache Drill 1.0') FROM (VALUES(1));

|------------------------------|
|            EXPR$0            |
|------------------------------|
| of Apache Drill Release 1.0  |
|------------------------------|
1 row selected (0.132 seconds)

LTRIM

Removes any characters from the beginning of string1 that match the characters in string2.

LTRIM Syntax

LTRIM(string1, string2)

LTRIM Examples

SELECT LTRIM('Apache Drill', 'Apache ') FROM (VALUES(1));

|------------|
|   EXPR$0   |
|------------|
| Drill      |
|------------|
1 row selected (0.131 seconds)

SELECT LTRIM('A powerful tool Apache Drill', 'Apache ') FROM (VALUES(1));

|----------------------------|
|           EXPR$0           |
|----------------------------|
| owerful tool Apache Drill  |
|----------------------------|
1 row selected (0.1 seconds)

POSITION

Returns the location of the first occurrence of a substring of the input string, or 0 if the substring does not occur.

POSITION Syntax

POSITION('substring' in 'string')

POSITION Example

SELECT POSITION('c' in 'Apache Drill') FROM (VALUES(1));

|------------|
|   EXPR$0   |
|------------|
| 4          |
|------------|
1 row selected (0.12 seconds)

REGEXP_MATCHES

Matches a regexp pattern to a target string. Returns a boolean value: true if the value matches the regexp, false if the value does not match the regexp.

REGEXP_MATCHES Syntax

REGEXP_MATCHES(string_expression, pattern)

string_expression is the string to be matched.

pattern is the Java regular expression pattern.

Note

The regular expression . matches any character except a line terminator unless the [DOTALL flag](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#DOTALL) is specified. By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If [MULTILINE mode](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#MULTILINE) is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in [MULTILINE mode](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#MULTILINE) $ matches just before a line terminator or the end of the input sequence.

REGEXP_MATCHES Examples

Shows several POSIX metacharacters that return true for the given string expressions:

select regexp_matches('abc', 'abc|def') as a, regexp_matches('cat', '[hc]at$') as b,  regexp_matches('cat', '.at') as c, regexp_matches('cat', '[hc]at') as d, regexp_matches('cat', '[^b]at') as e, regexp_matches('cat', '^[hc]at') as f, regexp_matches('[a]', '\[.\]') as g, regexp_matches('sat', 's.*') as h, regexp_matches('sat','[^hc]at') as i, regexp_matches('hat', '[hc]?at') as j, regexp_matches('cchchat', '[hc]*at') as k, regexp_matches('chat', '[hc]+at') as l;

|------|------|------|------|------|------|------|------|------|------|------|------|
|  a   |  b   |  c   |  d   |  e   |  f   |  g   |  h   |  i   |  j   |  k   |  l   |
|------|------|------|------|------|------|------|------|------|------|------|------|
| true | true | true | true | true | true | true | true | true | true | true | true |
|------|------|------|------|------|------|------|------|------|------|------|------|

Shows case-sensitivity:

select regexp_matches('abc', 'A*.C');
|--------|
| EXPR$0 |
|--------|
| false  |
|--------|

select regexp_matches('abc', 'a*.c');
|--------|
| EXPR$0 |
|--------|
| true   |
|--------|

Shows how to turn on different pattern flags to enable case-insensitive matching, to permit whitespace and comments in pattern, etc. Complete list of flags you can find in Java Doc):

select regexp_matches('String with newline character
    ', '(?s).*');
    +--------+
    | EXPR$0 |
    +--------+
    | true   |
    +--------+

    select regexp_matches('ABC', '(?i)abc');
    +--------+
    | EXPR$0 |
    +--------+
    | true   |
    +--------+

select regexp_matches('abc', '(?x)a    b    c');
    +--------+
    | EXPR$0 |
    +--------+
    | true   |
    +--------+

REGEXP_REPLACE

Substitutes new text for substrings that match Java regular expression patterns.

REGEXP_REPLACE Syntax

REGEXP_REPLACE(source_char, pattern, replacement)

source is the character expression to be replaced.

pattern is the regular expression.

replacement is the string to substitute for the source.

REGEXP_REPLACE Usage Notes

Capturing groups may defined in pattern using parentheses (...) and referenced by number from replacement using $1, $2, etc.

REGEXP_REPLACE Examples

Replace a’s with b’s in this string.

SELECT REGEXP_REPLACE('abc, acd, ade, aef', 'a', 'b') FROM (VALUES(1));
|---------------------|
|       EXPR$0        |
|---------------------|
| bbc, bcd, bde, bef  |
|---------------------|
1 row selected (0.105 seconds)

Use the regular expression a followed by a period (.) in the same query to replace all a’s and the subsequent character.

SELECT REGEXP_REPLACE('abc, acd, ade, aef', 'a.','b') FROM (VALUES(1));
|----------------|
| EXPR$0         |
|----------------|
| bc, bd, be, bf |
|----------------|
1 row selected (0.113 seconds)

Extract the text inside double quotes using a capturing group.

SELECT REGEXP_REPLACE('The quick "brown fox" jumps over...', '.*"(.*)".*', '$1') FROM (VALUES(1));
|-----------|
| EXPR$0    |
|-----------|
| brown fox |
|-----------|

REPEAT

Returns the input string repeated the specified number of times.

REPEAT Syntax

REPEAT(string, n_times)

REPEAT Examples

SELECT REPEAT('Drill', 3);
|-----------------|
| EXPR$0          |
|-----------------|
| DrillDrillDrill |
|-----------------|

REPLACE

Replaces all occurrences of the specified substring with another specified substring.

REPLACE Syntax

REPLACE(string, from, to)

REPLACE Examples

SELECT REPLACE('The quick brown fox jumps...', 'jumps', 'sleeps')
|-------------------------------|
| EXPR$0                        |
|-------------------------------|
| The quick brown fox sleeps... |
|-------------------------------|

REVERSE

Returns the reverse of the input string.

REVERSE Syntax

REVERSE (string)

REVERSE Example

SELECT REVERSE('abcdefg');
|---------|
| EXPR$0  |
|---------|
| gfedcba |
|---------|

RPAD

Pads the string to the length specified. Appends the text you specify after the fill keyword using spaces for the fill if you provide no text or insufficient text to achieve the length. Truncates the string if it is longer than the specified length.

RPAD Syntax

RPAD (string, length [, fill text])

RPAD Example

SELECT RPAD('Apache Drill ', 22, 'Release 1.0') FROM (VALUES(1));
|------------------------|
| EXPR$0                 |
|------------------------|
| Apache Drill Release 1 |
|------------------------|
1 row selected (0.107 seconds)

RTRIM

Removes any characters from the end of string1 that match the characters in string2.

RTRIM Syntax

RTRIM(string1, string2)

RTRIM Examples

SELECT RTRIM('Apache Drill', 'Drill ') FROM (VALUES(1));

|--------|
| EXPR$0 |
|--------|
| Apache |
|--------|
1 row selected (0.135 seconds)

SELECT RTRIM('1.0 Apache Tomcat 1.0', 'Drill 1.0') from (VALUES(1));
|--------------------|
|       EXPR$0       |
|--------------------|
| 1.0 Apache Tomcat  |
|--------------------|
1 row selected (0.102 seconds)

SPLIT

Splits the input string into a list of substrings using the specified delimiter character.

SPLIT Syntax

SPLIT(string, delimiter)

SPLIT Usage Notes

The delimiter must not be null and must contain a single character.

SPLIT Examples

SELECT split('The quick brown fox jumps...', ' ');

|------------------------------------|
| EXPR$0                             |
|------------------------------------|
| [The, quick, brown, fox, jumps...] |
|------------------------------------|

SPLIT_PART

Return the string part at start or from start to end after splitting the input string using the specified delimiter.

SPLIT_PART Syntax

SPLIT_PART(string, delimiter, start[, end])

SPLIT_PART Usage Notes

The delimiter string may be multiple characters long. The start can be a positive integer, or a negative integer. The end must have the same sign as and be greater than or equal to start if provided. The start and end count from beginning if are positive, from end if are negative.

SPLIT_PART Examples

SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4);

|--------|
| EXPR$0 |
|--------|
| fox    |
|--------|

SELECT split_part('The | quick | brown | fox | jumps', ' | ', -2);

|--------|
| EXPR$0 |
|--------|
| fox    |
|--------|

SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4, 5);

|--------------|
|    EXPR$0    |
|--------------|
| fox | jumps  |
|--------------|

SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4, 10);

|--------------|
|    EXPR$0    |
|--------------|
| fox | jumps  |
|--------------|

SELECT split_part('The | quick | brown | fox | jumps', ' | ', -2, -1);

|--------------|
|    EXPR$0    |
|--------------|
| fox | jumps  |
|--------------|

SELECT split_part('The | quick | brown | fox | jumps', ' | ', -10, -4);

|--------------|
|    EXPR$0    |
|--------------|
| The | quick  |
|--------------|

STRPOS

Returns the location of the first occurrence of a substring of the input string, or 0 if the substring does not occur.

STRPOS Syntax

STRPOS(string, substring)

STRPOS Example

SELECT STRPOS('Apache Drill', 'Drill') FROM (VALUES(1));

|--------|
| EXPR$0 |
|--------|
| 8      |
|--------|
1 row selected (0.22 seconds)

SUBSTR

Returns

  • the substring of the input string starting the specified location and optionally having the specified length or
  • the first substring of the input string matching the specified regular expression.

SUBSTR Syntax

SUBSTR(string, start)
SUBSTR(string, start, length)
SUBSTR(string, regexp)

SUBSTR Usage Notes

  1. If start specifies a location outside of string or if length < 1 then the empty string '' is returned.
  2. If length is unspecified or if it specifies a substring that extends beyond the end of string then the returned substring extends to the end of the string.
  3. If regexp does not match a substring of string then NULL is returned.
  4. You can use the alias SUBSTRING for this function.

SUBSTR Example

SELECT SUBSTR('Apache Drill', 8) FROM (VALUES(1));

|--------|
| EXPR$0 |
|--------|
| Drill  |
|--------|
1 row selected (0.134 seconds)

SELECT SUBSTR('Apache Drill', 3, 2) FROM (VALUES(1));

|--------|
| EXPR$0 |
|--------|
| ac     |
|--------|
1 row selected (0.129 seconds)

SELECT SUBSTR('On 1970-01-01 the quick brown fox jumped...', '[\d-]+')

|------------|
| EXPR$0     |
|------------|
| 1970-01-01 |
|------------|

TRIM

Removes any characters from the beginning, end, or both sides of string2 that match the characters in string1.

TRIM Syntax

TRIM ([leading | trailing | both] [string1] from string2)

TRIM Example

SELECT TRIM(trailing 'l' from 'Drill') FROM (VALUES(1));
|------------|
|   EXPR$0   |
|------------|
| Dri        |
|------------|
1 row selected (0.172 seconds)

SELECT TRIM(both 'l' from 'long live Drill') FROM (VALUES(1));
|---------------|
|    EXPR$0     |
|---------------|
| ong live Dri  |
|---------------|
1 row selected (0.104 seconds)

SELECT TRIM(leading 'l' from 'long live Drill') FROM (VALUES(1));
|-----------------|
|     EXPR$0      |
|-----------------|
| ong live Drill  |
|-----------------|
1 row selected (0.101 seconds)

UPPER

Converts the characters in the input string to uppercase.

UPPER Syntax

UPPER (string)

UPPER Example

SELECT UPPER('Apache Drill') FROM (VALUES(1));

|--------------|
| EXPR$0       |
|--------------|
| APACHE DRILL |
|--------------|
1 row selected (0.081 seconds)