Posted to dev@drill.apache.org by "Arina Ielchiieva (JIRA)" <ji...@apache.org> on 2017/05/05 15:26:04 UTC

[jira] [Created] (DRILL-5477) String functions (lower, upper, initcap) should work for UTF-8

Arina Ielchiieva created DRILL-5477:
---------------------------------------

             Summary: String functions (lower, upper, initcap) should work for UTF-8
                 Key: DRILL-5477
                 URL: https://issues.apache.org/jira/browse/DRILL-5477
             Project: Apache Drill
          Issue Type: Improvement
          Components: Functions - Drill
    Affects Versions: 1.10.0
            Reporter: Arina Ielchiieva


Drill string functions lower / upper / initcap work only for ASCII input, not for non-ASCII UTF-8. UTF-8 is a multi-byte encoding, so the bytes must be decoded to Unicode characters before case conversion and encoded again afterwards. Without that decoding step, these functions do not work for Cyrillic, Greek, or any other script with an upper/lower case distinction.

Currently, when a user applies these functions to a non-ASCII UTF-8 string, Drill returns the input value unchanged.
Example:
{noformat}
select upper('привет') from (values(1)) -> привет
{noformat}
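
The difference between byte-wise ASCII conversion and Unicode-aware conversion can be sketched in plain Java. This is only an illustration, not Drill's actual implementation: the class name Utf8CaseSketch and both helper methods are hypothetical, and the sketch assumes the current behaviour comes from shifting only the bytes in the range a-z. The test string is written with Unicode escapes for 'привет' so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

public class Utf8CaseSketch {

    // Hypothetical byte-wise conversion: only the ASCII bytes 'a'..'z' are changed.
    // Every byte of a multi-byte UTF-8 sequence is >= 0x80, so it is left untouched.
    static byte[] asciiUpper(byte[] utf8) {
        byte[] out = Arrays.copyOf(utf8, utf8.length);
        for (int i = 0; i < out.length; i++) {
            if (out[i] >= 'a' && out[i] <= 'z') {
                out[i] = (byte) (out[i] - ('a' - 'A'));
            }
        }
        return out;
    }

    // Unicode-aware conversion: decode UTF-8, upper-case the characters, encode again.
    static byte[] unicodeUpper(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return decoded.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'привет' and its expected upper-case form, as Unicode escapes
        byte[] input = "\u043f\u0440\u0438\u0432\u0435\u0442".getBytes(StandardCharsets.UTF_8);
        byte[] expected = "\u041f\u0420\u0418\u0412\u0415\u0422".getBytes(StandardCharsets.UTF_8);

        // The byte-wise version returns the input unchanged, as in the report
        System.out.println("asciiUpper unchanged: " + Arrays.equals(asciiUpper(input), input));
        // The decode / convert / encode version produces the upper-case string
        System.out.println("unicodeUpper correct: " + Arrays.equals(unicodeUpper(input), expected));
    }
}
```

A real fix inside Drill would also have to account for the converted string having a different byte length than the input, since some case mappings change the number of encoded bytes.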

There is a disabled unit test in https://github.com/arina-ielchiieva/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/expr/fn/impl/TestStringFunctions.java#L33 which should be enabled once the issue is fixed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)