Posted to issues-all@impala.apache.org by "Philip (JIRA)" <ji...@apache.org> on 2019/08/02 12:47:00 UTC

[jira] [Commented] (IMPALA-2019) Proper UTF-8 support in string functions

    [ https://issues.apache.org/jira/browse/IMPALA-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898861#comment-16898861 ] 

Philip commented on IMPALA-2019:
--------------------------------

String lengths also seem to be an issue.

length() appears to return the *byte length* of the string rather than the *number of characters*.

I would suggest this is +not a minor issue+.

 

   *select length('€')*

 

In Hive this returns 1 (one character).

In Impala this returns 3 (the three bytes of the UTF-8 encoding of '€').
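
To make the difference concrete, here is a minimal sketch in Python (not Impala or Hive code) of the two ways of counting. It assumes the Impala result is a count of UTF-8 bytes and the Hive result is a count of Unicode code points; the helper names byte_length and char_length are made up for illustration.

```python
def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s (assumed Impala behaviour)."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Number of Unicode code points in s (assumed Hive behaviour)."""
    return len(s)


if __name__ == "__main__":
    # "a" is ASCII, "\u00e9" is 'é', "\u20ac" is the euro sign '€'.
    for sample in ["a", "\u00e9", "\u20ac"]:
        print(repr(sample), char_length(sample), byte_length(sample))
```

For plain ASCII input the two counts agree, which may be why the difference only shows up with non-ASCII data.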

 

> Proper UTF-8 support in string functions
> ----------------------------------------
>
>                 Key: IMPALA-2019
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2019
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: Impala 2.1, Impala 2.2
>            Reporter: Andrés Cordero
>            Priority: Minor
>              Labels: sql-language
>
> As documented here: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_string.html
> Impala does not properly handle non-ASCII UTF-8 characters, and will return results in string functions such as length that are inconsistent with Hive.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
