Posted to user@impala.apache.org by Dong Bo 董博 <do...@fosun.com> on 2017/12/18 06:38:09 UTC

English Mixed Chinese Substring

Hi Folks,

Impala counts each Chinese character as 3 characters (one per UTF-8 byte) but each English letter as 1. Is there a setting that makes it easy to take a substring of a mixed string, the way Hive does?

For example, for select substr('test测试test', 1, 5)
I want it to return test测, NOT test� (which is what Impala returns).
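A small sketch of what goes wrong, assuming (as the thread describes) that Impala's substr counts bytes while Hive's counts characters. It uses Python only to make the arithmetic visible. The helper names substr_bytes and substr_chars are hypothetical and are not Impala or Hive functions. In UTF-8, 测 and 试 each take 3 bytes, so the first 5 bytes end partway through 测.

```python
# Hypothetical illustration: byte-counted vs character-counted substr.
s = "test测试test"
data = s.encode("utf-8")  # 测 and 试 are 3 bytes each in UTF-8

def substr_bytes(b: bytes, pos: int, length: int) -> bytes:
    """1-based substring counted in bytes (the behavior described for Impala)."""
    return b[pos - 1 : pos - 1 + length]

def substr_chars(text: str, pos: int, length: int) -> str:
    """1-based substring counted in characters (what Hive returns here)."""
    return text[pos - 1 : pos - 1 + length]

print(len(data), len(s))          # 14 10
print(substr_bytes(data, 1, 5))   # b'test\xe6'  (only the first byte of 测)
print(substr_chars(s, 1, 5))      # test测
# Decoding the truncated bytes shows the replacement character from the question.
print(substr_bytes(data, 1, 5).decode("utf-8", errors="replace"))  # test�
```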


Thanks
Carl

Re: English Mixed Chinese Substring

Posted by Jeszy <je...@gmail.com>.
Hello Carl,

There's no UTF-8 support in Impala yet: its string functions count bytes,
not characters. You could write your own UDF to handle it (or contribute
a patch).
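The core of such a UDF could look like the sketch below. It is written in Python for readability, but it walks the raw bytes the way a native C++ UDF would have to, because Impala's UDF SDK passes a STRING as a StringVal (a byte pointer plus a length). The function name utf8_substr is hypothetical. The sketch assumes valid UTF-8 input, pos >= 1 and length >= 0, and it does not reproduce Impala's handling of negative positions. It relies on the UTF-8 rule that continuation bytes have the form 10xxxxxx, so every other byte starts a new character.

```python
def utf8_substr(data: bytes, pos: int, length: int) -> bytes:
    """Hypothetical character-aware substr over UTF-8 bytes.

    Returns `length` characters starting at 1-based character `pos`,
    cutting only at character boundaries. Assumes valid UTF-8,
    pos >= 1 and length >= 0 (negative positions are not handled).
    """
    # Byte offset at which each character starts (skip 10xxxxxx bytes).
    starts = [i for i, byte in enumerate(data) if byte & 0xC0 != 0x80]
    first = pos - 1
    if first >= len(starts) or length <= 0:
        return b""
    begin = starts[first]
    last = first + length
    # End at the start of the character after the slice, or at the end of data.
    end = starts[last] if last < len(starts) else len(data)
    return data[begin:end]

text = "test测试test".encode("utf-8")
print(utf8_substr(text, 1, 5).decode("utf-8"))  # test测
print(utf8_substr(text, 5, 2).decode("utf-8"))  # 测试
```

Collecting the start offsets in a list keeps the sketch short. A C++ version would more likely advance two byte offsets in a single pass and return a StringVal that points into the input buffer, avoiding the extra allocation.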

Regards
