Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2014/07/25 02:48:38 UTC

[jira] [Commented] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.

    [ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073908#comment-14073908 ] 

Navis commented on HIVE-7511:
-----------------------------

[~xiaobingo] "Closed" means the patch has been applied to trunk, which it has not. Just use the "Submit Patch" button below the summary. As for the patch, I think we should accept the script file's encoding from the user or from hive-site.xml rather than forcing UTF-8.
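The suggestion amounts to a precedence rule for choosing the script file's charset. The following is a minimal Python sketch, not Hive code. It assumes three things. First, an explicit user option takes priority over a site-wide setting, which stands in for hive-site.xml. Second, UTF-8 is only an overridable fallback. Third, the key name `hive.cli.script.encoding` and the helpers `resolve_script_encoding` and `read_script` are hypothetical names chosen for illustration.

```python
import os
import tempfile
from typing import Mapping, Optional

# Hypothetical configuration key chosen for illustration; not taken from Hive.
SCRIPT_ENCODING_KEY = "hive.cli.script.encoding"
# Assumed fallback when neither the user nor the site config names a charset.
DEFAULT_ENCODING = "utf-8"


def resolve_script_encoding(user_encoding: Optional[str],
                            site_conf: Optional[Mapping[str, str]] = None) -> str:
    """Explicit user choice first, then the site config, then the default."""
    if user_encoding:
        return user_encoding
    return (site_conf or {}).get(SCRIPT_ENCODING_KEY, DEFAULT_ENCODING)


def read_script(path: str,
                user_encoding: Optional[str] = None,
                site_conf: Optional[Mapping[str, str]] = None) -> str:
    """Decode a query script with the resolved charset instead of a fixed one."""
    encoding = resolve_script_encoding(user_encoding, site_conf)
    with open(path, "r", encoding=encoding) as f:
        return f.read()


# Usage: a script saved as GBK reads back intact once the site config says so.
query = "select * from hivetable where content like '%丄%';"
with tempfile.NamedTemporaryFile("wb", suffix=".hql", delete=False) as tmp:
    tmp.write(query.encode("gbk"))
    script_path = tmp.name
try:
    assert read_script(script_path, site_conf={SCRIPT_ENCODING_KEY: "gbk"}) == query
finally:
    os.remove(script_path)
```

Letting the explicit user setting win over the site-wide value follows the usual Hive pattern, where per-session settings override hive-site.xml. The sketch does not settle which default is right for backward compatibility.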

> Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7511
>                 URL: https://issues.apache.org/jira/browse/HIVE-7511
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>         Environment: Windows Server 2008 R2
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>            Priority: Critical
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7511.1.patch
>
>
> When we put UTF-8 characters in the where clause of a Hive query, the results are empty for "where content like '%丄%'" and contain all rows for "where content not like '%丄%';", even though a few rows contain this character.
> Steps to reproduce:
> 1. Save a file called data.txt in the root container. The contents of the file are as follows:
> 190	丄f齄啊c狛䶴h䶴c狝
> 899	d狜狜㐁geg阿狚ea䶴eead狜e
> 137	齄鼾h狝ge㐀狛g狚阿
> 21	﨩﨩e㐀c狛鼾d䶴﨨
> 767	﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
> 281	﨨㐀啊aga啊c狝e鼾鼾
> 573	㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
> 966	䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
> 565	䶵㐀﨩㐀bb狛ehd丄ea丄㐀
> 778	﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
> 363	gd齄a鼾a䶴b㐁㐁fg鼾
> 822	a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
> 338	b齄㐁ff阿e狜e㐀ba齄
> 2. Execute the following queries to set up the table.
> a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
> b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
> 3. Create a query file query.hql with the following contents:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content like '%丄%';
> 4. Even though a few rows contain this character, the output is empty.
> 5. Change the contents of query.hql to:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content not like '%丄%';
> 6. The output contains all rows including those containing the given character.
> 7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝'; "
> 8. We get expected results when using "where content like '%a%'; "
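The results in steps 4, 6 and 8 fit one plausible explanation, and it is consistent with the attached patch reading the script as UTF-8. The explanation is that query.hql may be decoded with the platform's default charset instead of UTF-8. The Python sketch that follows simulates that hypothesis, under these assumptions:

* The default charset is windows-1252 (cp1252), typical of an English Windows Server 2008 R2 install.
* The data rows themselves are decoded correctly.
* LIKE is reduced to `%` and `_` wildcards with no escape handling.

The rows are copied from data.txt, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

# A few rows copied from data.txt in the report.
ROWS: Dict[int, str] = {
    190: "丄f齄啊c狛䶴h䶴c狝",
    137: "齄鼾h狝ge㐀狛g狚阿",
    363: "gd齄a鼾a䶴b㐁㐁fg鼾",
}


def like(value: str, pattern: str) -> bool:
    """Simplified SQL LIKE: '%' matches any run of characters and '_' matches
    exactly one; escape characters are not supported."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def pattern_as_read(pattern: str, script_encoding: str) -> str:
    """Simulate saving query.hql as UTF-8, then decoding it with another
    charset (assumed here to be the platform default)."""
    return pattern.encode("utf-8").decode(script_encoding, errors="replace")


def matching_rows(pattern: str, script_encoding: str,
                  negate: bool = False) -> List[int]:
    """Row keys selected by LIKE (or NOT LIKE when negate=True)."""
    p = pattern_as_read(pattern, script_encoding)
    return [key for key, content in ROWS.items() if like(content, p) != negate]
```

Under cp1252, the three UTF-8 bytes of 丄 (E4 B8 84) decode to three unrelated Latin characters. As a result, LIKE matches nothing and NOT LIKE matches every row. Meanwhile '%a%' behaves identically under both charsets because UTF-8 and cp1252 agree on ASCII.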



--
This message was sent by Atlassian JIRA
(v6.2#6252)