Posted to dev@hive.apache.org by "Xiaobing Zhou (JIRA)" <ji...@apache.org> on 2014/07/25 01:19:39 UTC

[jira] [Created] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.

Xiaobing Zhou created HIVE-7511:
-----------------------------------

             Summary: Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
                 Key: HIVE-7511
                 URL: https://issues.apache.org/jira/browse/HIVE-7511
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.13.0
         Environment: Windows Server 2008 R2
            Reporter: Xiaobing Zhou
            Assignee: Xiaobing Zhou
            Priority: Critical
             Fix For: 0.14.0


When a Hive query has UTF-8 characters in its WHERE clause, the result is empty for "where content like '%丄%'" and contains all rows for "where content not like '%丄%';", even though a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file are as follows.

190	丄f齄啊c狛䶴h䶴c狝
899	d狜狜㐁geg阿狚ea䶴eead狜e
137	齄鼾h狝ge㐀狛g狚阿
21	﨩﨩e㐀c狛鼾d䶴﨨
767	﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
281	﨨㐀啊aga啊c狝e鼾鼾
573	㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
966	䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
565	䶵㐀﨩㐀bb狛ehd丄ea丄㐀
778	﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
363	gd齄a鼾a䶴b㐁㐁fg鼾
822	a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
338	b齄㐁ff阿e狜e㐀ba齄

2. Execute the following queries to set up the table.
a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;

3. Create a query file query.hql with the following contents:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content like '%丄%';

4. Even though a few rows contain this character, the output is empty.

5. Change the contents of query.hql to:

INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
select * from hivetable where content not like '%丄%';

6. The output contains all rows including those containing the given character.

7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝'; "

8. We get expected results when using "where content like '%a%'; "
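
For reference, the rows that the queries in steps 3 and 5 should return can be worked out outside Hive. The following is a minimal sketch in Python, assuming the sample rows from step 1 and treating LIKE '%丄%' as a plain substring test on the decoded string. The names ROWS, NEEDLE, like_contains and not_like_contains are illustrative only and are not part of Hive.

```python
# Rows copied from data.txt in step 1 (row id, content).
ROWS = [
    (190, "丄f齄啊c狛䶴h䶴c狝"),
    (899, "d狜狜㐁geg阿狚ea䶴eead狜e"),
    (137, "齄鼾h狝ge㐀狛g狚阿"),
    (21, "﨩﨩e㐀c狛鼾d䶴﨨"),
    (767, "﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨"),
    (281, "﨨㐀啊aga啊c狝e鼾鼾"),
    (573, "㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄"),
    (966, "䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄"),
    (565, "䶵㐀﨩㐀bb狛ehd丄ea丄㐀"),
    (778, "﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵"),
    (363, "gd齄a鼾a䶴b㐁㐁fg鼾"),
    (822, "a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b"),
    (338, "b齄㐁ff阿e狜e㐀ba齄"),
]

# U+4E04, the character 丄 used in the WHERE clause.
NEEDLE = "\u4e04"


def like_contains(rows, needle):
    """Emulate `content LIKE '%needle%'` with a substring test."""
    return [row_id for row_id, content in rows if needle in content]


def not_like_contains(rows, needle):
    """Emulate `content NOT LIKE '%needle%'`."""
    return [row_id for row_id, content in rows if needle not in content]


matching = like_contains(ROWS, NEEDLE)
non_matching = not_like_contains(ROWS, NEEDLE)

print(matching)      # expected rows for step 3
print(non_matching)  # expected rows for step 5
```

Under these assumptions, step 3 should return rows 190, 573, 966, 565 and 778, and step 5 should return the remaining eight rows, rather than the empty and complete outputs observed.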


