Posted to dev@hive.apache.org by "Xiaobing Zhou (JIRA)" <ji...@apache.org> on 2014/07/25 01:33:38 UTC
[jira] [Updated] (HIVE-7511) Hive: output is incorrect if there are
UTF-8 characters in where clause of a hive select query.
[ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaobing Zhou updated HIVE-7511:
--------------------------------
Attachment: HIVE-7511.1.patch
> Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
> -----------------------------------------------------------------------------------------------
>
> Key: HIVE-7511
> URL: https://issues.apache.org/jira/browse/HIVE-7511
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.13.0
> Environment: Windows Server 2008 R2
> Reporter: Xiaobing Zhou
> Assignee: Xiaobing Zhou
> Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7511.1.patch
>
>
> When the WHERE clause of a Hive query contains UTF-8 characters, the result is empty for "where content like '%丄%'" and contains all rows for "where content not like '%丄%';", even though a few rows contain this character.
> Steps to reproduce:
> 1. Save a file called data.txt in the root container. The contents of the file are as follows.
> 190 丄f齄啊c狛䶴h䶴c狝
> 899 d狜狜㐁geg阿狚ea䶴eead狜e
> 137 齄鼾h狝ge㐀狛g狚阿
> 21 﨩﨩e㐀c狛鼾d䶴﨨
> 767 﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
> 281 﨨㐀啊aga啊c狝e鼾鼾
> 573 㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
> 966 䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
> 565 䶵㐀﨩㐀bb狛ehd丄ea丄㐀
> 778 﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
> 363 gd齄a鼾a䶴b㐁㐁fg鼾
> 822 a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
> 338 b齄㐁ff阿e狜e㐀ba齄
> 2. Execute the following queries to set up the table.
> a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
> b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
> 3. Create a query file query.hql with the following contents:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content like '%丄%';
> 4. Even though a few rows contain this character, the output is empty.
> 5. Change the contents of query.hql to
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content not like '%丄%';
> 6. The output contains all rows including those containing the given character.
> 7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝'; "
> 8. We get expected results when using "where content like '%a%'; "
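
For reference, the expected results of the queries in steps 3 and 5 can be worked out outside Hive. The following is a minimal Python sketch, not part of the report or of the attached patch. It models `LIKE '%丄%'` as a plain substring test over the sample rows from step 1. The names `ROWS`, `NEEDLE` and `like_contains` are illustrative assumptions.

```python
# Sample rows copied from step 1 of the report: (row, content).
ROWS = [
    (190, "丄f齄啊c狛䶴h䶴c狝"),
    (899, "d狜狜㐁geg阿狚ea䶴eead狜e"),
    (137, "齄鼾h狝ge㐀狛g狚阿"),
    (21, "﨩﨩e㐀c狛鼾d䶴﨨"),
    (767, "﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨"),
    (281, "﨨㐀啊aga啊c狝e鼾鼾"),
    (573, "㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄"),
    (966, "䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄"),
    (565, "䶵㐀﨩㐀bb狛ehd丄ea丄㐀"),
    (778, "﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵"),
    (363, "gd齄a鼾a䶴b㐁㐁fg鼾"),
    (822, "a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b"),
    (338, "b齄㐁ff阿e狜e㐀ba齄"),
]

NEEDLE = "丄"  # U+4E04, the character used in the WHERE clause


def like_contains(content: str, needle: str) -> bool:
    """Model `content LIKE '%needle%'` for a needle without wildcards."""
    return needle in content


# Rows the query in step 3 (LIKE) would be expected to return.
matching = [row for row, content in ROWS if like_contains(content, NEEDLE)]
# Rows the query in step 5 (NOT LIKE) would be expected to return.
non_matching = [row for row, content in ROWS if not like_contains(content, NEEDLE)]

print("LIKE     :", matching)
print("NOT LIKE :", non_matching)
```

Under this model the LIKE query should return five of the thirteen sample rows and the NOT LIKE query the remaining eight, in contrast to the empty and all-rows outputs described in steps 4 and 6.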
--
This message was sent by Atlassian JIRA
(v6.2#6252)