You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Naveen Gangam (JIRA)" <ji...@apache.org> on 2018/08/06 05:41:00 UTC
[jira] [Commented] (HIVE-7248) UNION ALL in hive returns incorrect
results on Hbase backed table
[ https://issues.apache.org/jira/browse/HIVE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16569758#comment-16569758 ]
Naveen Gangam commented on HIVE-7248:
-------------------------------------
[~navis] This still seems to be an issue (https://issues.apache.org/jira/browse/HIVE-20206) . Was there a reason this was never committed to the source? if there isnt, should we revive this issue and rebase the patch? Thanks
> UNION ALL in hive returns incorrect results on Hbase backed table
> -----------------------------------------------------------------
>
> Key: HIVE-7248
> URL: https://issues.apache.org/jira/browse/HIVE-7248
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 0.12.0, 0.13.0, 0.13.1
> Reporter: Mala Chikka Kempanna
> Assignee: Navis
> Priority: Major
> Attachments: HIVE-7248.1.patch.txt, HIVE-7248.2.patch.txt, HIVE-7248.3.patch.txt
>
>
> The issue can be recreated with following steps
> 1) In hbase
> create 'TABLE_EMP','default'
> 2) On hive
> sudo -u hive hive
> CREATE EXTERNAL TABLE TABLE_EMP(FIRST_NAME string,LAST_NAME string,CDS_UPDATED_DATE string,CDS_PK string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = "default:FIRST_NAME,default:LAST_NAME,default:CDS_UPDATED_DATE,:key", "hbase.scan.cache" = "500", "hbase.scan.cacheblocks" = "false" ) TBLPROPERTIES("hbase.table.name" = "TABLE_EMP",'serialization.null.format'='');
> 3) On hbase insert the following data
> put 'TABLE_EMP', '1', 'default:FIRST_NAME', 'Srini'
> put 'TABLE_EMP', '1', 'default:LAST_NAME', 'P'
> put 'TABLE_EMP', '1', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
> put 'TABLE_EMP', '2', 'default:FIRST_NAME', 'Aravind'
> put 'TABLE_EMP', '2', 'default:LAST_NAME', 'K'
> put 'TABLE_EMP', '2', 'default:CDS_UPDATED_DATE', '2014-06-16 00:00:00'
> 4) On hive execute the following query
> hive
> SELECT *
> FROM (
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= '0'
> AND CDS_PK <= '9'
> AND CDS_UPDATED_DATE IS NOT NULL
> UNION ALL SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= 'a'
> AND CDS_PK <= 'z'
> AND CDS_UPDATED_DATE IS NOT NULL
> )t ;
> 5) Output of the query
> 1
> 1
> 2
> 2
> 6) Output of just
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= '0'
> AND CDS_PK <= '9'
> AND CDS_UPDATED_DATE IS NOT NULL
> is
> 1
> 2
> 7) Output of just
> SELECT CDS_PK
> FROM TABLE_EMP
> WHERE
> CDS_PK >= 'a'
> AND CDS_PK <= 'z'
> AND CDS_UPDATED_DATE IS NOT NULL
> Empty
> 8) UNION is used to combine the result from multiple SELECT statements into a single result set. Hive currently only supports UNION ALL (bag union), in which duplicates are not eliminated
> Accordingly above query should return output
> 1
> 2
> instead it is giving wrong output
> 1
> 1
> 2
> 2
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)