You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Li Xin (JIRA)" <ji...@apache.org> on 2015/02/09 04:42:34 UTC

[jira] [Created] (HIVE-9613) Left join query plan outputs wrong column when using subquery

Li Xin created HIVE-9613:
----------------------------

             Summary: Left join query plan outputs  wrong column when using subquery
                 Key: HIVE-9613
                 URL: https://issues.apache.org/jira/browse/HIVE-9613
             Project: Hive
          Issue Type: Bug
          Components: Parser, Query Planning
    Affects Versions: 0.14.0, 1.0.0
         Environment: apache hadoop 2.5.1 
            Reporter: Li Xin


I have a query that outputs a column with wrong contents,which equals to another column.  

I have three tables,as follows:

table 1: _hivetemp.category_city_rank_:
||category||city||rank||
|jinrongfuwu|shanghai|1|
|ktvjiuba|shanghai|2|

table 2:_hivetemp.category_match_:
||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
|danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
|zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|

table 3:_hivetemp.city_match_:
||src_city_name_en||dst_city_name_en||city_name_cn||
|sh|shanghai|上海|

And the query is :
{code}
select
    a.category,
    a.city,
    a.rank,
    b.src_category_en,
    c.src_city_name_en
from
    hivetemp.category_city_rank a
left outer join
(select
    src_category_en,
    dst_category_en
from
    hivetemp.category_match) b
on  a.category = b.dst_category_en
left outer join
(select
    src_city_name_en,
    dst_city_name_en
from
    hivetemp.city_match) c
on  a.city = c.dst_city_name_en
{code}

which shoud output the results as follows,and i test it in hive 0.13:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
|ktvjiuba|shanghai|2|zpwentiyingshi|sh|

but int hive0.14,the results in the column *src_category_en* is wrong,and is just the *city* contents:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|shanghai|sh|
|ktvjiuba|shanghai|2|shanghai|sh|

Using explain to examine the execution plan,i can see the first subquery just outputs one column of *dst_category_en*,and *src_category_en* is just missing.

{quote}
           b:category_match
          TableScan
            alias: category_match
            Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: dst_category_en (type: string)
              outputColumnNames: _col1
              Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE

{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)