You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Li Xin (JIRA)" <ji...@apache.org> on 2015/02/09 04:46:34 UTC
[jira] [Updated] (HIVE-9613) Left join query plan outputs wrong
column when using subquery
[ https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Xin updated HIVE-9613:
-------------------------
Description:
I have a query that outputs a column with wrong contents when using subquery,and the contents of that column is equal to another column,not its own.
I have three tables,as follows:
table 1: _hivetemp.category_city_rank_:
||category||city||rank||
|jinrongfuwu|shanghai|1|
|ktvjiuba|shanghai|2|
table 2:_hivetemp.category_match_:
||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
|danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
|zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
table 3:_hivetemp.city_match_:
||src_city_name_en||dst_city_name_en||city_name_cn||
|sh|shanghai|上海|
And the query is :
{code}
select
a.category,
a.city,
a.rank,
b.src_category_en,
c.src_city_name_en
from
hivetemp.category_city_rank a
left outer join
(select
src_category_en,
dst_category_en
from
hivetemp.category_match) b
on a.category = b.dst_category_en
left outer join
(select
src_city_name_en,
dst_city_name_en
from
hivetemp.city_match) c
on a.city = c.dst_city_name_en
{code}
which shoud output the results as follows,and i test it in hive 0.13:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
|ktvjiuba|shanghai|2|zpwentiyingshi|sh|
but int hive0.14,the results in the column *src_category_en* is wrong,and is just the *city* contents:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|shanghai|sh|
|ktvjiuba|shanghai|2|shanghai|sh|
Using explain to examine the execution plan,i can see the first subquery just outputs one column of *dst_category_en*,and *src_category_en* is just missing.
{quote}
b:category_match
TableScan
alias: category_match
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: dst_category_en (type: string)
outputColumnNames: _col1
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
{quote}
was:
I have a query that outputs a column with wrong contents,which equals to another column.
I have three tables,as follows:
table 1: _hivetemp.category_city_rank_:
||category||city||rank||
|jinrongfuwu|shanghai|1|
|ktvjiuba|shanghai|2|
table 2:_hivetemp.category_match_:
||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
|danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
|zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
table 3:_hivetemp.city_match_:
||src_city_name_en||dst_city_name_en||city_name_cn||
|sh|shanghai|上海|
And the query is :
{code}
select
a.category,
a.city,
a.rank,
b.src_category_en,
c.src_city_name_en
from
hivetemp.category_city_rank a
left outer join
(select
src_category_en,
dst_category_en
from
hivetemp.category_match) b
on a.category = b.dst_category_en
left outer join
(select
src_city_name_en,
dst_city_name_en
from
hivetemp.city_match) c
on a.city = c.dst_city_name_en
{code}
which shoud output the results as follows,and i test it in hive 0.13:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
|ktvjiuba|shanghai|2|zpwentiyingshi|sh|
but int hive0.14,the results in the column *src_category_en* is wrong,and is just the *city* contents:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|shanghai|sh|
|ktvjiuba|shanghai|2|shanghai|sh|
Using explain to examine the execution plan,i can see the first subquery just outputs one column of *dst_category_en*,and *src_category_en* is just missing.
{quote}
b:category_match
TableScan
alias: category_match
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: dst_category_en (type: string)
outputColumnNames: _col1
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
{quote}
> Left join query plan outputs wrong column when using subquery
> --------------------------------------------------------------
>
> Key: HIVE-9613
> URL: https://issues.apache.org/jira/browse/HIVE-9613
> Project: Hive
> Issue Type: Bug
> Components: Parser, Query Planning
> Affects Versions: 0.14.0, 1.0.0
> Environment: apache hadoop 2.5.1
> Reporter: Li Xin
>
> I have a query that outputs a column with wrong contents when using subquery,and the contents of that column is equal to another column,not its own.
> I have three tables,as follows:
> table 1: _hivetemp.category_city_rank_:
> ||category||city||rank||
> |jinrongfuwu|shanghai|1|
> |ktvjiuba|shanghai|2|
> table 2:_hivetemp.category_match_:
> ||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
> |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
> |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
> table 3:_hivetemp.city_match_:
> ||src_city_name_en||dst_city_name_en||city_name_cn||
> |sh|shanghai|上海|
> And the query is :
> {code}
> select
> a.category,
> a.city,
> a.rank,
> b.src_category_en,
> c.src_city_name_en
> from
> hivetemp.category_city_rank a
> left outer join
> (select
> src_category_en,
> dst_category_en
> from
> hivetemp.category_match) b
> on a.category = b.dst_category_en
> left outer join
> (select
> src_city_name_en,
> dst_city_name_en
> from
> hivetemp.city_match) c
> on a.city = c.dst_city_name_en
> {code}
> which shoud output the results as follows,and i test it in hive 0.13:
> ||category||city||rank||src_category_en||src_city_name_en||
> |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
> |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
> but int hive0.14,the results in the column *src_category_en* is wrong,and is just the *city* contents:
> ||category||city||rank||src_category_en||src_city_name_en||
> |jinrongfuwu|shanghai|1|shanghai|sh|
> |ktvjiuba|shanghai|2|shanghai|sh|
> Using explain to examine the execution plan,i can see the first subquery just outputs one column of *dst_category_en*,and *src_category_en* is just missing.
> {quote}
> b:category_match
> TableScan
> alias: category_match
> Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: dst_category_en (type: string)
> outputColumnNames: _col1
> Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
> {quote}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)