You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "chuanjie.duan (Jira)" <ji...@apache.org> on 2020/07/10 08:02:00 UTC
[jira] [Updated] (HIVE-23833) wrong explain and result when full
join with join
[ https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chuanjie.duan updated HIVE-23833:
---------------------------------
Component/s: Hive
Affects Version/s: 2.1.1
Description:
Reproduce:
# Create three tables, mytest_t1, mytest_t2, mytest_t4
# hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
# hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code {color:#FF0000}join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code){color};"
2 - explain
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: material_code (type: string), wh_guid (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
TableScan
alias: t2
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: material_code (type: string), wh_guid (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: string)
Reduce Operator Tree:
Join Operator
condition map:
{color:#FF0000}Outer Join 0 to 1{color}
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col1, _col3
Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: COALESCE(_col1,_col3) (type: string)
outputColumnNames: _col0
Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
3 - explain
STAGE PLANS:
Stage: Stage-7
Map Reduce Local Work
Alias -> Map Local Tables:
$hdt$_1:t2
Fetch Operator
limit: -1
$hdt$_2:t5
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
$hdt$_1:t2
TableScan
alias: t2
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: material_code is not null (type: boolean)
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: material_code (type: string), wh_guid (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 _col0 (type: string)
1 _col0 (type: string)
$hdt$_2:t5
TableScan
alias: t5
Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: material_code is not null (type: boolean)
Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: material_code (type: string)
outputColumnNames: _col0
Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
HashTable Sink Operator
keys:
0 COALESCE(_col0,_col2) (type: string)
1 _col0 (type: string)
Stage: Stage-5
Map Reduce
Map Operator Tree:
TableScan
alias: t1
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: material_code is not null (type: boolean)
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: material_code (type: string), wh_guid (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
{color:red} Inner Join 0 to 1{color}
keys:
0 _col0 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
{color:red} Inner Join 0 to 1{color}
keys:
0 COALESCE(_col0,_col2) (type: string)
1 _col0 (type: string)
outputColumnNames: _col1, _col3
Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: COALESCE(_col1,_col3) (type: string)
outputColumnNames: _col0
Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Summary: wrong explain and result when full join with join (was: hive)
> wrong explain and result when full join with join
> -------------------------------------------------
>
> Key: HIVE-23833
> URL: https://issues.apache.org/jira/browse/HIVE-23833
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.1.1
> Reporter: chuanjie.duan
> Priority: Major
>
> Reproduce:
> # Create three tables, mytest_t1, mytest_t2, mytest_t4
> # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
> # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code {color:#FF0000}join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code){color};"
> 2 - explain
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t1
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: material_code (type: string), wh_guid (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: string)
> TableScan
> alias: t2
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: material_code (type: string), wh_guid (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: string)
> Reduce Operator Tree:
> Join Operator
> condition map:
> {color:#FF0000}Outer Join 0 to 1{color}
> keys:
> 0 _col0 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col1, _col3
> Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: COALESCE(_col1,_col3) (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 3 - explain
> STAGE PLANS:
> Stage: Stage-7
> Map Reduce Local Work
> Alias -> Map Local Tables:
> $hdt$_1:t2
> Fetch Operator
> limit: -1
> $hdt$_2:t5
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> $hdt$_1:t2
> TableScan
> alias: t2
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: material_code is not null (type: boolean)
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: material_code (type: string), wh_guid (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
> HashTable Sink Operator
> keys:
> 0 _col0 (type: string)
> 1 _col0 (type: string)
> $hdt$_2:t5
> TableScan
> alias: t5
> Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: material_code is not null (type: boolean)
> Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: material_code (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
> HashTable Sink Operator
> keys:
> 0 COALESCE(_col0,_col2) (type: string)
> 1 _col0 (type: string)
> Stage: Stage-5
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t1
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: material_code is not null (type: boolean)
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: material_code (type: string), wh_guid (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> {color:red} Inner Join 0 to 1{color}
> keys:
> 0 _col0 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> {color:red} Inner Join 0 to 1{color}
> keys:
> 0 COALESCE(_col0,_col2) (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col1, _col3
> Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: COALESCE(_col1,_col3) (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
--
This message was sent by Atlassian Jira
(v8.3.4#803005)