You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "chuanjie.duan (Jira)" <ji...@apache.org> on 2020/07/10 08:02:00 UTC

[jira] [Updated] (HIVE-23833) wrong explain and result when full join with join

     [ https://issues.apache.org/jira/browse/HIVE-23833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chuanjie.duan updated HIVE-23833:
---------------------------------
          Component/s: Hive
    Affects Version/s: 2.1.1
          Description: 
Reproduce:
 # Create three tables, mytest_t1, mytest_t2, mytest_t4
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
 # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code {color:#FF0000}join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code){color};"

2 - explain

 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 value expressions: _col1 (type: string)
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 Reduce Output Operator
 key expressions: _col0 (type: string)
 sort order: +
 Map-reduce partition columns: _col0 (type: string)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 value expressions: _col1 (type: string)
 Reduce Operator Tree:
 Join Operator
 condition map:
 {color:#FF0000}Outer Join 0 to 1{color}
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col1, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: COALESCE(_col1,_col3) (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
 File Output Operator
 compressed: false
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 

3 - explain

STAGE PLANS:
 Stage: Stage-7
 Map Reduce Local Work
 Alias -> Map Local Tables:
 $hdt$_1:t2 
 Fetch Operator
 limit: -1
 $hdt$_2:t5 
 Fetch Operator
 limit: -1
 Alias -> Map Local Operator Tree:
 $hdt$_1:t2 
 TableScan
 alias: t2
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
 HashTable Sink Operator
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 $hdt$_2:t5 
 TableScan
 alias: t5
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: material_code (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
 HashTable Sink Operator
 keys:
 0 COALESCE(_col0,_col2) (type: string)
 1 _col0 (type: string)

Stage: Stage-5
 Map Reduce
 Map Operator Tree:
 TableScan
 alias: t1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 Filter Operator
 predicate: material_code is not null (type: boolean)
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: material_code (type: string), wh_guid (type: string)
 outputColumnNames: _col0, _col1
 Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
 Map Join Operator
 condition map:
{color:red} Inner Join 0 to 1{color}
 keys:
 0 _col0 (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col0, _col1, _col2, _col3
 Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
 Map Join Operator
 condition map:
{color:red} Inner Join 0 to 1{color}
 keys:
 0 COALESCE(_col0,_col2) (type: string)
 1 _col0 (type: string)
 outputColumnNames: _col1, _col3
 Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
 Select Operator
 expressions: COALESCE(_col1,_col3) (type: string)
 outputColumnNames: _col0
 Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
 File Output Operator
 compressed: false
 Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
 table:
 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              Summary: wrong explain and result when full join with join  (was: hive)

> wrong explain and result when full join with join
> -------------------------------------------------
>
>                 Key: HIVE-23833
>                 URL: https://issues.apache.org/jira/browse/HIVE-23833
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.1
>            Reporter: chuanjie.duan
>            Priority: Major
>
> Reproduce:
>  # Create three tables, mytest_t1, mytest_t2, mytest_t4
>  # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code;"
>  # hive -e "explain select coalesce(t1.wh_guid,t2.wh_guid) as wh_guid from dw_dev.mytest_t1 t1 full join dw_dev.mytest_t2 t2 on t1.material_code = t2.material_code {color:#FF0000}join dw_dev.mytest_t5 t5 on t5.material_code = coalesce(t1.material_code,t2.material_code){color};"
> 2 - explain
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: t1
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: material_code (type: string), wh_guid (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  Reduce Output Operator
>  key expressions: _col0 (type: string)
>  sort order: +
>  Map-reduce partition columns: _col0 (type: string)
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  value expressions: _col1 (type: string)
>  TableScan
>  alias: t2
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: material_code (type: string), wh_guid (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  Reduce Output Operator
>  key expressions: _col0 (type: string)
>  sort order: +
>  Map-reduce partition columns: _col0 (type: string)
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  value expressions: _col1 (type: string)
>  Reduce Operator Tree:
>  Join Operator
>  condition map:
>  {color:#FF0000}Outer Join 0 to 1{color}
>  keys:
>  0 _col0 (type: string)
>  1 _col0 (type: string)
>  outputColumnNames: _col1, _col3
>  Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: COALESCE(_col1,_col3) (type: string)
>  outputColumnNames: _col0
>  Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
>  File Output Operator
>  compressed: false
>  Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe 
> 3 - explain
> STAGE PLANS:
>  Stage: Stage-7
>  Map Reduce Local Work
>  Alias -> Map Local Tables:
>  $hdt$_1:t2 
>  Fetch Operator
>  limit: -1
>  $hdt$_2:t5 
>  Fetch Operator
>  limit: -1
>  Alias -> Map Local Operator Tree:
>  $hdt$_1:t2 
>  TableScan
>  alias: t2
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  Filter Operator
>  predicate: material_code is not null (type: boolean)
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: material_code (type: string), wh_guid (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 1201 Data size: 259416 Basic stats: COMPLETE Column stats: NONE
>  HashTable Sink Operator
>  keys:
>  0 _col0 (type: string)
>  1 _col0 (type: string)
>  $hdt$_2:t5 
>  TableScan
>  alias: t5
>  Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
>  Filter Operator
>  predicate: material_code is not null (type: boolean)
>  Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: material_code (type: string)
>  outputColumnNames: _col0
>  Statistics: Num rows: 12927 Data size: 2430276 Basic stats: COMPLETE Column stats: NONE
>  HashTable Sink Operator
>  keys:
>  0 COALESCE(_col0,_col2) (type: string)
>  1 _col0 (type: string)
> Stage: Stage-5
>  Map Reduce
>  Map Operator Tree:
>  TableScan
>  alias: t1
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  Filter Operator
>  predicate: material_code is not null (type: boolean)
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: material_code (type: string), wh_guid (type: string)
>  outputColumnNames: _col0, _col1
>  Statistics: Num rows: 6159 Data size: 1724520 Basic stats: COMPLETE Column stats: NONE
>  Map Join Operator
>  condition map:
> {color:red} Inner Join 0 to 1{color}
>  keys:
>  0 _col0 (type: string)
>  1 _col0 (type: string)
>  outputColumnNames: _col0, _col1, _col2, _col3
>  Statistics: Num rows: 6774 Data size: 1896972 Basic stats: COMPLETE Column stats: NONE
>  Map Join Operator
>  condition map:
> {color:red} Inner Join 0 to 1{color}
>  keys:
>  0 COALESCE(_col0,_col2) (type: string)
>  1 _col0 (type: string)
>  outputColumnNames: _col1, _col3
>  Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
>  Select Operator
>  expressions: COALESCE(_col1,_col3) (type: string)
>  outputColumnNames: _col0
>  Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
>  File Output Operator
>  compressed: false
>  Statistics: Num rows: 14219 Data size: 2673303 Basic stats: COMPLETE Column stats: NONE
>  table:
>  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe



--
This message was sent by Atlassian Jira
(v8.3.4#803005)