Posted to dev@hive.apache.org by "Szehon Ho (JIRA)" <ji...@apache.org> on 2014/11/04 21:01:34 UTC
[jira] [Assigned] (HIVE-8701) Combine nested map joins into the parent map join if possible [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szehon Ho reassigned HIVE-8701:
-------------------------------
Assignee: Szehon Ho
> Combine nested map joins into the parent map join if possible [Spark Branch]
> ----------------------------------------------------------------------------
>
> Key: HIVE-8701
> URL: https://issues.apache.org/jira/browse/HIVE-8701
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Szehon Ho
>
> With the work in HIVE-8616 enabled, the generated plan shows that the nested map join operator isn't merged into its parent when possible. This is demonstrated in auto_join2.q. The MR plan for the same query shows that this optimization is already in place there. We should do the same for Spark.
> {code}
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       Edges:
>         Map 2 <- Map 3 (NONE, 0)
>         Map 3 <- Map 1 (NONE, 0)
>       DagName: xzhang_20141102074141_ac089634-bf01-4386-b1cf-3e7f2e99f6eb:3
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: src2
>                   Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: key (type: string)
>                       sort order: +
>                       Map-reduce partition columns: key (type: string)
>                       Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: src3
>                   Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: UDFToDouble(key) is not null (type: boolean)
>                     Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {_col0}
>                         1 {value}
>                       keys:
>                         0 (_col0 + _col5) (type: double)
>                         1 UDFToDouble(key) (type: double)
>                       outputColumnNames: _col0, _col11
>                       input vertices:
>                         0 Map 3
>                       Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE
>                       Select Operator
>                         expressions: _col0 (type: string), _col11 (type: string)
>                         outputColumnNames: _col0, _col1
>                         Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE
>                         File Output Operator
>                           compressed: false
>                           Statistics: Num rows: 17 Data size: 1813 Basic stats: COMPLETE Column stats: NONE
>                           table:
>                               input format: org.apache.hadoop.mapred.TextInputFormat
>                               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Map 3
>             Map Operator Tree:
>                 TableScan
>                   alias: src1
>                   Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 29 Data size: 2906 Basic stats: COMPLETE Column stats: NONE
>                     Map Join Operator
>                       condition map:
>                            Inner Join 0 to 1
>                       condition expressions:
>                         0 {key}
>                         1 {key}
>                       keys:
>                         0 key (type: string)
>                         1 key (type: string)
>                       outputColumnNames: _col0, _col5
>                       input vertices:
>                         1 Map 1
>                       Statistics: Num rows: 31 Data size: 3196 Basic stats: COMPLETE Column stats: NONE
>                       Filter Operator
>                         predicate: (_col0 + _col5) is not null (type: boolean)
>                         Statistics: Num rows: 16 Data size: 1649 Basic stats: COMPLETE Column stats: NONE
>                         Reduce Output Operator
>                           key expressions: (_col0 + _col5) (type: double)
>                           sort order: +
>                           Map-reduce partition columns: (_col0 + _col5) (type: double)
>                           Statistics: Num rows: 16 Data size: 1649 Basic stats: COMPLETE Column stats: NONE
>                           value expressions: _col0 (type: string)
> {code}
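In the plan above, the first join (src1 with src2) runs in Map 3, and its output is shipped to Map 2, where it becomes the small (hashed) side of the second join while src3 is streamed. As a rough illustration of what "combining the nested map join into the parent" means at runtime, here is a minimal Java sketch. It is not Hive code. It assumes the auto_join2.q query is the three-way join implied by the plan (src1.key = src2.key, then src1.key + src2.key = src3.key, selecting src1.key and src3.value). The class ChainedMapJoinSketch, the Row record, and the toDouble helper are hypothetical names. Both small tables are loaded into hash tables up front, and src1 is streamed once through two chained probes, so both joins run in a single map-side pass.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of two chained broadcast (map-side) hash joins; not Hive's implementation. */
public class ChainedMapJoinSketch {

    /** A row of the two-column table "src" (key, value). */
    record Row(String key, String value) {}

    /** Rough stand-in for Hive's string-to-double conversion: null when not numeric. */
    static Double toDouble(String s) {
        if (s == null) return null;
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Streams src1 once; src2 and src3 play the role of broadcast hash tables. */
    static List<String[]> run(List<Row> src1, List<Row> src2, List<Row> src3) {
        // Hash table for the first join: src2.key -> src2 rows (mirrors "key is not null").
        Map<String, List<Row>> src2ByKey = new HashMap<>();
        for (Row r : src2) {
            if (r.key() != null) {
                src2ByKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
            }
        }
        // Hash table for the second join: UDFToDouble(src3.key) -> src3 rows.
        Map<Double, List<Row>> src3ByKey = new HashMap<>();
        for (Row r : src3) {
            Double k = toDouble(r.key());
            if (k != null) {
                src3ByKey.computeIfAbsent(k, x -> new ArrayList<>()).add(r);
            }
        }
        List<String[]> out = new ArrayList<>();
        for (Row a : src1) {
            if (a.key() == null) continue;
            // First probe: src1.key = src2.key.
            for (Row b : src2ByKey.getOrDefault(a.key(), List.of())) {
                Double ka = toDouble(a.key());
                Double kb = toDouble(b.key());
                if (ka == null || kb == null) continue; // mirrors "(_col0 + _col5) is not null"
                // Second probe, in the same pass: src1.key + src2.key = src3.key.
                for (Row c : src3ByKey.getOrDefault(ka + kb, List.of())) {
                    out.add(new String[] {a.key(), c.value()}); // SELECT src1.key, src3.value
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> src = List.of(new Row("0", "val_0"), new Row("2", "val_2"), new Row("4", "val_4"));
        for (String[] row : run(src, src, src)) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

With the three sample rows, key 0 joins to val_0 (0 + 0 = 0) and key 2 joins to val_4 (2 + 2 = 4), while key 4 finds no match for 8. Keeping src1 as the streamed side of both joins is one way to avoid materializing the intermediate join result as a broadcast input. Whether the Spark planner should pick the same big table as the MR plan is left to the fix itself.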