Posted to dev@hive.apache.org by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org> on 2010/12/01 11:33:11 UTC
[jira] Commented: (HIVE-1695) MapJoin followed by ReduceSink should
be done as single MapReduce Job
[ https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965652#action_12965652 ]
Sreekanth Ramakrishnan commented on HIVE-1695:
----------------------------------------------
Attaching a sample plan based on the above approach:
{noformat}
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF test a) (TOK_TABREF test1 b) (= (. (TOK_TABLE_OR_COL a) key) (. (TOK_TABLE_OR_COL b) key)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_HINTLIST (TOK_HINT TOK_MAPJOIN (TOK_HINTARGLIST b))) (TOK_SELEXPR (. (TOK_TABLE_OR_COL a) key))) (TOK_ORDERBY (TOK_TABSORTCOLNAMEASC (. (TOK_TABLE_OR_COL a) key)))))
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-4
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        b
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        b
          TableScan
            alias: b
            HashTable Sink Operator
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              Position of Big Table: 0
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        a
          TableScan
            alias: a
            Map Join Operator
              condition map:
                Inner Join 0 to 1
              condition expressions:
                0 {key}
                1
              handleSkewJoin: false
              keys:
                0 [Column[key]]
                1 [Column[key]]
              outputColumnNames: _col0
              Position of Big Table: 0
              Reduce Output Operator
                key expressions:
                  expr: _col0
                  type: int
                sort order: +
                tag: -1
                value expressions:
                  expr: _col0
                  type: int
      Local Work:
        Map Reduce Local Work
      Reduce Operator Tree:
        Extract
          File Output Operator
            compressed: false
            GlobalTableId: 0
            table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: -1
{noformat}
Currently trying to fix the map failure caused by the above plan. Still figuring out how to add a Select Operator at the end of the Map Join Operator.
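For reference, the abstract syntax tree in the plan above appears to correspond to a query of roughly the following shape. This is reconstructed from the AST only (table names test and test1, aliases a and b, the MAPJOIN hint on b, and the ascending ORDER BY on a.key all come from it); the exact statement that was run is not shown here, so treat the text below as an assumption.
{noformat}
-- Reconstructed from the AST; assumed, not the original statement.
-- The MAPJOIN(b) hint asks Hive to load test1 (alias b) into an in-memory
-- hash table, and the ORDER BY is what introduces the Reduce Output Operator.
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.key
FROM test a
JOIN test1 b ON (a.key = b.key)
ORDER BY a.key;
{noformat}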
> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
>
> Currently, a MapJoin followed by a ReduceSink runs as two MapReduce jobs: one map-only job followed by a MapReduce job. These can be combined into a single MapReduce job.