Posted to dev@hive.apache.org by "Lefty Leverenz (JIRA)" <ji...@apache.org> on 2014/08/18 08:07:19 UTC

[jira] [Comment Edited] (HIVE-6144) Implement non-staged MapJoin

    [ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100309#comment-14100309 ] 

Lefty Leverenz edited comment on HIVE-6144 at 8/18/14 6:06 AM:
---------------------------------------------------------------

Review request:  *hive.auto.convert.join.use.nonstaged* has been added to the section "Optimize Auto Join Conversion" in a version-0.13.0 box.  Is that the right place for it?  Could we have some examples and guidance on when to use it?

* [Optimize Auto Join Conversion | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-OptimizeAutoJoinConversion]

Also in that section, I changed the value of *hive.auto.convert.join.noconditionaltask.size* to match the default (10000000) -- it had been 10000 which seemed rather small, but if that value was intended please let me know.

<Edit>  Should this information from the parameter description be included in the version-0.13.0 box in "Optimize Auto Join Conversion"? -- "Currently, this is not working with vectorization or Tez execution engine." 
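
As a starting point for the requested example, here is a minimal sketch of how the setting might be tried from the Hive CLI. It assumes the *src* table and query from the issue description below, and the MapReduce engine with vectorization turned off (per the limitation in the parameter description quoted above). The values are illustrative only, not tuning recommendations.

{noformat}
-- Assumption: run on MapReduce without vectorization, since the parameter
-- description says non-staged map join does not work with vectorization or Tez.
set hive.execution.engine=mr;
set hive.vectorized.execution.enabled=false;

-- Let the optimizer convert the join to a map join without a conditional task.
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=10000000;

-- Plan with the staged local task (hash table written by MapRedLocalTask).
set hive.auto.convert.join.use.nonstaged=false;
explain select a.* from src a join src b on a.key=b.key;

-- Plan with the small alias read directly in the map task.
set hive.auto.convert.join.use.nonstaged=true;
explain select a.* from src a join src b on a.key=b.key;
{noformat}

Comparing the two EXPLAIN outputs should show whether the separate "Map Reduce Local Work" stage has been folded into the map stage, as in the plans in the issue description. Whether that helps a given query is an open question for the guidance requested above.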


> Implement non-staged MapJoin
> ----------------------------
>
>                 Key: HIVE-6144
>                 URL: https://issues.apache.org/jira/browse/HIVE-6144
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>              Labels: TODOC13
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt, HIVE-6144.6.patch.txt, HIVE-6144.7.patch.txt, HIVE-6144.8.patch.txt, HIVE-6144.9.patch.txt
>
>
> For a map join, all data in the small aliases is hashed and stored in a temporary file by MapRedLocalTask. For aliases without a filter or projection, that step does not seem necessary. For example, the query
> {noformat}
> select a.* from src a join src b on a.key=b.key;
> {noformat}
> produces a plan like this:
> {noformat}
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         a 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         a 
>           TableScan
>             alias: a
>             HashTable Sink Operator
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               Position of Big Table: 1
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                 File Output Operator
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
> {noformat}
> Table src (alias a) is fetched and stored as-is in MapRedLocalTask. With this patch, the plan can instead look like this:
> {noformat}
>   Stage: Stage-3
>     Map Reduce
>       Alias -> Map Operator Tree:
>         b 
>           TableScan
>             alias: b
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {key} {value}
>                 1 
>               handleSkewJoin: false
>               keys:
>                 0 [Column[key]]
>                 1 [Column[key]]
>               outputColumnNames: _col0, _col1
>               Position of Big Table: 1
>               Select Operator
>                   File Output Operator
>       Local Work:
>         Map Reduce Local Work
>           Alias -> Map Local Tables:
>             a 
>               Fetch Operator
>                 limit: -1
>           Alias -> Map Local Operator Tree:
>             a 
>               TableScan
>                 alias: a
>           Has Any Stage Alias: false
>   Stage: Stage-0
>     Fetch Operator
> {noformat}


