Posted to dev@hive.apache.org by "Amir Youssefi (JIRA)" <ji...@apache.org> on 2013/02/12 19:25:12 UTC

[jira] [Updated] (HIVE-4011) Sort Merge Join does not kick-in and runs locally

     [ https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Youssefi updated HIVE-4011:
--------------------------------

    Summary: Sort Merge Join does not kick-in and runs locally  (was: Sort Merge Join does not kick-in)
    
> Sort Merge Join does not kick-in and runs locally
> -------------------------------------------------
>
>                 Key: HIVE-4011
>                 URL: https://issues.apache.org/jira/browse/HIVE-4011
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0, 0.10.0
>         Environment: Linux
>            Reporter: Amir Youssefi
>              Labels: joins, mapjoin
>
> With the settings required for Sort Merge Join in place, the join does not kick in; the query falls back to MapJoin with a local first step (on two bucketed and partitioned tables).
> I first ran into the issue on Hive 0.9 at large scale. To make sure the issue persists, I reran it on Hive 0.10 with sample public data and regular storage formats.
> More details:
> set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
> set hive.optimize.bucketmapjoin = true;
> set hive.optimize.bucketmapjoin.sortedmerge = true;
> select /*+ MAPJOIN(l) */
> l.stock_price_open lo,
> r.stock_price_open ro
> from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte=r.dte)
> where ...
> DDL:
> (both tables)
> PARTITIONED BY (year string)
> CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
> STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
> Also made sure we had:
> set hive.enforce.bucketing=true;
> set hive.enforce.sorting=true;
> Run logs and more info in attached file.
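
One way to confirm which join strategy the planner picked, without running the job, is to prefix the query with EXPLAIN. The sketch below reuses the settings, table names, and join condition from the report. It leaves out the WHERE clause because the report elides it; that omission is an assumption made only for illustration. In the plan output, a sort merge join typically shows up as a "Sorted Merge Bucket Map Join Operator", while the fallback described above typically shows a local work stage followed by a "Map Join Operator".

```sql
-- Settings copied from the report.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Assumption: the WHERE clause is omitted because the report elides it ("where ...").
-- EXPLAIN prints the plan only; it does not execute the join.
EXPLAIN
select /*+ MAPJOIN(l) */
  l.stock_price_open lo,
  r.stock_price_open ro
from nyse_stocks_pcsb l
JOIN nyse_stocks_pcsb_dup r
  ON (l.year = r.year and l.stock_symbol = r.stock_symbol and l.dte = r.dte);
```

EXPLAIN EXTENDED prints additional per-stage detail if the plain plan is not enough to tell the two cases apart.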

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira