You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by murali parimi <mu...@icloud.com> on 2015/03/06 18:32:13 UTC

SMB map join issues - hive version 13 - mapr distribution.

Hello Team,

I am trying to join a big table with around 30 columns with another table with just one column.

Table A (part of the ddl):

CLUSTERED BY (
  party_id)
SORTED BY (
  party_id ASC,
  acct_id ASC)
INTO 64 BUCKETS

Table B(part of the ddl):

CLUSTERED BY (
  party_id)
SORTED BY (
  party_id ASC)
INTO 64 BUCKETS

Table A has : 3 billion records.
Table B has: 353 million records.

Since both of them are clustered by same key, can we achieve a SMB map join, where there is no need of reducers? When I try running this join, I could see there are reducers in my job. So Can somebody help what is the purpose of reducers here?

I have used the following settings to force SMB map-join.

set hive.auto.convert.sortmerge.join.bigtable.selection.policy=org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.auto.convert.sortmerge.join.noconditionaltask=true;
set hive.auto.convert.join.use.nonstaged=true;


Thanks,
Murali