You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Pau Tallada <ta...@pic.es> on 2020/10/23 13:39:17 UTC
SMB joins not working on partitioned queries

Hi all,

I'm trying to understand (for the last months) why SMB joins (the most
efficient ones) are unable to work with partitions.
All my being tells me it should work but, it doesn't. I would really,
really, (really!) appreciate any insights on this problem.

First, the working example without partitions. I create two tables ("a",
and "b"), same clustering, sorting and number of buckets.

CREATE TABLE a (key STRING, value1 INT)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS
STORED AS ORC;

INSERT INTO a VALUES ("a", 1), ("b", 2), ("c", 3), ("d", 4);

CREATE TABLE b (key STRING, value2 FLOAT)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS
STORED AS ORC;

INSERT INTO b VALUES ("a", 1.1), ("b", 2.2), ("c", 3.3), ("d", 4.4);

Then I join both of them with SMB enabled and it does a Merge Join in a
single Map stage, as expected:

SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size = 1;
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;
EXPLAIN
SELECT a.key, a.value1, b.value2
FROM a JOIN b ON a.key=b.key;

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Map 2
      File Output Operator [FS_10]
        Select Operator [SEL_9] (rows=16 width=93)
          Output:["_col0","_col1","_col2"]

*     ===> Merge Join Operator [MERGEJOIN_25] (rows=16 width=93) <===*
      Conds:SEL_2._col0=SEL_5._col0(Inner),Output:["_col0","_col1","_col3"]
          <-Select Operator [SEL_2] (rows=8 width=89)
              Output:["_col0","_col1"]
              Filter Operator [FIL_13] (rows=8 width=89)
                predicate:key is not null
                TableScan [TS_0] (rows=8 width=89)
                  tallada@a
,a,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value1"]
          <-Select Operator [SEL_5] (rows=8 width=89)
              Output:["_col0","_col1"]
              Filter Operator [FIL_14] (rows=8 width=89)
                predicate:key is not null
                TableScan [TS_3] (rows=8 width=89)
                  tallada@b
,b,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value2"]

Then I try to do the same but with a partitioned table:

CREATE TABLE pa (key STRING, value1 INT)
PARTITIONED BY (part STRING) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4
BUCKETS
STORED AS ORC;

INSERT INTO pa VALUES ("p1", "a", 1), ("p1", "b", 2), ("p1", "c", 3),
("p1", "d", 4);

CREATE TABLE pb (key STRING,value2 FLOAT)
PARTITIONED BY (part STRING) CLUSTERED BY (key) SORTED BY (key ASC) INTO 4
BUCKETS
STORED AS ORC;

INSERT INTO pb VALUES ("p1", "a", 1.1), ("p1", "b", 2.2), ("p1", "c", 3.3),
("p1", "d", 4.4);

Then I try to perform the same join but restricted on a single partition.
But Hive adds an unneeded shuffle I do not understand.
Physically, the join is exactly the same, it should just join the files on
that partition.
Note that the real table I'm dealing with has 315 partitions with a total
size of 4.5 T. So shuffling around that much data really hurts the query
response time.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 2
      File Output Operator [FS_10]
        Select Operator [SEL_9] (rows=4 width=272)
          Output:["_col0","_col1","_col2","_col3"]
          Merge Join Operator [MERGEJOIN_31] (rows=4 width=272)
            Conds:RS_34._col0, _col2=RS_40._col0,
_col2(Inner),Output:["_col0","_col1","_col2","_col4"]
          <-Map 1 [SIMPLE_EDGE] vectorized
            SHUFFLE [RS_34]
              PartitionCols:_col0, _col2
              Select Operator [SEL_33] (rows=4 width=271)
                Output:["_col0","_col1","_col2"]
                Filter Operator [FIL_32] (rows=4 width=271)
                  predicate:key is not null
                  TableScan [TS_0] (rows=4 width=271)
                    tallada@pa
,pa,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value1"]
            Dynamic Partitioning Event Operator [EVENT_37] (rows=2
width=272)
              Group By Operator [GBY_36] (rows=2 width=272)
                Output:["_col0"],keys:_col0
                Select Operator [SEL_35] (rows=4 width=271)
                  Output:["_col0"]
                   Please refer to the previous Select Operator [SEL_33]
          <-Map 3 [SIMPLE_EDGE] vectorized
            SHUFFLE [RS_40]
              PartitionCols:_col0, _col2
              Select Operator [SEL_39] (rows=4 width=271)
                Output:["_col0","_col1","_col2"]
                Filter Operator [FIL_38] (rows=4 width=271)
                  predicate:key is not null
                  TableScan [TS_3] (rows=4 width=271)
                    tallada@pb
,pb,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value2"]

Is there any mistake on my part? Is it maybe an overlook/bug from Hive?

Shall I open a bug report or a feature request? 🤔

Much appreciated,

Pau.