Posted to user@hive.apache.org by Liyin Tang <li...@gmail.com> on 2010/12/01 04:44:01 UTC

Join Optimization in Hive

Hi All

We have improved join performance in Hive. Hive can now automatically
convert a join into a map join based on the input data size, so users no
longer need to add a hint to the query.



Here is the performance comparison between the previous map join and the new
optimized map join:

Small Table                | Big Table                   | Join Condition            | Avg. Previous Map Join Time | Avg. New Optimized Map Join Time | Improvement
---------------------------+-----------------------------+---------------------------+-----------------------------+----------------------------------+------------
75 K rows; 383K file size  | 130 M rows; 3.5G file size  | 1 join key, 2 join values | 1032 sec                    | 79 sec                           | +1206%
500 K rows; 2.6M file size | 130 M rows; 3.5G file size  | 1 join key, 2 join values | 3991 sec                    | 144 sec                          | +2671%
75 K rows; 383K file size  | 16.7 B rows; 459G file size | 1 join key, 2 join values | 4801 sec                    | 325 sec                          | +1377%



From these results, the new optimized map join runs about 13 to 28 times as
fast as the previous one (a 1206% to 2671% improvement).
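
As a rough check on the table above, the improvement figures appear to be
computed relative to the new execution time, i.e. (previous - new) / new.
That formula is inferred from the reported numbers, not stated in this
message, and the function name below is only illustrative. A minimal sketch
in Python:

```python
def improvement_percent(previous_sec, optimized_sec):
    """Improvement relative to the optimized run time (assumed formula)."""
    return (previous_sec - optimized_sec) / optimized_sec * 100


# (previous map join, new optimized map join) average times in seconds,
# copied from the table above.
runs = [(1032, 79), (3991, 144), (4801, 325)]

for prev, new in runs:
    pct = improvement_percent(prev, new)
    speedup = prev / new  # how many times as fast the new join is
    print(f"{prev} sec -> {new} sec: +{pct:.1f}% ({speedup:.1f}x as fast)")
```

The results agree with the table to within rounding of the reported averages.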



Second, Hive can now convert a join into a map join automatically and
dynamically, based on the input data size.
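
To illustrate the idea only (this is not Hive's implementation), here is a
small Python sketch of a size-based choice between a map join and a common
join, plus the hash-join technique a map join is based on. The threshold
value and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold for illustration; not Hive's actual setting.
SMALL_TABLE_MAX_BYTES = 25 * 1024 * 1024


def choose_join_strategy(left_bytes, right_bytes,
                         threshold=SMALL_TABLE_MAX_BYTES):
    """Pick 'map_join' when the smaller input fits under the threshold."""
    if min(left_bytes, right_bytes) <= threshold:
        return "map_join"
    return "common_join"


def map_join(small_rows, big_rows, key):
    """Build an in-memory hash table from the small input, then stream the
    big input and emit merged rows for every matching key."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    for big in big_rows:
        for small in table.get(big[key], []):
            yield {**small, **big}


small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
big = [{"id": 1, "value": 10}, {"id": 3, "value": 30}, {"id": 1, "value": 11}]

print(choose_join_strategy(383 * 1024, 3.5 * 1024 ** 3))
print(list(map_join(small, big, "id")))
```

Because only the small input is held in memory, the big input can be joined
in a single pass on the map side without a shuffle, which is where the gain
over a common join comes from.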

Here is the performance comparison between the previous common join and the
new common join. All of the benchmark queries here can be converted into map
joins.



Small Table                | Big Table                   | Join Condition             | Avg. Previous Common Join Time | Avg. New Optimized Common Join Time | Improvement
---------------------------+-----------------------------+----------------------------+--------------------------------+-------------------------------------+------------
75 K rows; 383K file size  | 130 M rows; 3.5G file size  | 1 join key, 2 join values  | 169 sec                        | 79 sec                              | +114%
500 K rows; 2.6M file size | 130 M rows; 3.5G file size  | 1 join key, 2 join values  | 246 sec                        | 144 sec                             | +71%
75 K rows; 383K file size  | 16.7 B rows; 459G file size | 1 join key, 2 join values  | 511 sec                        | 325 sec                             | +57%
500 K rows; 2.6M file size | 16.7 B rows; 459G file size | 1 join key, 2 join values  | 502 sec                        | 305 sec                             | +64%
1M rows; 10M file size     | 16.7 B rows; 459G file size | 1 join key, 3 join values  | 653 sec                        | 248 sec                             | +163%
1M rows; 10M file size     | 16.7 B rows; 459G file size | 2 join keys, 2 join values | 1117 sec                       | 536 sec                             | +108%



From these results, when the new common join can be converted into a map
join, it achieves a 57% to 163% performance improvement.



If you are interested in more details, please check this wiki page:
http://wiki.apache.org/hadoop/Hive/JoinOptimization .



Best Regards

Liyin