You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Liyin Tang <li...@gmail.com> on 2010/12/01 04:44:01 UTC
Join Optimization in Hive
Hi All
We have improved the join performance in Hive and Hive can automatically
convert join into map join based on input data size.
So users don’t need to give the hint for the query.
Here is the performance comparison between previous map join with new
optimized map join
Small Table
Big Table
Join Condition
Average Previous Map Join Execution time
Average New Optimized Map Join Execution time
Performance Improvement
75 K rows;
383K file size
130 M rows;
3.5G file size;
1 join key,
2 join value
1032 sec
79 sec
+ 1206%
500 K rows;
2.6M file size
130 M rows;
3.5G file size
1 join key,
2 join value
3991 sec
144 sec
+2671 %
75 K rows;
383K file size
16.7 B rows;
459 G file size
1 join key,
2 join value
4801 sec
325 sec
+ 1377 %
>From the result, the new optimized map join will be 12 ~26 faster than the
previous one.
Second, right now, Hive can convert join into Map Join based on input data
size automatically and dynamically.
Here is the performance comparison between the previous common join with new
common join.
All the benchmark queries here can be converted into map join.
Small Table
Big Table
Join Condition
Average Previous Common Join Execution time
Average New Optimized Common Join Execution time
Performance Improvement
75 K rows;
383K file size
130 M rows;
3.5G file size;
1 join key,
2 join value
169 sec
79 sec
+ 114%
500 K rows;
2.6M file size
130 M rows;
3.5G file size
1 join key,
2 join value
246 sec
144 sec
+71 %
75 K rows;
383K file size
16.7 B rows;
459 G file size
1 join key,
2 join value
511 sec
325 sec
+ 57 %
500 K rows;
2.6M file size
16.7 B rows;
459 G file size
1 join key,
2 join value
502 sec
305 sec
+64 %
1M rows;
10M file size
16.7 B rows;
459 G file size
1 join key,
3 join value
653 sec
248 sec
+163 %
1M rows;
10M file size
16.7 B rows;
459 G file size
2 join key,
2 join value
1117sec
536 sec
+108%
>From the result, if the new common join can be converted into map join, it
will get 57% ~163 % performance improvements.
If you are interested in more details, please check this wiki page:
http://wiki.apache.org/hadoop/Hive/JoinOptimization .
Best Regards
Liyin