Posted to user@hive.apache.org by Maria <li...@126.com> on 2017/01/04 03:03:00 UTC

hive on spark, three tables (one small, two big), cannot get a map join

Hi all,
   I have a question. My test HQL is:

"select tmp.src_ip, c.to_ip from (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip) tmp join email c on tmp.appid = c.appid"

where im and email are big tables.
With set hive.execution.engine=mr, the execution plan contains two map join stages.
With set hive.execution.engine=spark, the execution plan contains one map join and one common join. That is, "(select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip)" becomes a map join, and its result "tmp" has only 10 rows, BUT "tmp join email" is not converted to a map join.
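For context, automatic map join conversion is governed by a few settings on both engines; a sketch of the relevant ones (the values shown are illustrative assumptions, not my actual configuration):

```sql
-- Enable automatic conversion of common joins to map joins
set hive.auto.convert.join=true;
-- Convert directly (no conditional task) when small-table sizes are known at compile time
set hive.auto.convert.join.noconditionaltask=true;
-- Combined size threshold in bytes for the small side(s); illustrative value
set hive.auto.convert.join.noconditionaltask.size=10000000;
```

Note that on Spark the threshold is compared against the sum of the sizes of all small-table sides of the join.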
I also debugged the code:


In Hive on Spark:
(1) For "(select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip)", MapWork.getMapredLocalWork() is OK; there is one MapredLocalWork object.
(2) For the join of the previous stage's result "tmp" with email, MapWork.getMapredLocalWork() is null.
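One way to compare the two plans without attaching a debugger is to run EXPLAIN under each engine (a sketch, using the same query and table names as above):

```sql
set hive.execution.engine=mr;
explain select tmp.src_ip, c.to_ip
from (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip) tmp
join email c on tmp.appid = c.appid;

set hive.execution.engine=spark;
explain select tmp.src_ip, c.to_ip
from (select a.src_ip, b.appid from small_tbl a join im b on a.src_ip = b.src_ip) tmp
join email c on tmp.appid = c.appid;
```

The MR plan should show two Map Join Operators, while the Spark plan shows one Map Join Operator and one common Join Operator for the outer join.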


Why can't Hive on Spark use a map join in this case? Thank you.