Posted to issues@spark.apache.org by "marymwu (JIRA)" <ji...@apache.org> on 2016/06/21 08:05:58 UTC

[jira] [Created] (SPARK-16093) Spark 2.0: setting spark.sql.autoBroadcastJoinThreshold = 1 has no effect

marymwu created SPARK-16093:
-------------------------------

             Summary: Spark 2.0: setting spark.sql.autoBroadcastJoinThreshold = 1 has no effect
                 Key: SPARK-16093
                 URL: https://issues.apache.org/jira/browse/SPARK-16093
             Project: Spark
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: marymwu


Setting spark.sql.autoBroadcastJoinThreshold = 1 in Spark 2.0 has no effect.

Precondition:
set spark.sql.autoBroadcastJoinThreshold = 1;
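Note: as a sanity check (a minimal sketch, assuming the same spark-sql CLI session), the value can be read back with SET to confirm it was applied, and -1 is the documented value for disabling broadcast joins entirely, whereas 1 only lowers the size cutoff to one byte:

-- read back the current session value to confirm it was applied
SET spark.sql.autoBroadcastJoinThreshold;
-- -1 disables broadcast joins entirely; 1 only lowers the cutoff to 1 byte
SET spark.sql.autoBroadcastJoinThreshold=-1;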

Testcase:
"INSERT OVERWRITE TABLE RPS__H_REPORT_MORE_DIMENSION_FIRST_CHANNEL_VISIT_CD_DAY PARTITION (p_event_date='2016-06-18')
select a.app_key,a.app_channel,b.device_model,sum(b.visits) visitsNum from
(select app_key,app_channel,lps_did from RPS__H_REPORT_MORE_DIMENSION_EARLIEST_NEWUSER_LIST_C ) a 
join 
(select app_key,lps_did,device_model, count(1) as visits from RPS__H_REPORT_MORE_DIMENSION_SMALL where  p_event_date = '2016-06-18'
and ( log_type=1 or  log_type=2)  
group by  app_key,lps_did,device_model) b  
on a.lps_did = b.lps_did and a.app_key=b.app_key 
group by a.app_key,a.app_channel,b.device_model;
"
== Physical Plan ==
InsertIntoHiveTable MetastoreRelation default, rps__h_report_more_dimension_first_channel_visit_cd_day, None, Map(p_event_date -> Some(2016-06-18)), true, false
+- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Final,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,visitsNum#4L])
   +- Exchange(coordinator id: 41547585) hashpartitioning(app_key#7, app_channel#9, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 500000000])
      +- TungstenAggregate(key=[app_key#7,app_channel#9,device_model#20], functions=[(sum(visits#3L),mode=Partial,isDistinct=false)], output=[app_key#7,app_channel#9,device_model#20,sum#41L])
         +- Project [app_key#7,app_channel#9,device_model#20,visits#3L]
            +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None
               :- Filter (isnotnull(app_key#7) && isnotnull(lps_did#8))
               :  +- HiveTableScan [app_key#7,app_channel#9,lps_did#8], MetastoreRelation default, rps__h_report_more_dimension_earliest_newuser_list_c, None
               +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, string], input[0, string]))
                  +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Final,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,visits#3L])
                     +- Exchange(coordinator id: 733045095) hashpartitioning(app_key#12, lps_did#13, device_model#20, 600), Some(coordinator[target post-shuffle partition size: 500000000])
                        +- TungstenAggregate(key=[app_key#12,lps_did#13,device_model#20], functions=[(count(1),mode=Partial,isDistinct=false)], output=[app_key#12,lps_did#13,device_model#20,count#39L])
                           +- Project [app_key#12,lps_did#13,device_model#20]
                              +- Filter ((isnotnull(app_key#12) && isnotnull(lps_did#13)) && ((cast(log_type#11 as int) = 1) || (cast(log_type#11 as int) = 2)))
                                 +- HiveTableScan [app_key#12,lps_did#13,device_model#20,log_type#11], MetastoreRelation default, rps__h_report_more_dimension_small, None, [isnotnull(p_event_date#10),(p_event_date#10 = 2016-06-18)]
Time taken: 4.775 seconds, Fetched 1 row(s)
16/06/20 16:55:16 INFO CliDriver: Time taken: 4.775 seconds, Fetched 1 row(s)
Note: +- BroadcastHashJoin [lps_did#8,app_key#7], [lps_did#13,app_key#12], Inner, BuildRight, None

Result:
1. Execution failed and the Spark service became unavailable.
2. Even though spark.sql.autoBroadcastJoinThreshold was set to 1, a BroadcastHashJoin was still used when joining two large tables (see the sketch below).
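
Note: a minimal sketch for checking the behaviour in isolation (the table names below are hypothetical, not the ones from the test case above); with the threshold at 1 byte, EXPLAIN is expected to report a SortMergeJoin rather than a BroadcastHashJoin:

-- hypothetical small tables, used only to observe which join is planned
CREATE TABLE bhj_left  (k STRING, v STRING);
CREATE TABLE bhj_right (k STRING, w STRING);
SET spark.sql.autoBroadcastJoinThreshold=1;
-- expected: the physical plan shows SortMergeJoin, not BroadcastHashJoin
EXPLAIN SELECT a.k, b.w FROM bhj_left a JOIN bhj_right b ON a.k = b.k;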

The error log is attached.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org