Posted to issues@spark.apache.org by "liupengcheng (JIRA)" <ji...@apache.org> on 2018/01/17 06:56:00 UTC

[jira] [Commented] (SPARK-23124) Warn users when broadcast big table in JoinSelection instead of just run it

    [ https://issues.apache.org/jira/browse/SPARK-23124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328358#comment-16328358 ] 

liupengcheng commented on SPARK-23124:
--------------------------------------

I think we should give the user a warning or throw an exception if no broadcast hint exists and the sizeInBytes of any child LogicalPlan of the join is larger than the auto-broadcast threshold (spark.sql.autoBroadcastJoinThreshold).

That way, users can tell that the failure is a data-size problem.
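
A minimal sketch of such a check, written without any Spark dependency so it can be run on its own. `SideStats`, `BroadcastSizeCheck`, and `warningFor` are hypothetical names used only for illustration and are not part of Spark; in JoinSelection the inputs would come from `left.stats` and `right.stats` and from the configured threshold.

```scala
object BroadcastSizeCheck {
  // Hypothetical stand-in for the statistics Spark attaches to a LogicalPlan
  // (estimated size plus whether the user supplied a broadcast hint).
  final case class SideStats(sizeInBytes: BigInt, broadcastHint: Boolean)

  /** Returns a warning message when neither side carries a broadcast hint
    * and at least one side is larger than the threshold; otherwise None.
    */
  def warningFor(left: SideStats, right: SideStats, thresholdBytes: Long): Option[String] = {
    val noHint = !left.broadcastHint && !right.broadcastHint
    val limit = BigInt(thresholdBytes)
    // Collect a short description of every side that exceeds the threshold.
    val oversized = Seq("left" -> left, "right" -> right).collect {
      case (name, stats) if stats.sizeInBytes > limit => s"$name=${stats.sizeInBytes} bytes"
    }
    if (noHint && oversized.nonEmpty) {
      Some(
        s"No broadcast hint, but join side(s) exceed the threshold of " +
          s"$thresholdBytes bytes: ${oversized.mkString(", ")}")
    } else {
      None
    }
  }
}
```

The caller could log the returned message as a warning or wrap it in an exception, depending on which behavior is preferred.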

> Warn users when broadcast big table in JoinSelection instead of just run it
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-23124
>                 URL: https://issues.apache.org/jira/browse/SPARK-23124
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0, 2.3.0
>            Reporter: liupengcheng
>            Priority: Major
>
> When running a Spark SQL Thrift server, we encountered sudden failures of the Thrift server caused by OutOfMemoryError.
> After reviewing the code and doing some debugging, I found that the framework permits broadcasting a big table and gives no warning. The relevant code is below:
> {code:java}
> case logical.Join(left, right, joinType, condition) =>
>   val buildSide = broadcastSide(canBuildLeft = true, canBuildRight = true, left, right)
>   // This join could be very slow or OOM
>   joins.BroadcastNestedLoopJoinExec(
>     planLater(left), planLater(right), buildSide, joinType, condition) :: Nil
>
> private def broadcastSide(
>     canBuildLeft: Boolean,
>     canBuildRight: Boolean,
>     left: LogicalPlan,
>     right: LogicalPlan): BuildSide = {
>   def smallerSide =
>     if (right.stats.sizeInBytes <= left.stats.sizeInBytes) BuildRight else BuildLeft
>   val buildRight = canBuildRight && right.stats.hints.broadcast
>   val buildLeft = canBuildLeft && left.stats.hints.broadcast
>   if (buildRight && buildLeft) {
>     // Broadcast smaller side based on its estimated physical size
>     // if both sides have broadcast hint
>     smallerSide
>   } else if (buildRight) {
>     BuildRight
>   } else if (buildLeft) {
>     BuildLeft
>   } else if (canBuildRight && canBuildLeft) {
>     // for the last default broadcast nested loop join
>     smallerSide
>   } else {
>     throw new AnalysisException("Can not decide which side to broadcast for this join")
>   }
> }
> {code}


