You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Tanel Kiis (Jira)" <ji...@apache.org> on 2020/12/29 18:20:00 UTC

[jira] [Updated] (SPARK-33935) Fix CBOs cost function

     [ https://issues.apache.org/jira/browse/SPARK-33935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanel Kiis updated SPARK-33935:
-------------------------------
    Issue Type: Bug  (was: Improvement)

> Fix CBOs cost function 
> -----------------------
>
>                 Key: SPARK-33935
>                 URL: https://issues.apache.org/jira/browse/SPARK-33935
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Tanel Kiis
>            Priority: Major
>
> The parameter spark.sql.cbo.joinReorder.card.weight is decumented as:
> {code:title=spark.sql.cbo.joinReorder.card.weight}
> The weight of cardinality (number of rows) for plan cost comparison in join reorder: rows * weight + size * (1 - weight).
> {code}
> But in the implementation the formula is a bit different:
> {code:title=Current implementation}
>     def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>       if (other.planCost.card == 0 || other.planCost.size == 0) {
>         false
>       } else {
>         val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card)
>         val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size)
>         relativeRows * conf.joinReorderCardWeight +
>           relativeSize * (1 - conf.joinReorderCardWeight) < 1
>       }
>     }
> {code}
> This change has an unfortunate consequence: 
> given two plans A and B, both A betterThan B and B betterThan A might give the same results. This happes when one has many rows with small sizes and other has few rows with large sizes.
> A example values, that have this fenomen with the default weight value (0.7):
> A.card = 500, B.card = 300
> A.size = 30, B.size = 80
> Both A betterThan B and B betterThan A would have score above 1 and would return false.
> A new implementation is proposed, that matches the documentation:
> {code:title=Proposed implementation}
>     def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
>       val oldCost = BigDecimal(this.planCost.card) * conf.joinReorderCardWeight +
>         BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight)
>       val newCost = BigDecimal(other.planCost.card) * conf.joinReorderCardWeight +
>         BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight)
>       newCost < oldCost
>     }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org