Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2017/11/28 11:05:00 UTC

[jira] [Updated] (SPARK-22626) Wrong Hive table statistics may trigger OOM if enables join reorder in CBO

     [ https://issues.apache.org/jira/browse/SPARK-22626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-22626:
--------------------------------
    Description: 
How to reproduce:

{code}
bin/spark-shell --conf spark.sql.cbo.enabled=true --conf spark.sql.cbo.joinReorder.enabled=true
{code}

{code:java}
import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec

spark.sql("CREATE TABLE small (c1 bigint) TBLPROPERTIES ('numRows'='3', 'rawDataSize'='600','totalSize'='800')")
// Big table with wrong statistics, numRows=0
spark.sql("CREATE TABLE big (c1 bigint) TBLPROPERTIES ('numRows'='0', 'rawDataSize'='60000000000', 'totalSize'='8000000000000')")

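// Plan the join and look at which side Spark chose to broadcast.
val plan = spark.sql("select * from small t1 join big t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
// Assumes the broadcast hash join is the first child of the root physical operator.
val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide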

println(buildSide)
{code}

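The result is {{BuildRight}}, so the right side is chosen as the broadcast build side, but the right side is the big table. Broadcasting it is what can trigger the OOM.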
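
One possible reading of the result above, sketched as a toy model rather than Spark's actual planner code: if the size estimate is derived from {{numRows}} whenever CBO is enabled, then {{numRows=0}} makes the big table look smaller than the small one. Every name below ({{TableStats}}, {{estimatedSize}}, {{AssumedRowWidth}}) is hypothetical, the 16-byte row width is an assumption, and the broadcast threshold is ignored.

{code}
// Toy model only: hypothetical names, not Spark's classes or estimation logic.
final case class TableStats(name: String, numRows: Option[BigInt], totalSize: BigInt)

object BuildSideSketch {
  // Assumed width of one row with a single bigint column plus row overhead.
  val AssumedRowWidth: BigInt = 16

  // Assumption: with CBO on, a recorded row count takes precedence over totalSize.
  def estimatedSize(t: TableStats, cboEnabled: Boolean): BigInt =
    t.numRows match {
      case Some(n) if cboEnabled => (n * AssumedRowWidth).max(1)
      case _                     => t.totalSize
    }

  // The side with the smaller estimate becomes the broadcast (build) side.
  def buildSide(left: TableStats, right: TableStats, cboEnabled: Boolean): String =
    if (estimatedSize(right, cboEnabled) <= estimatedSize(left, cboEnabled)) "BuildRight"
    else "BuildLeft"

  def main(args: Array[String]): Unit = {
    // Same statistics as in the reproduction above.
    val small = TableStats("small", Some(BigInt(3)), BigInt(800))
    val big   = TableStats("big", Some(BigInt(0)), BigInt("8000000000000"))
    println(buildSide(small, big, cboEnabled = true))  // BuildRight
    println(buildSide(small, big, cboEnabled = false)) // BuildLeft
  }
}
{code}

In this model the wrong row count alone flips the choice: with the row count ignored, {{totalSize}} keeps the big table on the probe side.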

> Wrong Hive table statistics may trigger OOM if enables join reorder in CBO
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22626
>                 URL: https://issues.apache.org/jira/browse/SPARK-22626
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Yuming Wang
>            Priority: Minor


