Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/03/26 13:59:41 UTC

[jira] [Resolved] (SPARK-13169) CROSS JOIN slow or fails on tiny table

     [ https://issues.apache.org/jira/browse/SPARK-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-13169.
----------------------------------
    Resolution: Cannot Reproduce

I am resolving this because I cannot reproduce it against the current master, as shown below:

{code}
val sql = """
SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM (
           SELECT DISTINCT * FROM (
             SELECT `gear` AS `gear`, `cyl` AS `cyl` FROM `mtcars`)
              AS `zzz1`)
         AS `gwtlyrpywf`
       CROSS JOIN (
           SELECT DISTINCT * FROM (
             SELECT `vs` AS `vs` FROM `mtcars`)
           AS `zzz3`)
       AS `arytvfispy`
"""

spark.read.option("header", true).option("inferSchema", true).csv("mtcars.csv").createOrReplaceTempView("mtcars")
spark.sql(sql).show()
{code}

{code}
+----+---+---+
|gear|cyl| vs|
+----+---+---+
|   5|  6|  1|
|   5|  6|  0|
|   5|  4|  1|
|   5|  4|  0|
|   4|  6|  1|
|   4|  6|  0|
|   3|  6|  1|
|   3|  6|  0|
|   5|  8|  1|
|   5|  8|  0|
|   3|  8|  1|
|   3|  8|  0|
|   3|  4|  1|
|   3|  4|  0|
|   4|  4|  1|
|   4|  4|  0|
+----+---+---+
{code}

This appears to be fixed in master. It would be great if someone could identify the JIRA that fixed it and backport the change, if applicable.
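
As a sanity check on the 16 rows shown above, the query's semantics (SELECT DISTINCT on each side, then CROSS JOIN) can be sketched with plain Scala collections, without Spark. This is only an illustration: the object name, the `distinctCross` helper, and the input rows are hypothetical. The rows reuse the (gear, cyl) pairs and vs values from the result table, with a few repeated on purpose, and are not the real mtcars data.

```scala
object DistinctCrossJoinSketch {
  // Hypothetical stand-in rows, NOT the real mtcars data: the (gear, cyl)
  // pairs and vs values are taken from the result table, with a few repeated
  // on purpose so that DISTINCT has something to remove.
  val gearCyl: Seq[(Int, Int)] = Seq(
    (5, 6), (5, 4), (4, 6), (3, 6), (5, 8), (3, 8), (3, 4), (4, 4),
    (4, 4), (3, 8), (5, 4)
  )
  val vs: Seq[Int] = Seq(1, 0, 0, 1, 1)

  // SELECT DISTINCT on each side, then CROSS JOIN: every distinct left row
  // is paired with every distinct right row.
  def distinctCross[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] =
    for (l <- left.distinct; r <- right.distinct) yield (l, r)

  val result: Seq[((Int, Int), Int)] = distinctCross(gearCyl, vs)

  def main(args: Array[String]): Unit = {
    result.foreach { case ((gear, cyl), v) => println(s"$gear\t$cyl\t$v") }
    println(s"rows: ${result.size}")
  }
}
```

With 8 distinct (gear, cyl) pairs and 2 distinct vs values, the sketch yields 8 x 2 = 16 rows, which matches the row count of the Spark output.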

> CROSS JOIN slow or fails on tiny table
> --------------------------------------
>
>                 Key: SPARK-13169
>                 URL: https://issues.apache.org/jira/browse/SPARK-13169
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Antonio Piccolboni
>
> I am running a cross join with a SELECT DISTINCT on both sides. The table is tiny (32 X 16). I am running the query through the Thrift server. The data is here (https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv). The query never finishes in under 200s and mostly fails (TTransportException) while all cores are in use (single machine).
> The query is:
> {code}
> SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM (
>            SELECT DISTINCT * FROM (
>              SELECT `gear` AS `gear`, `cyl` AS `cyl` FROM `mtcars`)
>               AS `zzz1`)
>          AS `gwtlyrpywf`
>        CROSS JOIN (
>            SELECT DISTINCT * FROM (
>              SELECT `vs` AS `vs` FROM `mtcars`)
>            AS `zzz3`)
>        AS `arytvfispy`
> {code}
> I know it can be simplified, but it comes from a generator, and the generator counts on the optimizer to do the right thing. EXPLAIN shows the following:
> {code}
> == Physical Plan ==
> Project [gear#21,cyl#22,vs#23]
> +- CartesianProduct
>    :- ConvertToSafe
>    :  +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
>    :     +- TungstenExchange hashpartitioning(gear#21,cyl#22,200), None
>    :        +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
>    :           +- Project [gear#17 AS gear#21,cyl#9 AS cyl#22]
>    :              +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[gear#17,cyl#9]
>    +- ConvertToSafe
>       +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
>          +- TungstenExchange hashpartitioning(vs#23,200), None
>             +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
>                +- Project [vs#15 AS vs#23]
>                   +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[vs#15]
> {code}
> Thanks
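
One possible reading of the plan in the report, offered as an assumption rather than a diagnosis confirmed anywhere in this thread: both inputs to CartesianProduct are shuffled with hashpartitioning(..., 200), and if the product produces one output partition (and so one task) per pair of input partitions, a 16-row result would be spread over 200 x 200 tasks. The arithmetic is sketched below; the object and function names are hypothetical.

```scala
object CartesianPartitionEstimate {
  // Assumption (not confirmed by this thread): the Cartesian product yields
  // one output partition, and so one task, per pair of input partitions.
  def cartesianPartitions(leftPartitions: Int, rightPartitions: Int): Long =
    leftPartitions.toLong * rightPartitions.toLong

  // 200 comes from hashpartitioning(..., 200) on both sides of the plan.
  val shufflePartitions: Int = 200
  val estimatedTasks: Long =
    cartesianPartitions(shufflePartitions, shufflePartitions)

  // 8 distinct (gear, cyl) pairs times 2 distinct vs values in the reported result.
  val resultRows: Int = 8 * 2

  def main(args: Array[String]): Unit =
    println(s"$estimatedTasks tasks for $resultRows result rows")
}
```

Under that assumption the estimate is 40,000 tasks for 16 rows, which would be consistent with all cores staying busy on a tiny table. Lowering `spark.sql.shuffle.partitions` would shrink the product, but whether that avoids the reported failure on 1.6.0 is untested here.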


