Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/03/26 13:59:41 UTC
[jira] [Resolved] (SPARK-13169) CROSS JOIN slow or fails on tiny table
[ https://issues.apache.org/jira/browse/SPARK-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-13169.
----------------------------------
Resolution: Cannot Reproduce
I am resolving this because I cannot reproduce it against the current master, as shown below:
{code}
val sql = """
SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM (
SELECT DISTINCT * FROM (
SELECT `gear` AS `gear`, `cyl` AS `cyl` FROM `mtcars`)
AS `zzz1`)
AS `gwtlyrpywf`
CROSS JOIN (
SELECT DISTINCT * FROM (
SELECT `vs` AS `vs` FROM `mtcars`)
AS `zzz3`)
AS `arytvfispy`
"""
spark.read.option("header", true).option("inferSchema", true).csv("mtcars.csv").createOrReplaceTempView("mtcars")
spark.sql(sql).show()
{code}
{code}
+----+---+---+
|gear|cyl| vs|
+----+---+---+
|   5|  6|  1|
|   5|  6|  0|
|   5|  4|  1|
|   5|  4|  0|
|   4|  6|  1|
|   4|  6|  0|
|   3|  6|  1|
|   3|  6|  0|
|   5|  8|  1|
|   5|  8|  0|
|   3|  8|  1|
|   3|  8|  0|
|   3|  4|  1|
|   3|  4|  0|
|   4|  4|  1|
|   4|  4|  0|
+----+---+---+
{code}
This appears to be fixed in master. It would be great if someone could identify the JIRA that fixed it and backport the fix if applicable.
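For reference, the result above has 16 rows, which is what the query should produce: 8 distinct (gear, cyl) pairs crossed with 2 distinct vs values. The following is only a minimal sketch of those semantics in plain Scala collections, without Spark. The `Car` case class, the `CrossJoinSketch` object and the sample rows are illustrative assumptions, not the actual mtcars data and not how Spark executes the plan.

```scala
// Plain Scala stand-in for the query semantics; no Spark involved.
// Car and the sample rows are hypothetical, not the real mtcars data.
final case class Car(gear: Int, cyl: Int, vs: Int)

object CrossJoinSketch {
  val rows: Seq[Car] = Seq(
    Car(4, 6, 0),
    Car(4, 6, 0), // deliberate duplicate, removed by distinct
    Car(4, 4, 1),
    Car(3, 6, 1),
    Car(3, 8, 0)
  )

  // Corresponds to: SELECT DISTINCT gear, cyl FROM mtcars
  def distinctGearCyl(rs: Seq[Car]): Seq[(Int, Int)] =
    rs.map(r => (r.gear, r.cyl)).distinct

  // Corresponds to: SELECT DISTINCT vs FROM mtcars
  def distinctVs(rs: Seq[Car]): Seq[Int] =
    rs.map(_.vs).distinct

  // CROSS JOIN: pair every left row with every right row
  def crossJoin(rs: Seq[Car]): Seq[(Int, Int, Int)] =
    for {
      (gear, cyl) <- distinctGearCyl(rs)
      vs <- distinctVs(rs)
    } yield (gear, cyl, vs)

  def main(args: Array[String]): Unit = {
    val result = crossJoin(rows)
    result.foreach(println)
    println(s"rows: ${result.size}")
  }
}
```

With these sample rows there are 4 distinct (gear, cyl) pairs and 2 distinct vs values, so the sketch yields 8 rows. The output size is always the product of the two distinct counts, so on a 32-row table it stays tiny.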
> CROSS JOIN slow or fails on tiny table
> --------------------------------------
>
> Key: SPARK-13169
> URL: https://issues.apache.org/jira/browse/SPARK-13169
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Antonio Piccolboni
>
> I am running a cross join with a distinct select on both sides. The table is tiny (32 X 16), and I am running the query through the thriftserver. The data is here (https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv). The query never finishes in under 200s and mostly fails (TTransportException) while all cores are in use (single machine).
> Query is
> {code}
> SELECT `gwtlyrpywf`.`gear`,`gwtlyrpywf`.`cyl`,`vs` FROM (
> SELECT DISTINCT * FROM (
> SELECT `gear` AS `gear`, `cyl` AS `cyl`FROM `mtcars`)
> AS `zzz1`)
> AS `gwtlyrpywf`
> CROSS JOIN (
> SELECT DISTINCT * FROM (
> SELECT `vs` AS `vs` FROM `mtcars`)
> AS `zzz3`)
> AS `arytvfispy`
> {code}
> I know it can be simplified, but it comes from a generator and the generator counts on the optimizer to do the right thing. EXPLAIN shows the following
> {code}
> == Physical Plan ==
> Project [gear#21,cyl#22,vs#23]
> +- CartesianProduct
>    :- ConvertToSafe
>    :  +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
>    :     +- TungstenExchange hashpartitioning(gear#21,cyl#22,200), None
>    :        +- TungstenAggregate(key=[gear#21,cyl#22], functions=[], output=[gear#21,cyl#22])
>    :           +- Project [gear#17 AS gear#21,cyl#9 AS cyl#22]
>    :              +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[gear#17,cyl#9]
>    +- ConvertToSafe
>       +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
>          +- TungstenExchange hashpartitioning(vs#23,200), None
>             +- TungstenAggregate(key=[vs#23], functions=[], output=[vs#23])
>                +- Project [vs#15 AS vs#23]
>                   +- Scan CsvRelation(<function0>,Some(/var/folders/_p/1gx4vy311_x4syn2xq6f2xtc0000gr/T//RtmpeDwNvS/file168c154ef10e),true,,,",null,#,FAILFAST,commons,false,false,false,StructType(StructField(mpg,DoubleType,true), StructField(cyl,DoubleType,true), StructField(disp,DoubleType,true), StructField(hp,DoubleType,true), StructField(drat,DoubleType,true), StructField(wt,DoubleType,true), StructField(qsec,DoubleType,true), StructField(vs,DoubleType,true), StructField(am,DoubleType,true), StructField(gear,DoubleType,true), StructField(carb,DoubleType,true)),true,null)[vs#15]
> {code}
> Thanks