Posted to issues@spark.apache.org by "LIFULONG (JIRA)" <ji...@apache.org> on 2018/07/26 08:30:00 UTC

[jira] [Updated] (SPARK-24928) spark sql cross join running time too long

     [ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

LIFULONG updated SPARK-24928:
-----------------------------
    Priority: Minor  (was: Major)

> spark sql cross join running time too long
> ------------------------------------------
>
>                 Key: SPARK-24928
>                 URL: https://issues.apache.org/jira/browse/SPARK-24928
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 1.6.2
>            Reporter: LIFULONG
>            Priority: Minor
>
> Spark SQL takes too long to run a cross join even though the left and right tables are small text-format inputs.
> The query is:  select * from t1 cross join t2
> t1 has 499999 rows and three columns.
> t2 has 1 row and only one column.
> The job ran for more than 30 minutes and then failed.
>
> Spark's CartesianRDD has the same problem. Example test code:
> val ones = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t1b")  // 1 line, 1 column
> val twos = sc.textFile("file:///Users/moses/4paradigm/data/cartesian_data/t2b")  // 499999 lines, 3 columns
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, whereas with CartesianRDD(sc, ones, twos) it takes less than 10 seconds.
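
One possible reading of the CartesianRDD timings above, offered as an assumption rather than a confirmed diagnosis: if the cartesian product is computed as a nested loop that re-reads the second input for every record of the first, then putting the 499999-line input first re-opens the 1-line input 499999 times, while the reverse order opens the large input only once. The plain-Scala sketch below models that behavior without Spark. `CartesianOrderSketch`, `CountingSource`, `nestedCartesian` and `opensOfInner` are hypothetical names introduced for illustration, not Spark APIs.

```scala
object CartesianOrderSketch {

  // Hypothetical model of a re-readable input (for example, a text file split).
  // Every call to iterator() counts as one "open" of the underlying data.
  final class CountingSource[T](data: Seq[T]) {
    var opens: Int = 0
    def iterator(): Iterator[T] = {
      opens += 1
      data.iterator
    }
  }

  // Nested-loop cartesian product: the inner source is re-opened once per
  // outer record. This is an assumed model, not Spark's actual implementation.
  def nestedCartesian[A, B](outer: CountingSource[A],
                            inner: CountingSource[B]): Iterator[(A, B)] =
    for {
      x <- outer.iterator()
      y <- inner.iterator()
    } yield (x, y)

  // Returns (number of pairs produced, number of times the inner input was opened).
  def opensOfInner(outerSize: Int, innerSize: Int): (Long, Int) = {
    val outer = new CountingSource(Vector.range(0, outerSize))
    val inner = new CountingSource(Vector.range(0, innerSize))
    val pairs = nestedCartesian(outer, inner).size.toLong
    (pairs, inner.opens)
  }
}
```

Under this model both orders produce 499999 pairs, but `opensOfInner(499999, 1)` opens the inner input 499999 times while `opensOfInner(1, 499999)` opens it once. If the assumption holds for Spark, the equivalent choice with the public RDD API would be calling `ones.cartesian(twos)` rather than `twos.cartesian(ones)`.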


