
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/08/06 16:30:00 UTC

[jira] [Assigned] (SPARK-24928) spark sql cross join running time too long

     [ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-24928:
------------------------------------

    Assignee: Apache Spark

> spark sql cross join running time too long
> ------------------------------------------
>
>                 Key: SPARK-24928
>                 URL: https://issues.apache.org/jira/browse/SPARK-24928
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 1.6.2
>            Reporter: LIFULONG
>            Assignee: Apache Spark
>            Priority: Minor
>
> A Spark SQL cross join runs far too long even though the left and right input tables are both small text-format files on HDFS.
> The SQL is: select * from t1 cross join t2
> t1 has 499999 lines and three columns.
> t2 has 1 line and only one column.
> The query ran for more than 30 minutes and then failed.
>
> Spark's CartesianRDD has the same problem. Example test code:
> // Note: CartesianRDD is private[spark]; from user code, twos.cartesian(ones) builds the same RDD.
> val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b")  // 1 line, 1 column
> val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b")  // 499999 lines, 3 columns
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, while CartesianRDD(sc, ones, twos) takes less than 10 seconds.
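
One plausible reading of the asymmetry reported above, offered as a sketch rather than a confirmed diagnosis: if the cartesian product is computed as a nested loop that re-opens the second RDD's partition iterator for every element of the first, then putting the large input first forces the small input to be re-read once per large-side row. The plain-Scala model below needs no Spark. The names CartesianOrderSketch and innerOpens are hypothetical, and counting iterator "opens" is only an assumed stand-in for the cost of re-reading an HDFS partition.

```scala
// Plain-Scala model of a nested-loop cartesian product; no Spark required.
// Assumption: each "open" of the inner side stands in for re-reading a partition.
object CartesianOrderSketch {

  // Returns (number of pairs produced, number of times the inner side was opened).
  def innerOpens[A, B](outer: Seq[A], inner: Seq[B]): (Int, Int) = {
    var opens = 0

    // Hypothetical helper: hands out a fresh iterator and records the open.
    def openInner(): Iterator[B] = {
      opens += 1
      inner.iterator
    }

    // Nested loop: the inner iterator is re-created for every outer element.
    val pairs = for (x <- outer.iterator; y <- openInner()) yield (x, y)

    val count = pairs.size // consume the lazy iterator so the loop actually runs
    (count, opens)
  }

  def main(args: Array[String]): Unit = {
    val big = 1 to 499999        // stands in for the 499999-line input
    val small = Seq("only-row")  // stands in for the 1-line input

    val (pairsBigOuter, opensBigOuter) = innerOpens(big, small)
    val (pairsSmallOuter, opensSmallOuter) = innerOpens(small, big)

    println(s"big outer:   pairs=$pairsBigOuter, inner opened $opensBigOuter times")
    println(s"small outer: pairs=$pairsSmallOuter, inner opened $opensSmallOuter times")
  }
}
```

Both orders yield the same 499999 pairs, but the inner side is opened 499999 times when the large input is the outer side and only once when the small input is. Under the stated assumption, that matches the reported gap between CartesianRDD(sc, twos, ones) and CartesianRDD(sc, ones, twos).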



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
