Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/08/06 16:30:00 UTC
[jira] [Assigned] (SPARK-24928) spark sql cross join running time too long
[ https://issues.apache.org/jira/browse/SPARK-24928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-24928:
------------------------------------
Assignee: Apache Spark
> spark sql cross join running time too long
> ------------------------------------------
>
> Key: SPARK-24928
> URL: https://issues.apache.org/jira/browse/SPARK-24928
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 1.6.2
> Reporter: LIFULONG
> Assignee: Apache Spark
> Priority: Minor
>
> Spark SQL takes too long to run a cross join even though the left and right tables are both small text-format data on HDFS.
> The SQL is: select * from t1 cross join t2
> t1 has 499999 rows and three columns.
> t2 has 1 row and only one column.
> The query ran for more than 30 minutes and then failed.
>
>
> Spark's CartesianRDD has the same problem. Example test code:
> val ones = sc.textFile("hdfs://host:port/data/cartesian_data/t1b") // 1 line, 1 column
> val twos = sc.textFile("hdfs://host:port/data/cartesian_data/t2b") // 499999 lines, 3 columns
> val cartesian = new CartesianRDD(sc, twos, ones)
> cartesian.count()
> This runs for more than 5 minutes, whereas with CartesianRDD(sc, ones, twos) it takes less than 10 seconds.
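The asymmetry reported above is what one would expect from a nested-loop product in which the second input is re-opened for every element of the first. The following is a Spark-free sketch of that cost model, not Spark's actual code. The names CartesianCost, run, and openInner are hypothetical, and whether CartesianRDD behaves this way in the affected version is an assumption that should be checked against the Spark source.

```scala
// Spark-free model of a nested-loop Cartesian product over two inputs.
// Assumption: the inner input is re-opened once per outer element.
object CartesianCost {
  /** Returns (number of pairs produced, number of times the inner input was opened). */
  def run[A, B](outer: Seq[A], inner: Seq[B]): (Long, Long) = {
    var opens = 0L
    // Hypothetical stand-in for re-reading the second input (e.g. a file on HDFS).
    def openInner(): Iterator[B] = { opens += 1; inner.iterator }
    // Nested for-comprehension: the inner iterator is rebuilt for each outer element.
    val pairs = for (x <- outer.iterator; y <- openInner()) yield (x, y)
    val count = pairs.size.toLong // consumes the iterator, so `opens` is final afterwards
    (count, opens)
  }

  def main(args: Array[String]): Unit = {
    val ones = Seq("a")                      // 1 line
    val twos = (1 to 499999).map(_.toString) // 499999 lines
    println(run(twos, ones)) // large input first: prints (499999,499999)
    println(run(ones, twos)) // small input first: prints (499999,1)
  }
}
```

Both orders produce the same 499999 pairs, but in this model the order with the large input first opens the inner input 499999 times instead of once. If the model holds for CartesianRDD, that would match the reported gap between CartesianRDD(sc, twos, ones) and CartesianRDD(sc, ones, twos).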
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)