Posted to issues@spark.apache.org by "Jiang Xingbo (JIRA)" <ji...@apache.org> on 2018/07/20 12:32:00 UTC

[jira] [Created] (SPARK-24874) Allow hybrid of both barrier tasks and regular tasks in a stage

Jiang Xingbo created SPARK-24874:
------------------------------------

             Summary: Allow hybrid of both barrier tasks and regular tasks in a stage
                 Key: SPARK-24874
                 URL: https://issues.apache.org/jira/browse/SPARK-24874
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Jiang Xingbo


Currently we only allow barrier tasks in a barrier stage. However, consider the following query:
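{code}
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("barrier-union-example")
val sc = new SparkContext(conf)
// 10 regular (non-barrier) partitions
val rdd1 = sc.parallelize(1 to 100, 10)
// 20 barrier partitions: these tasks must all be launched together
val rdd2 = sc.parallelize(1 to 1000, 20).barrier().mapPartitions(it => it)
// 30 partitions in total, computed in a single stage
val rdd = rdd1.union(rdd2).mapPartitions(t => t)
{code}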

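Currently, running `rdd.collect()` requires 30 free slots, because all 30 tasks in the stage (10 for rdd1's partitions plus 20 for rdd2's) are launched together as barrier tasks. However, the tasks that compute rdd1's partitions could be launched as regular tasks, since they don't need to start at the same time. If we did that, running `rdd.collect()` would only need 20 free slots.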
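To make the slot arithmetic concrete, here is a small Scala model. It is a hypothetical sketch, not Spark's scheduler code: `SlotEstimate`, `Task`, `slotsNeededToday`, `slotsNeededHybrid` and `unionExample` are illustrative names, and the model assumes the only scheduling constraint is that barrier tasks must all start together, while regular tasks can run in any free slot before or after them.

{code:scala}
// Hypothetical model, not Spark's scheduler API: each task in a stage
// is either a barrier task or a regular task.
object SlotEstimate {
  final case class Task(partition: Int, isBarrier: Boolean)

  // Current behaviour: if a stage contains any barrier task, every task in
  // the stage is launched together, so it needs one free slot per task.
  // A stage without barrier tasks can run its tasks one at a time.
  def slotsNeededToday(tasks: Seq[Task]): Int =
    if (tasks.exists(_.isBarrier)) tasks.size
    else math.min(tasks.size, 1)

  // Proposed hybrid behaviour: only barrier tasks must start together;
  // regular tasks can reuse a slot before or after the barrier tasks run.
  def slotsNeededHybrid(tasks: Seq[Task]): Int = {
    val barrierCount = tasks.count(_.isBarrier)
    val hasRegular = tasks.exists(t => !t.isBarrier)
    math.max(barrierCount, if (hasRegular) 1 else 0)
  }

  // Task list for rdd1.union(rdd2) in the example above:
  // 10 regular partitions followed by 20 barrier partitions.
  def unionExample: Seq[Task] =
    (0 until 10).map(Task(_, isBarrier = false)) ++
      (10 until 30).map(Task(_, isBarrier = true))
}

val tasks = SlotEstimate.unionExample
println(SlotEstimate.slotsNeededToday(tasks))  // 30
println(SlotEstimate.slotsNeededHybrid(tasks)) // 20
{code}

The hybrid estimate is a lower bound on free slots, not a schedule: with 20 slots the barrier tasks occupy all of them at once, and the 10 regular tasks simply run before or after in whichever slots are free.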



