Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2018/08/06 18:09:52 UTC

[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

GitHub user jiangxb1987 opened a pull request:

    https://github.com/apache/spark/pull/22011

    [WIP][SPARK-24822][PySpark] Python support for barrier execution mode

    ## What changes were proposed in this pull request?
    
    This PR adds Python support for barrier execution mode, which enables launching a job containing barrier stage(s) from PySpark.
    
    It simply ports the existing `RDDBarrier` and `RDD.barrier()` to the Java and Python APIs.
    
    ## How was this patch tested?
    
    TBD
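
As a rough analogy for what "barrier stage" means in the description above (all tasks of the stage are launched together), the following plain-Python sketch uses `threading.Barrier`. It is not Spark code: `run_barrier_stage`, `task_fn`, and the `timeout` parameter are hypothetical names chosen for illustration, and threads stand in for Spark tasks.

```python
import threading


def run_barrier_stage(num_tasks, task_fn, timeout=5.0):
    """Run task_fn(task_id) in num_tasks threads that all start work together.

    Toy analogy only: threads stand in for Spark tasks.
    """
    barrier = threading.Barrier(num_tasks)
    lock = threading.Lock()
    launched = 0
    results = [None] * num_tasks

    def worker(task_id):
        nonlocal launched
        with lock:
            launched += 1
        # No task proceeds until every task has been launched.
        barrier.wait(timeout=timeout)
        with lock:
            seen = launched
        # Record the task's output and how many tasks it saw launched.
        results[task_id] = (task_fn(task_id), seen)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every task observes that all `num_tasks` tasks were launched before it does any work, which is the property a barrier stage is meant to provide.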

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark python

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22011
    
----
commit ec2f66851b47ab885608cb7caa277eeb865ab0d2
Author: Xingbo Jiang <xi...@...>
Date:   2018-08-06T18:02:29Z

    init.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94600/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    @jiangxb1987 Please mention that tests will be added in a follow-up PR that implements BarrierTaskContext.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    test this please


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2020/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94514/
    Test FAILed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94302/
    Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94600/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94590/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208068705
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    +    contains this RDD together.
    +    """
    +
    +    def __init__(self, rdd):
    +        self.rdd = rdd
    +        self._jrdd = rdd._jrdd
    +
    +    def mapPartitions(self, f, preservesPartitioning=False):
    --- End diff --
    
    If we expose a package-private method in `RDDBarrier` that returns the annotated RDD with `isBarrier=True`, we can implement `mapPartitions` easily here:
    
    ~~~python
    # `barrierRdd()` is the proposed package-private accessor; it does not exist yet.
    jBarrierRdd = self._jrdd.rdd().barrier().barrierRdd().toJavaRDD()
    pyBarrierRdd = RDD(jBarrierRdd, self.rdd.ctx, self.rdd._jrdd_deserializer)
    return pyBarrierRdd.mapPartitions(f, preservesPartitioning)
    ~~~


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    retest this please


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94549/
    Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2062/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94530/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94308/
    Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    retest this please


---



[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r209118962
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2449,36 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    +    contains this RDD together.
    +
    +    .. versionadded:: 2.4.0
    +    """
    +
    +    def __init__(self, rdd):
    +        self.rdd = rdd
    +        self._jrdd = rdd._jrdd
    +
    +    def mapPartitions(self, f, preservesPartitioning=False):
    +        """
    +        .. note:: Experimental
    +
    +        Return a new RDD by applying a function to each partition of this RDD.
    +
    +        .. versionadded:: 2.4.0
    +        """
    +        def func(s, iterator):
    +            return f(iterator)
    +        jBarrierRdd = self._jrdd.rdd().barrier().toJavaRDD()
    --- End diff --
    
    This will materialize the Java RDD, which means the map functions before and after the barrier will be executed by two separate Python workers.
    
    We should not materialize the Java RDD here, but just set an `isBarrier` flag in the Python `PipelinedRDD`.
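
The idea in the comment above (compose the Python functions into one pipeline and carry a barrier flag, rather than materializing an intermediate RDD) can be sketched with a small stand-alone model. This is a toy under stated assumptions, not PySpark's `PipelinedRDD`: `ToyRDD`, `ToyBarrier`, `map_partitions`, and `is_barrier` are hypothetical names used only for illustration.

```python
class ToyRDD:
    """Toy stand-in for a pipelined RDD; not PySpark's implementation."""

    def __init__(self, partitions, func=None, is_barrier=False):
        self.partitions = partitions              # list of lists
        self.func = func or (lambda it: it)       # composed per-partition function
        self.is_barrier = is_barrier

    def map_partitions(self, f, is_barrier=False):
        prev = self.func
        # Compose with the previous function instead of materializing,
        # and propagate the barrier flag down the pipeline.
        return ToyRDD(self.partitions,
                      lambda it: f(prev(it)),
                      self.is_barrier or is_barrier)

    def barrier(self):
        return ToyBarrier(self)

    def collect(self):
        out = []
        for part in self.partitions:
            # One pass per partition runs the whole composed pipeline.
            out.extend(self.func(iter(part)))
        return out


class ToyBarrier:
    """Wrapper whose map_partitions only sets the barrier flag."""

    def __init__(self, rdd):
        self.rdd = rdd

    def map_partitions(self, f):
        return self.rdd.map_partitions(f, is_barrier=True)
```

In this model the functions applied before and after `barrier()` end up in a single composed function per partition, so one worker could run them all, and the flag is the only thing the barrier step adds.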


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94308/testReport)** for PR 22011 at commit [`b0b2f86`](https://github.com/apache/spark/commit/b0b2f86cc5b19693ff1f46795b9266d1024cb85a).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94512/
    Test FAILed.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208092650
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    +    contains this RDD together.
    --- End diff --
    
    Ditto, let's add `.. versionadded:: 2.4.0` at the end.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2011/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94575/
    Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94530/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94512/testReport)** for PR 22011 at commit [`1ee8025`](https://github.com/apache/spark/commit/1ee80254c869b9fe42d05f401a4802d8b4e1662a).
     * This patch **fails from timeout after a configured wait of `300m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94565/
    Test FAILed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1860/
    Test PASSed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2009/
    Test PASSed.


---



[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22011


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2035/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94575/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94590/
    Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    LGTM, merging to master!


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208068968
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDBarrier.scala ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.api.java
    +
    +import scala.reflect.ClassTag
    +
    +import org.apache.spark.BarrierTaskContext
    +import org.apache.spark.TaskContext
    +import org.apache.spark.annotation.{Experimental, Since}
    +import org.apache.spark.rdd.MapPartitionsRDD
    +
    +/**
    + * A Java-friendly version of [[org.apache.spark.rdd.RDDBarrier]] that returns
    + * [[org.apache.spark.api.java.JavaRDD]]s.
    + *
    + * An RDD barrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    + * containing this RDD together.
    + */
    +class JavaRDDBarrier[T: ClassTag](javaRdd: JavaRDD[T]) {
    --- End diff --
    
    This is not necessary to implement Python support.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2044/
    Test PASSed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94512/testReport)** for PR 22011 at commit [`1ee8025`](https://github.com/apache/spark/commit/1ee80254c869b9fe42d05f401a4802d8b4e1662a).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    test this please


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1866/
    Test PASSed.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208093240
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    --- End diff --
    
    nit: `RDDBarrier` -> `RDD barrier`


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94600/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2052/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94302/testReport)** for PR 22011 at commit [`ec2f668`](https://github.com/apache/spark/commit/ec2f66851b47ab885608cb7caa277eeb865ab0d2).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94590/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94575/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94514/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
     * This patch **fails from timeout after a configured wait of `300m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    retest this please


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94302/testReport)** for PR 22011 at commit [`ec2f668`](https://github.com/apache/spark/commit/ec2f66851b47ab885608cb7caa277eeb865ab0d2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class JavaRDDBarrier[T: ClassTag](javaRdd: JavaRDD[T]) `
      * `class RDDBarrier(object):`


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2070/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94514/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).


---



[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r209289539
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2406,6 +2406,26 @@ def toLocalIterator(self):
                 sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
             return _load_from_socket(sock_info, self._jrdd_deserializer)
     
    +    def barrier(self):
    +        """
    +        .. note:: Experimental
    +
    +        Indicates that Spark must launch the tasks together for the current stage.
    +
    +        .. versionadded:: 2.4.0
    +        """
    +        return RDDBarrier(self)
    +
    +    def isBarrier(self):
    --- End diff --
    
    The Scala RDD has a `private[spark]` `isBarrier()` function; we didn't add it to JavaRDD.
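
If the Python-side check were meant to stay non-public like its Scala counterpart, one common Python convention is a leading underscore. The following is only a toy sketch of that convention: `ToyRDD`, `_barrier_flag` and `_is_barrier` are hypothetical names, not code from this PR.

```python
class ToyRDD(object):
    """Toy stand-in for an RDD; not PySpark's class."""

    def __init__(self, is_barrier=False):
        self._barrier_flag = is_barrier

    def barrier(self):
        """Public API. Simplified: the PR returns an RDDBarrier wrapper here."""
        return ToyRDD(is_barrier=True)

    def _is_barrier(self):
        """Internal helper: the leading underscore signals 'not public API'."""
        return self._barrier_flag


# Only names without a leading underscore count as public in this convention.
public_names = [n for n in dir(ToyRDD) if not n.startswith("_")]
```

In this sketch `public_names` contains only `barrier`, so the barrier check stays available to internal callers without becoming part of the documented surface.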


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    **[Test build #94308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94308/testReport)** for PR 22011 at commit [`b0b2f86`](https://github.com/apache/spark/commit/b0b2f86cc5b19693ff1f46795b9266d1024cb85a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208122669
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    +    containing this RDD together.
    +    """
    +
    +    def __init__(self, rdd):
    +        self.rdd = rdd
    +        self._jrdd = rdd._jrdd
    +
    +    def mapPartitions(self, f, preservesPartitioning=False):
    --- End diff --
    
    docstring?


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94530/
    Test PASSed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208093660
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
                                       sc.pythonVer, broadcast_vars, sc._javaAccumulator)
     
     
    +class RDDBarrier(object):
    +
    +    """
    +    .. note:: Experimental
    +
    +    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
    +    containing this RDD together.
    +    """
    +
    +    def __init__(self, rdd):
    +        self.rdd = rdd
    +        self._jrdd = rdd._jrdd
    +
    +    def mapPartitions(self, f, preservesPartitioning=False):
    +        """
    +        Return a new RDD by applying a function to each partition of this RDD.
    +        """
    --- End diff --
    
    Shall we match the documentation, or is there a reason it is different?
    
    FWIW, for a code block, just `` `blabla` `` should be good enough. It is nicer if it is linked properly, e.g. `` :class:`ClassName` ``.
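
The wrapper shape in the diff above, with docstrings that use the inline-code and `:class:` markup mentioned in this comment, might look roughly like the following. This is a toy, in-memory sketch under stated assumptions: `ToyRDD` and `ToyRDDBarrier` are hypothetical stand-ins that run no Spark job, and `preservesPartitioning` is kept only to mirror the signature shown in the diff.

```python
class ToyRDD(object):
    """Toy in-memory stand-in for an RDD: a list of partitions."""

    def __init__(self, partitions, is_barrier=False):
        self.partitions = [list(p) for p in partitions]
        self.is_barrier = is_barrier

    def barrier(self):
        """Return a :class:`ToyRDDBarrier` wrapping this RDD."""
        return ToyRDDBarrier(self)

    def collect(self):
        """Flatten all partitions into one list."""
        return [x for p in self.partitions for x in p]


class ToyRDDBarrier(object):
    """Wrapper whose `mapPartitions` yields an RDD flagged as a barrier RDD."""

    def __init__(self, rdd):
        self.rdd = rdd

    def mapPartitions(self, f, preservesPartitioning=False):
        """
        Return a new :class:`ToyRDD` by applying `f` to each partition of the
        wrapped RDD, and mark the result as a barrier RDD.
        """
        # preservesPartitioning only mirrors the diff's signature; the toy ignores it.
        return ToyRDD([f(iter(p)) for p in self.rdd.partitions], is_barrier=True)


rdd = ToyRDD([[1, 2], [3, 4]])
doubled = rdd.barrier().mapPartitions(lambda it: (x * 2 for x in it))
```

Here `doubled.collect()` gives `[2, 4, 6, 8]` and `doubled.is_barrier` is `True`, while the original `rdd` stays unflagged.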


---



[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r209287104
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2406,6 +2406,26 @@ def toLocalIterator(self):
                 sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
             return _load_from_socket(sock_info, self._jrdd_deserializer)
     
    +    def barrier(self):
    +        """
    +        .. note:: Experimental
    +
    +        Indicates that Spark must launch the tasks together for the current stage.
    +
    +        .. versionadded:: 2.4.0
    +        """
    +        return RDDBarrier(self)
    +
    +    def isBarrier(self):
    --- End diff --
    
    do we have this API in the JVM RDD?


---



[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22011
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22011#discussion_r208092333
  
    --- Diff: python/pyspark/rdd.py ---
    @@ -2406,6 +2406,18 @@ def toLocalIterator(self):
                 sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
             return _load_from_socket(sock_info, self._jrdd_deserializer)
     
    +    def barrier(self):
    --- End diff --
    
    I don't know why we haven't marked the version here so far, but we really should add `.. versionadded:: 2.4.0` here, or use:
    
    ```
    @since(2.4)
    def barrier(self):
        ...
    ```
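
The decorator form suggested above can be pictured with a small stand-in. This is a minimal sketch, not PySpark's own implementation: the `since` function and the `ToyRDD` class below are hypothetical and only show how a decorator might append a `.. versionadded::` directive to a docstring.

```python
import re


def since(version):
    """Hypothetical stand-in: return a decorator that appends a versionadded directive."""
    indent_pattern = re.compile(r"\n( +)")

    def decorator(func):
        doc = func.__doc__ or ""
        # Reuse the smallest indentation found in the docstring so the directive lines up.
        indents = indent_pattern.findall(doc)
        indent = " " * (min(len(m) for m in indents) if indents else 0)
        func.__doc__ = doc.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, version)
        return func

    return decorator


class ToyRDD(object):
    """Toy class used only to carry the decorated method."""

    @since(2.4)
    def barrier(self):
        """
        Indicates that Spark must launch the tasks together for the current stage.
        """
        return self
```

With this stand-in, `ToyRDD.barrier.__doc__` keeps its original text and ends with `.. versionadded:: 2.4`, which is the same kind of directive as the one written by hand in the docstring.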


---
