Posted to reviews@spark.apache.org by jiangxb1987 <gi...@git.apache.org> on 2018/08/06 18:09:52 UTC
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
GitHub user jiangxb1987 opened a pull request:
https://github.com/apache/spark/pull/22011
[WIP][SPARK-24822][PySpark] Python support for barrier execution mode
## What changes were proposed in this pull request?
This PR adds Python support for barrier execution mode, which makes it possible to launch a job containing barrier stage(s) from PySpark.
It simply ports the existing `RDDBarrier` and `RDD.barrier()` to the Java and Python APIs.
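As a rough illustration of what a barrier stage implies (all tasks of the stage are launched together and can wait for one another), here is a minimal pure-Python sketch. It is not Spark code: threads stand in for tasks, `threading.Barrier` stands in for the scheduler-level guarantee, and the names `run_barrier_stage` and `task` are hypothetical.

```python
import threading


def run_barrier_stage(num_tasks, task_fn):
    """Start `num_tasks` threads together and run task_fn(task_id, barrier) in each.

    Toy model only: real barrier scheduling happens in the Spark scheduler.
    """
    barrier = threading.Barrier(num_tasks)
    results = [None] * num_tasks

    def worker(task_id):
        results[task_id] = task_fn(task_id, barrier)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


arrived = []
lock = threading.Lock()


def task(task_id, barrier):
    # Record that this task has started.
    with lock:
        arrived.append(task_id)
    # Block until every task of the stage has reached this point.
    barrier.wait()
    # After the barrier, each task observes that all of its peers have started.
    with lock:
        return len(arrived)


counts = run_barrier_stage(4, task)
```

If only some of the tasks were launched, `barrier.wait()` would block forever, which is why a barrier stage needs all of its tasks scheduled at the same time.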
## How was this patch tested?
TBD
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiangxb1987/spark python
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22011.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22011
----
commit ec2f66851b47ab885608cb7caa277eeb865ab0d2
Author: Xingbo Jiang <xi...@...>
Date: 2018-08-06T18:02:29Z
init.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94600/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22011
@jiangxb1987 Please mention that tests will be added in a follow-up PR that implements BarrierTaskContext.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22011
test this please
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2020/
Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94514/
Test FAILed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94302/
Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94600/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94590/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208068705
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+ """
+ .. note:: Experimental
+
+ An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+ contains this RDD together.
+ """
+
+ def __init__(self, rdd):
+ self.rdd = rdd
+ self._jrdd = rdd._jrdd
+
+ def mapPartitions(self, f, preservesPartitioning=False):
--- End diff --
If we expose a package-private method in `RDDBarrier` that returns the annotated RDD with `isBarrier=True`, we can implement `mapPartitions` easily here:
~~~python
jBarrierRdd = self._jrdd.rdd.barrier().barrierRdd.javaRdd
pyBarrierRdd = RDD(self._jrdd.rdd.barrier().barrierRdd.javaRdd)
pyBarrierRdd.mapPartitions(f, preservesPartitioning)
~~~
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22011
retest this please
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94549/
Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2062/
Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94530/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94308/
Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22011
retest this please
---
[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r209118962
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2449,36 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+ """
+ .. note:: Experimental
+
+ An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+ contains this RDD together.
+
+ .. versionadded:: 2.4.0
+ """
+
+ def __init__(self, rdd):
+ self.rdd = rdd
+ self._jrdd = rdd._jrdd
+
+ def mapPartitions(self, f, preservesPartitioning=False):
+ """
+ .. note:: Experimental
+
+ Return a new RDD by applying a function to each partition of this RDD.
+
+ .. versionadded:: 2.4.0
+ """
+ def func(s, iterator):
+ return f(iterator)
+ jBarrierRdd = self._jrdd.rdd().barrier().toJavaRDD()
--- End diff --
This will materialize the Java RDD, which means the map functions before and after the barrier will be executed by two separate Python workers.
We should not materialize the Java RDD here, but just set an `isBarrier` flag in the Python `PipelinedRDD`.
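The difference between the two approaches can be sketched with a toy model in plain Python. This is only an illustration under stated assumptions, not PySpark's implementation: `ToyRDD`, `ToyPipelinedRDD`, `map_partitions`, and `materialize` are hypothetical names, and each `materialize` call stands in for one Python worker pass over the data.

```python
class ToyRDD:
    """Toy stand-in for an already materialized (JVM-side) RDD."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map_partitions(self, f, is_barrier=False):
        return ToyPipelinedRDD(self, f, is_barrier)


class ToyPipelinedRDD:
    """Fuses consecutive map_partitions calls into one function per partition."""

    def __init__(self, prev, f, is_barrier=False):
        if isinstance(prev, ToyPipelinedRDD):
            # Pipeline: compose with the previous function and keep its source.
            prev_f = prev.func
            self.func = lambda it: f(prev_f(it))
            self.source = prev.source
            self.is_barrier = prev.is_barrier or is_barrier
        else:
            self.func = f
            self.source = prev
            self.is_barrier = is_barrier

    def map_partitions(self, f, is_barrier=False):
        return ToyPipelinedRDD(self, f, is_barrier)

    def materialize(self, worker_log):
        # One log entry per worker pass, recording whether it ran as a barrier stage.
        worker_log.append(self.is_barrier)
        return ToyRDD([list(self.func(iter(p))) for p in self.source.partitions])


def double(it):
    return (x * 2 for x in it)


def inc(it):
    return (x + 1 for x in it)


base = ToyRDD([[1, 2], [3]])

# Flag approach: both functions are fused and run in a single worker pass.
log_fused = []
out_fused = base.map_partitions(double).map_partitions(inc, is_barrier=True).materialize(log_fused)

# Materializing between the two maps forces two separate worker passes.
log_split = []
mid = base.map_partitions(double).materialize(log_split)
out_split = mid.map_partitions(inc, is_barrier=True).materialize(log_split)
```

Both variants compute the same result, but the fused variant records a single barrier pass, while the split variant records two passes, only the second of which is marked as a barrier.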
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94308/testReport)** for PR 22011 at commit [`b0b2f86`](https://github.com/apache/spark/commit/b0b2f86cc5b19693ff1f46795b9266d1024cb85a).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94512/
Test FAILed.
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208092650
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+ """
+ .. note:: Experimental
+
+ An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+ contains this RDD together.
--- End diff --
ditto let's add `.. versionadded:: 2.4.0` at the end.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2011/
Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94575/
Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94530/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94512/testReport)** for PR 22011 at commit [`1ee8025`](https://github.com/apache/spark/commit/1ee80254c869b9fe42d05f401a4802d8b4e1662a).
* This patch **fails from timeout after a configured wait of `300m`**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94565/
Test FAILed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1860/
Test PASSed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2009/
Test PASSed.
---
[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22011
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2035/
Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94549/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94575/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94590/
Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22011
LGTM, merging to master!
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208068968
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDBarrier.scala ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.java
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.BarrierTaskContext
+import org.apache.spark.TaskContext
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.rdd.MapPartitionsRDD
+
+/**
+ * A Java-friendly version of [[org.apache.spark.rdd.RDDBarrier]] that returns
+ * [[org.apache.spark.api.java.JavaRDD]]s.
+ *
+ * An RDD barrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+ * contains this RDD together.
+ */
+class JavaRDDBarrier[T: ClassTag](javaRdd: JavaRDD[T]) {
--- End diff --
This is not necessary to implement Python support.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2044/
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94512/testReport)** for PR 22011 at commit [`1ee8025`](https://github.com/apache/spark/commit/1ee80254c869b9fe42d05f401a4802d8b4e1662a).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22011
test this please
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1866/
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208093240
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+    """
+    .. note:: Experimental
+
+    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
--- End diff --
nit: `RDDBarrier` -> `RDD barrier`
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94600/
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2052/
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94302/testReport)** for PR 22011 at commit [`ec2f668`](https://github.com/apache/spark/commit/ec2f66851b47ab885608cb7caa277eeb865ab0d2).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94590/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94575/testReport)** for PR 22011 at commit [`cf38531`](https://github.com/apache/spark/commit/cf3853177d0ed76efbffee8ced1021003b085a26).
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94514/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
* This patch **fails from timeout after a configured wait of `300m`**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/22011
retest this please
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94302 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94302/testReport)** for PR 22011 at commit [`ec2f668`](https://github.com/apache/spark/commit/ec2f66851b47ab885608cb7caa277eeb865ab0d2).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaRDDBarrier[T: ClassTag](javaRdd: JavaRDD[T])`
  * `class RDDBarrier(object):`
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2070/
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94514/testReport)** for PR 22011 at commit [`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
---
[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...
Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r209289539
--- Diff: python/pyspark/rdd.py ---
@@ -2406,6 +2406,26 @@ def toLocalIterator(self):
sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
return _load_from_socket(sock_info, self._jrdd_deserializer)
+    def barrier(self):
+        """
+        .. note:: Experimental
+
+        Indicates that Spark must launch the tasks together for the current stage.
+
+        .. versionadded:: 2.4.0
+        """
+        return RDDBarrier(self)
+
+    def isBarrier(self):
--- End diff --
The Scala RDD has a `private[spark]` `isBarrier()` function; we don't add it to JavaRDD.
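As an illustration of the flag under discussion, here is a toy model of how a barrier marker could propagate through an RDD lineage on the Python side. `ToyRDD`, `ToyRDDBarrier`, and `_is_barrier` are hypothetical names for this sketch, not PySpark classes, and the model assumes every parent is in the same stage (it ignores shuffle boundaries).

```python
class ToyRDD(object):
    """Hypothetical stand-in for an RDD that only tracks a barrier flag."""

    def __init__(self, parents=(), from_barrier=False):
        self.parents = list(parents)
        self._from_barrier = from_barrier

    def _is_barrier(self):
        # Simplification: treat every parent as a same-stage dependency.
        return self._from_barrier or any(p._is_barrier() for p in self.parents)

    def map(self, f):
        # The function is ignored; only the lineage matters in this toy model.
        return ToyRDD(parents=[self])

    def barrier(self):
        return ToyRDDBarrier(self)


class ToyRDDBarrier(object):
    """Hypothetical wrapper returned by ToyRDD.barrier()."""

    def __init__(self, rdd):
        self.rdd = rdd

    def mapPartitions(self, f):
        # Only RDDs created through the wrapper carry the flag themselves.
        return ToyRDD(parents=[self.rdd], from_barrier=True)


plain = ToyRDD().map(lambda x: x)
gang = ToyRDD().barrier().mapPartitions(lambda it: it).map(lambda x: x)
print(plain._is_barrier(), gang._is_barrier())
```

In this sketch the check stays an underscore-prefixed helper, mirroring the idea that the flag is internal rather than part of the public API.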
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22011
**[Test build #94308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94308/testReport)** for PR 22011 at commit [`b0b2f86`](https://github.com/apache/spark/commit/b0b2f86cc5b19693ff1f46795b9266d1024cb85a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208122669
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+    """
+    .. note:: Experimental
+
+    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+    contains this RDD together.
+    """
+
+    def __init__(self, rdd):
+        self.rdd = rdd
+        self._jrdd = rdd._jrdd
+
+    def mapPartitions(self, f, preservesPartitioning=False):
docstring?
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94530/
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208093660
--- Diff: python/pyspark/rdd.py ---
@@ -2429,6 +2441,29 @@ def _wrap_function(sc, func, deserializer, serializer, profiler=None):
sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+class RDDBarrier(object):
+
+    """
+    .. note:: Experimental
+
+    An RDDBarrier turns an RDD into a barrier RDD, which forces Spark to launch tasks of the stage
+    contains this RDD together.
+    """
+
+    def __init__(self, rdd):
+        self.rdd = rdd
+        self._jrdd = rdd._jrdd
+
+    def mapPartitions(self, f, preservesPartitioning=False):
+        """
+        Return a new RDD by applying a function to each partition of this RDD.
+        """
--- End diff --
Shall we match the documentation, or is there a reason it is different?
FWIW, for code formatting, just `` `blabla` `` should be good enough. It is nicer if linked properly, e.g. `` :class:`ClassName` ``.
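A minimal sketch of the markup suggested above, using a hypothetical stand-in class (`RDDBarrierSketch` is not part of the PR, and the docstring wording is illustrative rather than the final text). It shows plain backticks for an argument name and Sphinx roles for cross-references.

```python
class RDDBarrierSketch(object):
    """
    .. note:: Experimental

    Hypothetical stand-in used only to show docstring markup.
    """

    def mapPartitions(self, f, preservesPartitioning=False):
        """
        .. note:: Experimental

        Return a new :class:`RDD` by applying `f` to each partition of the
        wrapped RDD, launching all tasks of the stage together.
        The interface mirrors :func:`RDD.mapPartitions`.

        .. versionadded:: 2.4.0
        """
        raise NotImplementedError("markup example only")


doc = RDDBarrierSketch.mapPartitions.__doc__
print(doc)
```

With Sphinx, the `:class:` and `:func:` roles render as links when the targets are documented, while plain backticks only change the typeface.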
---
[GitHub] spark pull request #22011: [SPARK-24822][PySpark] Python support for barrier...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r209287104
--- Diff: python/pyspark/rdd.py ---
@@ -2406,6 +2406,26 @@ def toLocalIterator(self):
sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
return _load_from_socket(sock_info, self._jrdd_deserializer)
+    def barrier(self):
+        """
+        .. note:: Experimental
+
+        Indicates that Spark must launch the tasks together for the current stage.
+
+        .. versionadded:: 2.4.0
+        """
+        return RDDBarrier(self)
+
+    def isBarrier(self):
--- End diff --
Do we have this API in the JVM RDD?
---
[GitHub] spark issue #22011: [WIP][SPARK-24822][PySpark] Python support for barrier e...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22011
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22011: [WIP][SPARK-24822][PySpark] Python support for ba...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22011#discussion_r208092333
--- Diff: python/pyspark/rdd.py ---
@@ -2406,6 +2406,18 @@ def toLocalIterator(self):
sock_info = self.ctx._jvm.PythonRDD.toLocalIteratorAndServe(self._jrdd.rdd())
return _load_from_socket(sock_info, self._jrdd_deserializer)
+    def barrier(self):
--- End diff --
I don't know why we haven't marked the version here so far, but we really should add `.. versionadded:: 2.4.0` here, or use:
```
@since(2.4)
def barrier(self):
    ...
```
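For readers unfamiliar with the two options, the following is a simplified sketch of how a `since`-style decorator can produce the same `versionadded` directive. PySpark ships its own `since` helper; the version below is a hypothetical reimplementation for illustration only, and it does not handle docstring indentation the way a real helper would need to.

```python
def since(version):
    """Hypothetical decorator: append a `versionadded` directive to a docstring."""
    def deco(f):
        # Tolerate functions without a docstring in this sketch.
        doc = (f.__doc__ or "").rstrip()
        f.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return f
    return deco


@since(2.4)
def barrier():
    """Indicates that Spark must launch the tasks together for the current stage."""
    return None


print(barrier.__doc__)
```

Either route ends with the directive in the rendered docstring; the decorator just keeps the version next to the function signature.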
---