You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2018/11/07 10:05:42 UTC
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...
GitHub user xuanyuanking opened a pull request:
https://github.com/apache/spark/pull/22962
[SPARK-25921][PySpark] Fix BarrierTaskContext while python worker reuse
## What changes were proposed in this pull request?
While python worker reuse, BarrierTaskContext._getOrCreate() will still return a TaskContext, we'll get a `AttributeError: 'TaskContext' object has no attribute 'barrier'`. Fix this by adding check logic in BarrierTaskContext._getOrCreate() and rewrite `__new__` method for BarrierTaskContext.
## How was this patch tested?
Add new UT in pyspark-core.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuanyuanking/spark SPARK-25921
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22962.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22962
----
commit 0cb2cf6e9ece66861073c31b579b595a9de5ce81
Author: Yuanjian Li <xy...@...>
Date: 2018-11-07T10:01:54Z
fix BarrierTaskContext while python worker reuse
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22962
Thanks @gatorsmile @HyukjinKwon @cloud-fan !
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233033846
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
ah good point! @xuanyuanking can you send a small followup?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233275410
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
```
could you add some comments to explain it?
```
@cloud-fan Sorry for the less explain and more comments should be done at first, will done in follow up PR.
```
Can we get rid of the rewrite all?
we should remove __init__ too
next time please fully describe what's going on in PR description
```
@HyukjinKwon Sorry for the less explain, all these will be done in next follow up PR, and the new UT.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22962
**[Test build #98544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98544/testReport)** for PR 22962 at commit [`0cb2cf6`](https://github.com/apache/spark/commit/0cb2cf6e9ece66861073c31b579b595a9de5ce81).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233283645
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
```
could you add some comments to explain it?
```
@cloud-fan Sorry for the less explain and more comments should be done at first, will done in follow up PR.
```
Can we get rid of the rewrite all?
we should remove __init__ too
next time please fully describe what's going on in PR description
```
@HyukjinKwon Sorry for the less explain, all these will be done in next follow up PR, and the new UT.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232634698
--- Diff: python/pyspark/tests.py ---
@@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
"""
Verify that BarrierTaskContext.barrier() with reused python worker.
"""
+ self.sc._conf.set("spark.python.work.reuse", "true")
--- End diff --
I do these 2 check like below:
1. Run this test case without fix in `BarrierTaskContext._getOrCreate`, the bug can be reproduced.
2. Same code running in pyspark shell and set `spark.python.work.resue=false`, it return successfully.
Maybe this can prove the UT can cover the issue and also can reuse the original barrier case code, WDYT?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22962
@HyukjinKwon Thanks for your review, comment address and PR description/title changed done.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232527808
--- Diff: python/pyspark/tests.py ---
@@ -614,6 +614,18 @@ def context_barrier(x):
times = rdd.barrier().mapPartitions(f).map(context_barrier).collect()
self.assertTrue(max(times) - min(times) < 1)
+ def test_barrier_with_python_worker_reuse(self):
+ """
+ Verify that BarrierTaskContext.barrier() with reused python worker.
+ """
+ rdd = self.sc.parallelize(range(4), 4)
--- End diff --
Thanks, done in 02555b8.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98544/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22962
**[Test build #98544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98544/testReport)** for PR 22962 at commit [`0cb2cf6`](https://github.com/apache/spark/commit/0cb2cf6e9ece66861073c31b579b595a9de5ce81).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233130221
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
Can we get rid of the rewrite all? It's never a good idea to overwrite them unless it's absolutely required.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233117222
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
could you add some comments to explain it? so that people won't get confused again.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232990319
--- Diff: python/pyspark/tests.py ---
@@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
"""
Verify that BarrierTaskContext.barrier() with reused python worker.
"""
+ self.sc._conf.set("spark.python.work.reuse", "true")
--- End diff --
Yup. sorry for late response.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233055939
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
I think the answer is no, maybe I was not clear enough in my previous explain https://github.com/apache/spark/pull/22962#discussion_r232528333, use `BarrierTaskContext()` here is my first commit https://github.com/apache/spark/pull/22962/commits/0cb2cf6e9ece66861073c31b579b595a9de5ce81 , it should also need to rewrite `__new__` for `BarrierTaskContext`, otherwise the bug still exists cause its parent class `TaskContext` rewrite `__new__()`, when we call `BarrierTaskContext()` here in a reused worker, a `TaskContext` instance will be returned in `TaskContext.__new__()`:https://github.com/apache/spark/blob/c00e72f3d7530eb2ae43d4d45e8efde783daf6ff/python/pyspark/taskcontext.py#L47
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98714/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22962
cc @jiangxb1987 @MrBago
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22962
Please fix the PR title to describe what it fixes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22962
@HyukjinKwon No problem, I'll give a follow up PR to address all your comments and rewrite the UT in to a separate class.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22962
@xuanyuanking, mind explaining how and why it happens rather then what happens in PR description?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22962
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4940/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232528655
--- Diff: python/pyspark/tests.py ---
@@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
"""
Verify that BarrierTaskContext.barrier() with reused python worker.
"""
+ self.sc._conf.set("spark.python.work.reuse", "true")
--- End diff --
@xuanyuanking, this will probably need a separate suite case since it's also related with how we start the worker or not. You can make a new class, run a simple job to make sure workers are created and being resued, test it and stop.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233130494
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
Also, we should remove __init__ too. That's what Python interpretor will implicitly insert for both.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22962
Also please fix the test. The test doesn't really look clear. I actually quite didn't like the test written here now.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22962
The main code change LGTM too in any event
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232447967
--- Diff: python/pyspark/taskcontext.py ---
@@ -144,10 +144,19 @@ def __init__(self):
"""Construct a BarrierTaskContext, use get instead"""
pass
+ def __new__(cls):
+ """
+ Rewrite __new__ method to BarrierTaskContext for _getOrCreate called when _taskContext
+ is not instance of BarrierTaskContext.
+ """
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
+ return cls._taskContext
--- End diff --
Why should we rewrite `__new__`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:
https://github.com/apache/spark/pull/22962
cc @cloud-fan @gatorsmile
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22962
Looks making sense to me in general.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22962
LGTM, merging to master/2.4!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r233131375
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
Also, next time please fully describe what's going on in PR description. Even I was confused about it and misread that `__new__` is actually being inherited.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4809/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232528333
--- Diff: python/pyspark/taskcontext.py ---
@@ -144,10 +144,19 @@ def __init__(self):
"""Construct a BarrierTaskContext, use get instead"""
pass
+ def __new__(cls):
--- End diff --
Yep, do this in `_getOrCreate` has same effect, this is an over consider of https://github.com/apache/spark/blob/aec0af4a952df2957e21d39d1e0546a36ab7ab86/python/pyspark/taskcontext.py#L44-L45
Deleted in 02555b8.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232448083
--- Diff: python/pyspark/taskcontext.py ---
@@ -144,10 +144,19 @@ def __init__(self):
"""Construct a BarrierTaskContext, use get instead"""
pass
+ def __new__(cls):
--- End diff --
Why should we rewrite `__new__`? Can't we do this in `_getOrCreate` as well?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232986340
--- Diff: python/pyspark/tests.py ---
@@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
"""
Verify that BarrierTaskContext.barrier() with reused python worker.
"""
+ self.sc._conf.set("spark.python.work.reuse", "true")
--- End diff --
@HyukjinKwon Hi Hyukjin if you still think this need a separate class I'll think about the method of checking worker reuse and give a follow up PR.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232991503
--- Diff: python/pyspark/taskcontext.py ---
@@ -147,8 +147,8 @@ def __init__(self):
@classmethod
def _getOrCreate(cls):
"""Internal function to get or create global BarrierTaskContext."""
- if cls._taskContext is None:
- cls._taskContext = BarrierTaskContext()
+ if not isinstance(cls._taskContext, BarrierTaskContext):
+ cls._taskContext = object.__new__(cls)
--- End diff --
BTW, I think we should just `BarrierTaskContext()`. Let's don't make it complicated next time.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22962
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22962
**[Test build #98714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98714/testReport)** for PR 22962 at commit [`02555b8`](https://github.com/apache/spark/commit/02555b8fbdf85c3f2b5a92420479c168e14b573c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22962
**[Test build #98714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98714/testReport)** for PR 22962 at commit [`02555b8`](https://github.com/apache/spark/commit/02555b8fbdf85c3f2b5a92420479c168e14b573c).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22962#discussion_r232447862
--- Diff: python/pyspark/tests.py ---
@@ -614,6 +614,18 @@ def context_barrier(x):
times = rdd.barrier().mapPartitions(f).map(context_barrier).collect()
self.assertTrue(max(times) - min(times) < 1)
+ def test_barrier_with_python_worker_reuse(self):
+ """
+ Verify that BarrierTaskContext.barrier() with reused python worker.
+ """
+ rdd = self.sc.parallelize(range(4), 4)
--- End diff --
Let's explicitly set `spark.python.worker.reuse` or at least let's assert.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org