You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by xuanyuanking <gi...@git.apache.org> on 2018/11/07 10:05:42 UTC

[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...

GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/22962

    [SPARK-25921][PySpark] Fix BarrierTaskContext while python worker reuse

    ## What changes were proposed in this pull request?
    
    While python worker reuse, BarrierTaskContext._getOrCreate() will still return a TaskContext, we'll get a `AttributeError: 'TaskContext' object has no attribute 'barrier'`. Fix this by adding check logic in BarrierTaskContext._getOrCreate() and rewrite `__new__` method for BarrierTaskContext.
    
    ## How was this patch tested?
    
    Add new UT in pyspark-core.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-25921

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22962
    
----
commit 0cb2cf6e9ece66861073c31b579b595a9de5ce81
Author: Yuanjian Li <xy...@...>
Date:   2018-11-07T10:01:54Z

    fix BarrierTaskContext while python worker reuse

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Thanks @gatorsmile  @HyukjinKwon @cloud-fan !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233033846
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    ah good point! @xuanyuanking can you send a small followup?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233275410
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    ```
    could you add some comments to explain it?
    ```
    @cloud-fan Sorry for the less explain and more comments should be done at first, will done in follow up PR.
    ```
    Can we get rid of the rewrite all?
    we should remove __init__ too
    next time please fully describe what's going on in PR description
    ```
    @HyukjinKwon Sorry for the less explain, all these will be done in next follow up PR, and the new UT.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    **[Test build #98544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98544/testReport)** for PR 22962 at commit [`0cb2cf6`](https://github.com/apache/spark/commit/0cb2cf6e9ece66861073c31b579b595a9de5ce81).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233283645
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    ```
    could you add some comments to explain it?
    ```
    @cloud-fan Sorry for the less explain and more comments should be done at first, will done in follow up PR.
    ```
    Can we get rid of the rewrite all?
    we should remove __init__ too
    next time please fully describe what's going on in PR description
    ```
    @HyukjinKwon Sorry for the less explain, all these will be done in next follow up PR, and the new UT.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232634698
  
    --- Diff: python/pyspark/tests.py ---
    @@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
             """
             Verify that BarrierTaskContext.barrier() with reused python worker.
             """
    +        self.sc._conf.set("spark.python.work.reuse", "true")
    --- End diff --
    
    I do these 2 check like below:
    1. Run this test case without fix in `BarrierTaskContext._getOrCreate`, the bug can be reproduced.
    2. Same code running in pyspark shell and set `spark.python.work.resue=false`, it return successfully.
    Maybe this can prove the UT can cover the issue and also can reuse the original barrier case code, WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    @HyukjinKwon Thanks for your review, comment address and PR description/title changed done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232527808
  
    --- Diff: python/pyspark/tests.py ---
    @@ -614,6 +614,18 @@ def context_barrier(x):
             times = rdd.barrier().mapPartitions(f).map(context_barrier).collect()
             self.assertTrue(max(times) - min(times) < 1)
     
    +    def test_barrier_with_python_worker_reuse(self):
    +        """
    +        Verify that BarrierTaskContext.barrier() with reused python worker.
    +        """
    +        rdd = self.sc.parallelize(range(4), 4)
    --- End diff --
    
    Thanks, done in 02555b8.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98544/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    **[Test build #98544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98544/testReport)** for PR 22962 at commit [`0cb2cf6`](https://github.com/apache/spark/commit/0cb2cf6e9ece66861073c31b579b595a9de5ce81).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233130221
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    Can we get rid of the rewrite all? It's never a good idea to overwrite them unless it's absolutely required.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233117222
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    could you add some comments to explain it? so that people won't get confused again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232990319
  
    --- Diff: python/pyspark/tests.py ---
    @@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
             """
             Verify that BarrierTaskContext.barrier() with reused python worker.
             """
    +        self.sc._conf.set("spark.python.work.reuse", "true")
    --- End diff --
    
    Yup. sorry for late response.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233055939
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    I think the answer is no, maybe I was not clear enough in my previous explain  https://github.com/apache/spark/pull/22962#discussion_r232528333, use `BarrierTaskContext()` here is my first commit https://github.com/apache/spark/pull/22962/commits/0cb2cf6e9ece66861073c31b579b595a9de5ce81 , it should also need to rewrite `__new__` for `BarrierTaskContext`, otherwise the bug still exists cause its parent class `TaskContext` rewrite `__new__()`, when we call `BarrierTaskContext()` here in a reused worker, a `TaskContext` instance will be returned in `TaskContext.__new__()`:https://github.com/apache/spark/blob/c00e72f3d7530eb2ae43d4d45e8efde783daf6ff/python/pyspark/taskcontext.py#L47



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98714/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    cc @jiangxb1987 @MrBago 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Please fix the PR title to describe what it fixes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    @HyukjinKwon No problem, I'll give a follow up PR to address all your comments and rewrite the UT in to a separate class.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    @xuanyuanking, mind explaining how and why it happens rather then what happens in PR description?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22962


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4940/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232528655
  
    --- Diff: python/pyspark/tests.py ---
    @@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
             """
             Verify that BarrierTaskContext.barrier() with reused python worker.
             """
    +        self.sc._conf.set("spark.python.work.reuse", "true")
    --- End diff --
    
    @xuanyuanking, this will probably need a separate suite case since it's also related with how we start the worker or not. You can make a new class, run a simple job to make sure workers are created and being resued, test it and stop.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233130494
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    Also, we should remove __init__ too. That's what Python interpretor will implicitly insert for both.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Also please fix the test. The test doesn't really look clear. I actually quite didn't like the test written here now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    The main code change LGTM too in any event


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232447967
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -144,10 +144,19 @@ def __init__(self):
             """Construct a BarrierTaskContext, use get instead"""
             pass
     
    +    def __new__(cls):
    +        """
    +        Rewrite __new__ method to BarrierTaskContext for _getOrCreate called when _taskContext
    +        is not instance of BarrierTaskContext.
    +        """
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    +        return cls._taskContext
    --- End diff --
    
    Why should we rewrite `__new__`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    cc @cloud-fan @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Looks making sense to me in general.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    LGTM, merging to master/2.4!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r233131375
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    Also, next time please fully describe what's going on in PR description. Even I was confused about it and misread that `__new__` is actually being inherited.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4809/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232528333
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -144,10 +144,19 @@ def __init__(self):
             """Construct a BarrierTaskContext, use get instead"""
             pass
     
    +    def __new__(cls):
    --- End diff --
    
    Yep, do this in `_getOrCreate` has same effect, this is an over consider of https://github.com/apache/spark/blob/aec0af4a952df2957e21d39d1e0546a36ab7ab86/python/pyspark/taskcontext.py#L44-L45
    Deleted in 02555b8.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232448083
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -144,10 +144,19 @@ def __init__(self):
             """Construct a BarrierTaskContext, use get instead"""
             pass
     
    +    def __new__(cls):
    --- End diff --
    
    Why should we rewrite `__new__`? Can't we do this in `_getOrCreate` as well?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232986340
  
    --- Diff: python/pyspark/tests.py ---
    @@ -618,10 +618,13 @@ def test_barrier_with_python_worker_reuse(self):
             """
             Verify that BarrierTaskContext.barrier() with reused python worker.
             """
    +        self.sc._conf.set("spark.python.work.reuse", "true")
    --- End diff --
    
    @HyukjinKwon Hi Hyukjin if you still think this need a separate class I'll think about the method of checking worker reuse and give a follow up PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix barrier task run witho...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232991503
  
    --- Diff: python/pyspark/taskcontext.py ---
    @@ -147,8 +147,8 @@ def __init__(self):
         @classmethod
         def _getOrCreate(cls):
             """Internal function to get or create global BarrierTaskContext."""
    -        if cls._taskContext is None:
    -            cls._taskContext = BarrierTaskContext()
    +        if not isinstance(cls._taskContext, BarrierTaskContext):
    +            cls._taskContext = object.__new__(cls)
    --- End diff --
    
    BTW, I think we should just `BarrierTaskContext()`. Let's don't make it complicated next time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix barrier task run without Barr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    **[Test build #98714 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98714/testReport)** for PR 22962 at commit [`02555b8`](https://github.com/apache/spark/commit/02555b8fbdf85c3f2b5a92420479c168e14b573c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext while pyth...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22962
  
    **[Test build #98714 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98714/testReport)** for PR 22962 at commit [`02555b8`](https://github.com/apache/spark/commit/02555b8fbdf85c3f2b5a92420479c168e14b573c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22962: [SPARK-25921][PySpark] Fix BarrierTaskContext whi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22962#discussion_r232447862
  
    --- Diff: python/pyspark/tests.py ---
    @@ -614,6 +614,18 @@ def context_barrier(x):
             times = rdd.barrier().mapPartitions(f).map(context_barrier).collect()
             self.assertTrue(max(times) - min(times) < 1)
     
    +    def test_barrier_with_python_worker_reuse(self):
    +        """
    +        Verify that BarrierTaskContext.barrier() with reused python worker.
    +        """
    +        rdd = self.sc.parallelize(range(4), 4)
    --- End diff --
    
    Let's explicitly set `spark.python.worker.reuse` or at least let's assert.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org