Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2017/02/02 23:26:38 UTC

[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/16782

    [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorator is not thread-safe

    ## What changes were proposed in this pull request?
    The `@keyword_only` decorator in PySpark is not thread-safe.  It writes kwargs to a static class variable in the decorator, which the wrapped class method later reads back as `_input_kwargs`.  If multiple threads construct the same class with different kwargs, there is a race to read the static class variable before another thread overwrites it.  See [SPARK-19348](https://issues.apache.org/jira/browse/SPARK-19348) for reproduction code.
    
    This change writes the kwargs to an instance member variable so that multiple threads can operate on separate instances without hitting the race condition.  It does not protect against multiple threads operating on a single instance, but that is better left to the user to synchronize.
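
    A minimal sketch of the idea (illustrative only, assuming `keyword_only` wraps methods whose first argument is `self`; not necessarily the exact code in this patch):

    ```python
    from functools import wraps

    def keyword_only(func):
        """
        A decorator that forces keyword arguments in the wrapped method
        and saves the actual input keyword arguments in `_input_kwargs`.
        """
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            if len(args) > 0:
                raise TypeError("Method %s forces keyword arguments." % func.__name__)
            # Store the kwargs on the instance instead of on a shared
            # class-level attribute, so threads constructing separate
            # instances do not race on the same variable.
            self._input_kwargs = kwargs
            return func(self, **kwargs)
        return wrapper
    ```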
    
    ## How was this patch tested?
    WIP


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-keyword_only-threadsafe-SPARK-19348

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16782.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16782
    
----
commit a15e9b0a78d86685d6b27ac4fdfa37161376b288
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-02-02T23:01:45Z

    made a thread-safe addition to keyword_only decorator

commit 83bcce0c2fc33184579224fca18066dbf6891d26
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-02-02T23:11:32Z

    remove blank line

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73709 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73709/testReport)** for PR 16782 at commit [`8dafc20`](https://github.com/apache/spark/commit/8dafc20fd2bbbe9678fa44f7216982fdd0955c14).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds the following public classes _(experimental)_:
      * `class KeywordOnlyTests(unittest.TestCase):`
      * `    class Wrapped(object):`
      * `        class Setter(object):`




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73709/testReport)** for PR 16782 at commit [`8dafc20`](https://github.com/apache/spark/commit/8dafc20fd2bbbe9678fa44f7216982fdd0955c14).




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Thanks @jkbradley and @davies for reviewing.  This fix still seems a little hacky to me, and you could still run into trouble if you call a nested wrapped function and don't consume `_input_kwargs` right away.  But it is the best solution I could think of without being overly complicated, and it is a little better than what we had before.  If you give the go-ahead, I can update the other uses in pyspark.ml and try to add a test as well.
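
    To illustrate the nesting hazard with a hypothetical estimator (not actual pyspark.ml code, and assuming the per-instance `_input_kwargs` from this patch): `__init__` and `setParams` are both wrapped, so `__init__` has to read `_input_kwargs` before calling `setParams`, or the nested call will replace it:

    ```python
    from pyspark import keyword_only
    from pyspark.ml.param import Param, Params

    class MyEstimator(Params):  # hypothetical class, for illustration only
        inputCol = Param(Params._dummy(), "inputCol", "input column name")

        @keyword_only
        def __init__(self, inputCol=None):
            super(MyEstimator, self).__init__()
            # Consume the kwargs captured by the decorator immediately;
            # the nested call to setParams (also wrapped) overwrites them.
            kwargs = self._input_kwargs
            self.setParams(**kwargs)

        @keyword_only
        def setParams(self, inputCol=None):
            kwargs = self._input_kwargs
            return self._set(**kwargs)
    ```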




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Build finished. Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by avi8tr <gi...@git.apache.org>.
Github user avi8tr commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    This patch is not a solution for pyspark users, because all of the ML stages in the pipeline are also not thread-safe in their creation due to this same wrapper.  Note that the wrapper does two separate things: it enforces keyword-only arguments, and it passes the kwargs in an unsafe manner outside the call to the wrapped method.  We can fix this by simply omitting the wrapper's second (apparently unneeded) feature.  Another benefit of this omission is that wrapped functions do not need to be modified to use the wrapper (although the ML methods that have already been modified to depend on the `_input_kwargs` introduced by the defective wrapper must be switched back to using named arguments).  Note this would also fix the bug in Pipeline where the `__init__` method's modifications to stages are lost.  To illustrate this approach with minimalist code similar to Pipeline:
    
    ```python
    from functools import wraps
    
    def keyword_only(func):
        """
        A decorator that forces keyword arguments in the wrapped method
        """
        @wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > 1:
                raise TypeError("Method %s forces keyword arguments." % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    
    class Mytest:
    
        @keyword_only
        def __init__(self, stages=None):
            """
            __init__(self, stages=None)
            """
            self.setParams(stages=stages)
    
        @keyword_only
        def setParams(self, stages=None):
            """
            setParams(self, stages=None)
            Sets params for Pipeline.
            """
            if stages is None:
                stages = []
            return self._set(stages=stages)
    
        def _set(self,**kwargs):
            for key,value in kwargs.items():
                print ('kwargs contains ' + key + ": " + str(value))
    
    
    if __name__ == "__main__":
        print ()
        print ('zero arguments')
        baz = Mytest()
        print ()
        print ('initParams')
        foo = Mytest(stages='initParams')
        print ()
        print ('setParams')
        bar = Mytest()
        bar.setParams(stages='setParams')
        print ()
        print ('nonKeyword arguments')
        try:
            bar = Mytest('nokeywords')
        except Exception as e:
            print ('Exception: '+e.args[0])
            
        print ()
        print ('initParams with unexpected parameter')
        try:
            bat = Mytest(stages='initParams', unexpectedParameter='foo')
        except Exception as e:
            print ('Exception: '+e.args[0])
    ```
    the output of which is:
    ```
    zero arguments
    kwargs contains stages: []
    
    initParams
    kwargs contains stages: initParams
    
    setParams
    kwargs contains stages: []
    kwargs contains stages: setParams
    
    nonKeyword arguments
    Exception: Method __init__ forces keyword arguments.
    
    initParams with unexpected parameter
    Exception: __init__() got an unexpected keyword argument 'unexpectedParameter'
    ```




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73713/testReport)** for PR 16782 at commit [`e578320`](https://github.com/apache/spark/commit/e5783209dff55a6010ca17da819542f7a1cdb12c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    > it leaves in place the static class variable for all other ML classes that use the wrapper, and those classes continue to use the static class variable. 
    
    I think this was discussed above: This WIP PR currently just changes the usage for Pipeline, but if the fix is OK for Pipeline, then @BryanCutler can update it for all models.
    
    Given the OK from @davies, I recommend we proceed with the current fix (but using 'self' to hold the kwargs, as mentioned above).  With regard to using ```inspection```, I say we just add a note to the keyword_only wrapper about only using it for methods.
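
    For example, the note could read something like this (the wording is just a suggestion):

    ```python
    def keyword_only(func):
        """
        A decorator that forces keyword arguments in the wrapped method
        and saves actual input keyword arguments in `_input_kwargs`.

        .. note:: Should only be used to wrap a method where first arg is `self`
        """
    ```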




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Clever unit test : )
    
    LGTM
    Merging with master
    
    I'll try to backport it to branch-2.1 and branch-2.0 as well.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Thanks @BryanCutler for the patch!  The fix looks reasonable to me, but let me try to check with @davies to confirm.
    
    If this is the right approach, then I think we should update the other uses of _input_kwargs in pyspark.ml as well.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    I think this is ready for a final review @jkbradley @davies - thanks!




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #72292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72292/testReport)** for PR 16782 at commit [`83bcce0`](https://github.com/apache/spark/commit/83bcce0c2fc33184579224fca18066dbf6891d26).




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73713/
    Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #3586 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3586/testReport)** for PR 16782 at commit [`83bcce0`](https://github.com/apache/spark/commit/83bcce0c2fc33184579224fca18066dbf6891d26).




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #3586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3586/testReport)** for PR 16782 at commit [`83bcce0`](https://github.com/apache/spark/commit/83bcce0c2fc33184579224fca18066dbf6891d26).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    @jkbradley I think that last test comment is from an older test that just took a while to finish.  Test build #73713 is from the latest commit and passed, but I can rerun it just in case if you like.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #72292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72292/testReport)** for PR 16782 at commit [`83bcce0`](https://github.com/apache/spark/commit/83bcce0c2fc33184579224fca18066dbf6891d26).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Also, using the `inspection` module it would be possible to check whether the wrapped function is a method, so we wouldn't need to just assume it.
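
    For example, a decoration-time check could look roughly like this (a sketch using the standard `inspect` module; `inspect.getargspec` is the spelling available back to Python 2.6, though on modern Python 3 one would use `inspect.signature` instead):

    ```python
    import inspect
    from functools import wraps

    def keyword_only(func):
        # At decoration time the wrapped object is still a plain function,
        # so check that its first positional parameter is named 'self'
        # rather than just assuming it.
        arg_names = inspect.getargspec(func).args
        if not arg_names or arg_names[0] != 'self':
            raise TypeError("keyword_only should only wrap methods "
                            "(first argument must be 'self'): %s" % func.__name__)

        @wraps(func)
        def wrapper(self, *args, **kwargs):
            if len(args) > 0:
                raise TypeError("Method %s forces keyword arguments." % func.__name__)
            self._input_kwargs = kwargs
            return func(self, **kwargs)
        return wrapper
    ```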




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73713/testReport)** for PR 16782 at commit [`e578320`](https://github.com/apache/spark/commit/e5783209dff55a6010ca17da819542f7a1cdb12c).




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73710/testReport)** for PR 16782 at commit [`acf9cf4`](https://github.com/apache/spark/commit/acf9cf475fbc37d1a6b9de4fa5bdec02a447dc43).




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73709/
    Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by avi8tr <gi...@git.apache.org>.
Github user avi8tr commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Hi, thanks for explaining that there is a purpose to retaining and passing the user-supplied arguments outside of the function call (while not changing the public API).  This fix, enabling per-instance storage, fits the usage model for threading in Spark -- one thread creates the pipeline and e.g. invokes .fit() -- but it stops short of a full fix because it leaves in place the static class variable for all other ML classes that use the wrapper, and those classes continue to use it.  That is the aspect of the patch that is not thread-safe.  If this branch is merged, one still cannot reasonably create multiple ML pipelines in a threaded environment, because the elements of the pipeline (its stages) are now known to be subject to the same bug.  (The remaining nit is: what is supposed to happen to arguments, e.g. stages=, that are changed in the bodies of the wrapped methods?  Currently, the changes are thrown away.  This would seem to deserve at least a comment placed in the dead code.)




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    You're right about the test.  I'll take a look now.




[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16782#discussion_r103277884
  
    --- Diff: python/pyspark/__init__.py ---
    @@ -96,9 +96,11 @@ def keyword_only(func):
         """
         @wraps(func)
         def wrapper(*args, **kwargs):
    +        # NOTE - this assumes we are wrapping a method and args[0] will be 'self'
             if len(args) > 1:
                 raise TypeError("Method %s forces keyword arguments." % func.__name__)
             wrapper._input_kwargs = kwargs
    --- End diff --
    
    Yeah, that is what I was suggesting; removing that line would require changing it everywhere it is used in pyspark.ml, so I just wanted to check with you guys first.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    I'm OK with the current solution, though if it's easy to check using ```inspection``` then that seems nice to do.
    
    If there are cases in which the wrapper is still not thread-safe, then could you please document that in the wrapper?  I worry about other parts of Spark adopting keyword_only without recognizing the thread safety issues.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72292/
    Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73710/
    Test FAILed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Sure, I can do a backport @jkbradley, will ping you when ready
    
    On Mar 3, 2017 4:46 PM, "asfgit" <no...@github.com> wrote:
    
    > Closed #16782 <https://github.com/apache/spark/pull/16782> via 44281ca
    > <https://github.com/apache/spark/commit/44281ca81d4eda02b627ba21841108438b7d1c27>
    > .





[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    **[Test build #73710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73710/testReport)** for PR 16782 at commit [`acf9cf4`](https://github.com/apache/spark/commit/acf9cf475fbc37d1a6b9de4fa5bdec02a447dc43).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only d...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16782#discussion_r103276349
  
    --- Diff: python/pyspark/__init__.py ---
    @@ -96,9 +96,11 @@ def keyword_only(func):
         """
         @wraps(func)
         def wrapper(*args, **kwargs):
    +        # NOTE - this assumes we are wrapping a method and args[0] will be 'self'
             if len(args) > 1:
                 raise TypeError("Method %s forces keyword arguments." % func.__name__)
             wrapper._input_kwargs = kwargs
    --- End diff --
    
    If the assumption is correct, should we always use 'self' to hold the kwargs (remove this line and update all the functions that use `keyword_only`)?




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Ping @holdenk @davies.  I reproduced the code in the JIRA and found that kwargs from one thread were getting overwritten by another, causing an `ml.Pipeline` to be constructed with incorrect parameters.  This is a bit of a hacked solution; I'm not sure what you would think about removing `wrapper._input_kwargs = kwargs` entirely instead, since that would require a lot of changes.
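
    A rough illustration of the kind of race involved (not the JIRA reproduction code; `Pipeline` construction is pure Python, so no SparkContext is needed, and the stage objects here are just placeholders):

    ```python
    import threading
    from pyspark.ml import Pipeline

    results = {}

    def build(n):
        # Each thread asks for a different number of stages; with the shared
        # static _input_kwargs, one thread can pick up another thread's kwargs
        # and end up with the wrong stages.
        pipeline = Pipeline(stages=list(range(n)))
        results[n] = len(pipeline.getStages())

    threads = [threading.Thread(target=build, args=(n,)) for n in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Expected: results[n] == n for every n; mismatches indicate the race.
    print(results)
    ```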




[GitHub] spark pull request #16782: [SPARK-19348][PYTHON] PySpark keyword_only decora...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16782




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON] PySpark keyword_only decorator is ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Well, it merged with master, but it will need some manual backports.  @BryanCutler Would you mind sending one for branch-2.1?  I'm ambivalent about 2.0; your call (or anyone who's hit this on 2.0).
    
    Thank you!




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    That's correct @jkbradley, thanks for clearing that up - I should have been clearer in the description.  I'll go ahead and remove the static `_input_kwargs` and update the remaining uses.  It doesn't look like `inspection` is available in Python 2.6 anyway, so I'll put a note in the docstring instead.




[GitHub] spark issue #16782: [SPARK-19348][PYTHON][WIP] PySpark keyword_only decorato...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/16782
  
    Hi @avi8tr, what exactly about this proposed fix is not thread-safe?  `_input_kwargs` also serves another purpose: it contains only the params explicitly set by the user.  These get passed along to the underlying pyspark.ml param maps separately from the complete set of default values.  What you are proposing does not account for that and would require major restructuring, as many of the pyspark.ml classes have huge param lists.

