You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by holdenk <gi...@git.apache.org> on 2017/07/26 00:09:20 UTC

[GitHub] spark pull request #18734: [WIP][SPARK-21070][PYSPARK] Attempt to update clo...

GitHub user holdenk opened a pull request:

    https://github.com/apache/spark/pull/18734

    [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickle again

    ## What changes were proposed in this pull request?
    
    Based on https://github.com/apache/spark/pull/18282 by @rgbkrk this PR attempts to update to the current released cloudpickle and minimize the difference between Spark cloudpickle and "stock" cloud pickle with the goal of eventually using the stock cloud pickle.
    
    Some notable changes:
    * Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
    * Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
    * Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
    * Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
    * Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
    * Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
    * Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
    * Support method descriptor (cloudpipe/cloudpickle#46)
    * No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
    * ** Remove non-standard __transient__check (cloudpipe/cloudpickle#110)** -- while we don't use this internally, and have no tests or documentation for its use, downstream code may use __transient__, although it has never been part of the API, if we merge this we should include a note about this in the release notes.
    * Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
    * BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)
    
    
    ## How was this patch tested?
    
    Existing PySpark unit tests + the unit tests from the cloudpickle project on their own.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/holdenk/spark holden-rgbkrk-cloudpickle-upgrades

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18734
    
----
commit b84222b6ea660ce5ec1aedfee50e297b436eb824
Author: Kyle Kelley <rg...@gmail.com>
Date:   2017-06-12T22:19:29Z

    [SPARK-21070][PYSPARK] Upgrade cloudpickle
    
    This brings in fixes and upgrades from the [cloudpickle](https://github.com/cloudpipe/cloudpickle) module, notably:
    
    * Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    * Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    * Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    * Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
    * Make cloudpickle Python 3.6 compatible (https://github.com/cloudpipe/cloudpickle/pull/72)
    * Allow pickling of builtin methods (https://github.com/cloudpipe/cloudpickle/pull/57)
    * Add ability to pickle dynamically created modules (https://github.com/cloudpipe/cloudpickle/pull/52)
    * Support method descriptor (https://github.com/cloudpipe/cloudpickle/pull/46)
    * No more pickling of closed files, was broken on Python 3 (https://github.com/cloudpipe/cloudpickle/pull/32)

commit 6d5e5cf412e11d4b8b30f7d46c15405edcb4cb05
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-24T19:58:32Z

    Merge branch 'master' into holden-rgbkrk-cloudpickle-upgrades

commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-24T20:01:46Z

    Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110

commit f8ff2da5c093bf20a65df1869ecd3def3fbac2c5
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T22:02:16Z

    Test fix

commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T22:02:24Z

    Revert "Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110"
    
    This reverts commit 1cfd38f73da328bcf58ab32228ace4ff59bc26d2.

commit 195cd21ece2036df88a95d2fdc7dfd29c5681efa
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T22:07:13Z

    Fixed named tuple issue

commit 74880eabf218b4e9f29a25583442ee8d3b6bb0a6
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T22:09:30Z

    Try and move the fix for namedtuple and re-enable the rest of the useful cloudpickle fixes.
    
    Revert "Revert "Copy over support work weakset, dynamic classess, and remove __transient__ support from PR#110""
    
    This reverts commit cff6bfb83d04ee29d660a300b08f6a99dd636cf0.

commit 9a0f9b4b9958c5fed6d2c84d725cb03a7be7d41e
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T22:12:24Z

    Re-enable our custom exception message

commit 09cf41eb3e75e92cc9914e675fb2cb2f99290d38
Author: Holden Karau <ho...@us.ibm.com>
Date:   2017-07-25T23:49:57Z

    Save and restore the module info functions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80828/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r130774005
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -220,12 +322,7 @@ def save_function(self, obj, name=None):
     
             if name is None:
                 name = obj.__name__
    -        try:
    -            # whichmodule() could fail, see
    -            # https://bitbucket.org/gutworth/six/issues/63/importing-six-breaks-pickling
    -            modname = pickle.whichmodule(obj, name)
    -        except Exception:
    -            modname = None
    +        modname = pickle.whichmodule(obj, name)
    --- End diff --
    
    It's a good question, the underlying issue was marked resolved in 2014 and from looking at the commit it seems like it should actually be resolved. That being said its true some people might be on a system installed with an old verison of six so perhaps we should keep this workaround.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    huzzah! I'm in the middle of getting some code working for a talk tomorrow so I'll circle back on this on Friday. If @davies has any opinions though it would be great to hear them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #79947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79947/testReport)** for PR 18734 at commit [`09cf41e`](https://github.com/apache/spark/commit/09cf41eb3e75e92cc9914e675fb2cb2f99290d38).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80826/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Sounds good @HyukjinKwon :) I think we should consider if we want to upstream the named tuple workaround as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r130763298
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -220,12 +322,7 @@ def save_function(self, obj, name=None):
     
             if name is None:
                 name = obj.__name__
    -        try:
    -            # whichmodule() could fail, see
    -            # https://bitbucket.org/gutworth/six/issues/63/importing-six-breaks-pickling
    -            modname = pickle.whichmodule(obj, name)
    -        except Exception:
    -            modname = None
    +        modname = pickle.whichmodule(obj, name)
    --- End diff --
    
    @holdenk, would it be safe to remove? It looks a custom fix we did to deal with https://issues.apache.org/jira/browse/SPARK-16077 and sounds we should keep this as https://bitbucket.org/gutworth/six/issues/63/importing-six-breaks-pickling describes Python 2.7.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    I am merging this because:
    
    cloudpickle looks initially ported from https://github.com/cloudpipe/cloudpickle/commit/7aebb7ed42258a9392c2ada9b4bb390d566630cc and https://github.com/cloudpipe/cloudpickle/commit/c4f885116126b2ac49deae7a31d4941d006f319f (-> https://github.com/apache/spark/commit/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8), where I see both are identical.
    
    After https://github.com/apache/spark/commit/04e44b37cc04f62fbf9e08c7076349e0a4d12ea8, we have diff - https://github.com/apache/spark/commit/e044705b4402f86d0557ecd146f3565388c7eeb4, https://github.com/apache/spark/commit/55204181004c105c7a3e8c31a099b37e48bfd953, https://github.com/apache/spark/commit/ee913e6e2d58dfac20f3f06ff306081bd0e48066, https://github.com/apache/spark/commit/d48935400ca47275f677b527c636976af09332c8, https://github.com/apache/spark/commit/dbfc7aa4d0d5457bc92e1e66d065c6088d476843, https://github.com/apache/spark/commit/20e6280626fe243b170a2e7c5e018c67f3dac1db and https://github.com/apache/spark/commit/6297697f975960a3006c4e58b4964d9ac40eeaf5
    
    **[SPARK-9116] [SQL] [PYSPARK] support Python only UDT in __main__**, https://github.com/apache/spark/commit/e044705b4402f86d0557ecd146f3565388c7eeb4: I think this part is only what we are worried of. It looks supporting `classmethod`, `staticmethod` and `property`. We have a test:
    
    https://github.com/apache/spark/blob/96608310501a43fa4ab9f2697f202d655dba98c5/python/pyspark/sql/tests.py#L141-L173
    
    https://github.com/apache/spark/blob/96608310501a43fa4ab9f2697f202d655dba98c5/python/pyspark/sql/tests.py#L898-L927
    
    **[SPARK-10542] [PYSPARK] fix serialize namedtuple**, https://github.com/apache/spark/commit/55204181004c105c7a3e8c31a099b37e48bfd953: We keep the changes:
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L1090-L1095
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L433-L436
    
    and the related test pass:
    
    https://github.com/apache/spark/blob/77cc0d67d5a7ea526f8efd37b2590923953cb8e0/python/pyspark/tests.py#L211-L219
    
    **[SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads**, https://github.com/apache/spark/commit/ee913e6e2d58dfac20f3f06ff306081bd0e48066: We keep this change:
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L528
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L1022-L1029
    
    and the related test pass:
    
    https://github.com/apache/spark/blob/77cc0d67d5a7ea526f8efd37b2590923953cb8e0/python/pyspark/tests.py#L233-L237
    
    We should probably port this one to `cloudpipe/cloudpickle`.
    
    
    **[SPARK-16077] [PYSPARK] catch the exception from pickle.whichmodule()**, https://github.com/apache/spark/commit/d48935400ca47275f677b527c636976af09332c8: We keep this change:
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L325-L330
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L620-L625
    
    This patch even should be safer as I and @rgbkrk verified this with some tests:
    
    https://github.com/cloudpipe/cloudpickle/pull/112
    
    
    **[SPARK-17472] [PYSPARK] Better error message for serialization failures of large objects in Python**, https://github.com/apache/spark/commit/dbfc7aa4d0d5457bc92e1e66d065c6088d476843: We keep this change:
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L240-L249
    
    Probably, we should port this change into `cloudpipe/cloudpickle`.
    
    **[SPARK-19019] [PYTHON] Fix hijacked `collections.namedtuple` and port cloudpickle changes for PySpark to work with Python 3.6.0**, https://github.com/apache/spark/commit/20e6280626fe243b170a2e7c5e018c67f3dac1db
    
    This change was ported from `cloudpipe/cloudpickle`. I tested our PySpark tests pass with Python 3.6.0 in my local manually - https://github.com/apache/spark/pull/18734#issuecomment-319558550
    
    **[SPARK-19505][PYTHON] AttributeError on Exception.message in Python3**, https://github.com/apache/spark/commit/6297697f975960a3006c4e58b4964d9ac40eeaf5: We keep this change:
    
    https://github.com/holdenk/spark/blob/f986c2591a9a0b6962862c5cdfc33a7d65be7eda/python/pyspark/cloudpickle.py#L240-L249
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18734


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r133609212
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -241,11 +338,32 @@ def save_function(self, obj, name=None):
                 if getattr(themodule, name, None) is obj:
                     return self.save_global(obj, name)
     
    +        # a builtin_function_or_method which comes in as an attribute of some
    +        # object (e.g., object.__new__, itertools.chain.from_iterable) will end
    +        # up with modname "__main__" and so end up here. But these functions
    +        # have no __code__ attribute in CPython, so the handling for
    +        # user-defined functions below will fail.
    +        # So we pickle them here using save_reduce; have to do it differently
    +        # for different python versions.
    +        if not hasattr(obj, '__code__'):
    +            if PY3:
    +                if sys.version_info < (3, 4):
    +                    raise pickle.PicklingError("Can't pickle %r" % obj)
    +                else:
    +                    rv = obj.__reduce_ex__(self.proto)
    +            else:
    +                if hasattr(obj, '__self__'):
    +                    rv = (getattr, (obj.__self__, name))
    +                else:
    +                    raise pickle.PicklingError("Can't pickle %r" % obj)
    +            return Pickler.save_reduce(self, obj=obj, *rv)
    +
             # if func is lambda, def'ed at prompt, is in main, or is nested, then
             # we'll pickle the actual function object rather than simply saving a
             # reference (as is done in default pickler), via save_function_tuple.
    -        if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
    -            #print("save global", islambda(obj), obj.__code__.co_filename, modname, themodule)
    +        if (islambda(obj)
    --- End diff --
    
    Just as a side note, it looks this PR includes https://github.com/cloudpipe/cloudpickle/pull/51 too (SPARK-21753).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Let's go with this. I am quite sure it is worth for a go. We can almost sync it and can deduplucate review costs and etc.. Will go merging this one in few days with another double check if there is no objection here and it is not mergrd by anyone.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80828/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by rgbkrk <gi...@git.apache.org>.
Github user rgbkrk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Yeah this is looking great, it also brings in the latest from cloudpickle that wasn't in my original patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r132334455
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -397,42 +625,7 @@ def save_global(self, obj, name=None, pack=struct.pack):
     
             typ = type(obj)
             if typ is not obj and isinstance(obj, (type, types.ClassType)):
    -            d = dict(obj.__dict__)  # copy dict proxy to a dict
    -            if not isinstance(d.get('__dict__', None), property):
    -                # don't extract dict that are properties
    -                d.pop('__dict__', None)
    -            d.pop('__weakref__', None)
    -
    -            # hack as __new__ is stored differently in the __dict__
    -            new_override = d.get('__new__', None)
    -            if new_override:
    -                d['__new__'] = obj.__new__
    -
    -            # workaround for namedtuple (hijacked by PySpark)
    -            if getattr(obj, '_is_namedtuple_', False):
    -                self.save_reduce(_load_namedtuple, (obj.__name__, obj._fields))
    -                return
    -
    -            self.save(_load_class)
    -            self.save_reduce(typ, (obj.__name__, obj.__bases__, {"__doc__": obj.__doc__}), obj=obj)
    -            d.pop('__doc__', None)
    -            # handle property and staticmethod
    -            dd = {}
    -            for k, v in d.items():
    --- End diff --
    
    Gentle re-ping to @davies - do you have an opinion on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Oh, it was me cc'ing you mistakenly. Sorry for the noise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    BTW, I also checked it passes tests with Python 3.6 in my local.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by rgbkrk <gi...@git.apache.org>.
Github user rgbkrk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Workarounds welcomed on cloudpickle. πŸ˜„ 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    But for now yes I think we are in a good position. If no one objects of course :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80842/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r130773575
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -397,42 +625,7 @@ def save_global(self, obj, name=None, pack=struct.pack):
     
             typ = type(obj)
             if typ is not obj and isinstance(obj, (type, types.ClassType)):
    -            d = dict(obj.__dict__)  # copy dict proxy to a dict
    -            if not isinstance(d.get('__dict__', None), property):
    -                # don't extract dict that are properties
    -                d.pop('__dict__', None)
    -            d.pop('__weakref__', None)
    -
    -            # hack as __new__ is stored differently in the __dict__
    -            new_override = d.get('__new__', None)
    -            if new_override:
    -                d['__new__'] = obj.__new__
    -
    -            # workaround for namedtuple (hijacked by PySpark)
    -            if getattr(obj, '_is_namedtuple_', False):
    -                self.save_reduce(_load_namedtuple, (obj.__name__, obj._fields))
    -                return
    -
    -            self.save(_load_class)
    -            self.save_reduce(typ, (obj.__name__, obj.__bases__, {"__doc__": obj.__doc__}), obj=obj)
    -            d.pop('__doc__', None)
    -            # handle property and staticmethod
    -            dd = {}
    -            for k, v in d.items():
    --- End diff --
    
    Lets double check this part with @davies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80950/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80950/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80842/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by rgbkrk <gi...@git.apache.org>.
Github user rgbkrk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Just a note that we just shipped the fixes from @HyukjinKwon within cloudpickle (as v0.4.0), so we're at least roughly in sync now. πŸ˜„ 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    LGTM
    
    I am going to credit this to @rgbkrk per (http://spark.apache.org/contributing.html)
    
    > In case several people contributed, prefer to assign to the more β€˜junior’, non-committer contributor
    
    I just double checked if the tests passes with Python 3.6.0, and if I could run pi example with pypy manually (SPARK-21753), with the current status.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    I guess this should be fine in general if it simply ports the changes in cloudpickle. Will double check next week (to me, it takes a while to double check).
    
    @rgbkrk, would you mind double checking this one and see if it looks good to you?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80826/testReport)** for PR 18734 at commit [`7ad278a`](https://github.com/apache/spark/commit/7ad278a1f20fc7b66e7d273a98c4bdc73dd6833d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r130766049
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -397,42 +625,7 @@ def save_global(self, obj, name=None, pack=struct.pack):
     
             typ = type(obj)
             if typ is not obj and isinstance(obj, (type, types.ClassType)):
    -            d = dict(obj.__dict__)  # copy dict proxy to a dict
    -            if not isinstance(d.get('__dict__', None), property):
    -                # don't extract dict that are properties
    -                d.pop('__dict__', None)
    -            d.pop('__weakref__', None)
    -
    -            # hack as __new__ is stored differently in the __dict__
    -            new_override = d.get('__new__', None)
    -            if new_override:
    -                d['__new__'] = obj.__new__
    -
    -            # workaround for namedtuple (hijacked by PySpark)
    -            if getattr(obj, '_is_namedtuple_', False):
    -                self.save_reduce(_load_namedtuple, (obj.__name__, obj._fields))
    -                return
    -
    -            self.save(_load_class)
    -            self.save_reduce(typ, (obj.__name__, obj.__bases__, {"__doc__": obj.__doc__}), obj=obj)
    -            d.pop('__doc__', None)
    -            # handle property and staticmethod
    -            dd = {}
    -            for k, v in d.items():
    --- End diff --
    
    I assume we are fine per the tests added in https://github.com/apache/spark/commit/e044705b4402f86d0557ecd146f3565388c7eeb4.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #79947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79947/testReport)** for PR 18734 at commit [`09cf41e`](https://github.com/apache/spark/commit/09cf41eb3e75e92cc9914e675fb2cb2f99290d38).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    If we can reach agreement on this I'll see about trying to get our local workarounds upstreamed into cloudpickle.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [WIP][SPARK-21070][PYSPARK] Attempt to update cloudpickl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79947/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    cc @ushine who I believe is also appropriate to review this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80950/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80826/testReport)** for PR 18734 at commit [`7ad278a`](https://github.com/apache/spark/commit/7ad278a1f20fc7b66e7d273a98c4bdc73dd6833d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    cc @rgbkrk since this is based on your original PR. + @HyukjinKwon 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by rgbkrk <gi...@git.apache.org>.
Github user rgbkrk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    You are my hero, thank you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18734#discussion_r130765110
  
    --- Diff: python/pyspark/cloudpickle.py ---
    @@ -623,72 +816,82 @@ def save_file(self, obj):
                 return self.save_reduce(getattr, (sys,'stderr'), obj=obj)
             if obj is sys.stdin:
                 raise pickle.PicklingError("Cannot pickle standard input")
    -        if  hasattr(obj, 'isatty') and obj.isatty():
    +        if obj.closed:
    +            raise pickle.PicklingError("Cannot pickle closed files")
    +        if hasattr(obj, 'isatty') and obj.isatty():
                 raise pickle.PicklingError("Cannot pickle files that map to tty objects")
    -        if 'r' not in obj.mode:
    -            raise pickle.PicklingError("Cannot pickle files that are not opened for reading")
    +        if 'r' not in obj.mode and '+' not in obj.mode:
    +            raise pickle.PicklingError("Cannot pickle files that are not opened for reading: %s" % obj.mode)
    +
             name = obj.name
    -        try:
    -            fsize = os.stat(name).st_size
    -        except OSError:
    -            raise pickle.PicklingError("Cannot pickle file %s as it cannot be stat" % name)
     
    -        if obj.closed:
    -            #create an empty closed string io
    -            retval = pystringIO.StringIO("")
    -            retval.close()
    -        elif not fsize: #empty file
    -            retval = pystringIO.StringIO("")
    -            try:
    -                tmpfile = file(name)
    -                tst = tmpfile.read(1)
    -            except IOError:
    -                raise pickle.PicklingError("Cannot pickle file %s as it cannot be read" % name)
    -            tmpfile.close()
    -            if tst != '':
    -                raise pickle.PicklingError("Cannot pickle file %s as it does not appear to map to a physical, real file" % name)
    -        else:
    -            try:
    -                tmpfile = file(name)
    -                contents = tmpfile.read()
    -                tmpfile.close()
    -            except IOError:
    -                raise pickle.PicklingError("Cannot pickle file %s as it cannot be read" % name)
    -            retval = pystringIO.StringIO(contents)
    +        retval = pystringIO.StringIO()
    +
    +        try:
    +            # Read the whole file
                 curloc = obj.tell()
    -            retval.seek(curloc)
    +            obj.seek(0)
    +            contents = obj.read()
    +            obj.seek(curloc)
    +        except IOError:
    +            raise pickle.PicklingError("Cannot pickle file %s as it cannot be read" % name)
    +        retval.write(contents)
    +        retval.seek(curloc)
     
             retval.name = name
             self.save(retval)
             self.memoize(obj)
     
    +    def save_ellipsis(self, obj):
    +        self.save_reduce(_gen_ellipsis, ())
    +
    +    def save_not_implemented(self, obj):
    +        self.save_reduce(_gen_not_implemented, ())
    +
         if PY3:
             dispatch[io.TextIOWrapper] = save_file
         else:
             dispatch[file] = save_file
     
    -    """Special functions for Add-on libraries"""
    +    dispatch[type(Ellipsis)] = save_ellipsis
    +    dispatch[type(NotImplemented)] = save_not_implemented
     
    -    def inject_numpy(self):
    --- End diff --
    
    I assume removing this is fine per https://github.com/cloudpipe/cloudpickle/commit/1e91fa7c0f9b1e77604d83b3ba9aecde8603ece1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Yea, it looks so. Named tuple one reminds me of the workaround we have for named tuple to make picklable - https://github.com/apache/spark/blob/d03aebbe6508ba441dc87f9546f27aeb27553d77/python/pyspark/serializers.py#L395-L446
    
    Maybe, we could take a look and see if we could get rid of this or port this. Anyway, let me take a final look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by ushine <gi...@git.apache.org>.
Github user ushine commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Can you guys please exclude me from the loop?
    
    -
    Best,
    Daniel Biesiada
    
    > On 22 Aug 2017, at 04:22, asfgit <no...@github.com> wrote:
    > 
    > Closed #18734 <https://github.com/apache/spark/pull/18734> via 751f513 <https://github.com/apache/spark/commit/751f513367ae776c6d6815e1ce138078924872eb>.
    > 
    > β€”
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub <https://github.com/apache/spark/pull/18734#event-1215117326>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC25HQ7SYMpvvHn2Meyysr5Id26oxyzgks5sajtMgaJpZM4OjRjO>.
    > 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80828/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    Thank you for cc-ing me.
    I looked over this and it looks good to me except for a comment https://github.com/apache/spark/pull/18734#discussion_r130763298.
    I was not exactly sure if we could remove these.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    **[Test build #80842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80842/testReport)** for PR 18734 at commit [`f986c25`](https://github.com/apache/spark/commit/f986c2591a9a0b6962862c5cdfc33a7d65be7eda).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18734: [SPARK-21070][PYSPARK] Attempt to update cloudpickle aga...

Posted by rgbkrk <gi...@git.apache.org>.
Github user rgbkrk commented on the issue:

    https://github.com/apache/spark/pull/18734
  
    > I looked over this and it looks good to me except for a comment #18734 (comment). I was not exactly sure if we could remove these.
    
    cloudpickle now has this workaround, so it can stay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org