Posted to commits@spark.apache.org by gu...@apache.org on 2022/11/18 08:59:59 UTC

[spark] branch master updated: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new bd2beae5843 [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack
bd2beae5843 is described below

commit bd2beae58430f4537057c2f8b094e0c9f4ad67af
Author: Hyukjin Kwon <gu...@apache.org>
AuthorDate: Fri Nov 18 17:59:40 2022 +0900

    [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack
    
    ### What changes were proposed in this pull request?
    
    This PR is a follow-up to https://github.com/apache/spark/pull/34688. It adds an environment variable switch, `PYSPARK_ENABLE_NAMEDTUPLE_PATCH`, for turning the namedtuple hack on and off.
    
    ### Why are the changes needed?
    
    Regular pickle and cloudpickle still behave differently in places, e.g., because of upstream bug fixes. For the time being, it is safer to have a switch that can turn the hack on and off.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Ideally no, because the switch remains an internal environment variable. The main change itself was also an internal change.
    
    ### How was this patch tested?
    
    Manually tested.
    
    Closes #38700 from HyukjinKwon/SPARK-41189.
    
    Lead-authored-by: Hyukjin Kwon <gu...@apache.org>
    Co-authored-by: Hyukjin Kwon <gu...@gmail.com>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/serializers.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/serializers.py b/python/pyspark/serializers.py
index 8c5a941f376..ac587c07ecf 100644
--- a/python/pyspark/serializers.py
+++ b/python/pyspark/serializers.py
@@ -54,6 +54,7 @@ which contains two batches of two objects:
 """
 
 import sys
+import os
 from itertools import chain, product
 import marshal
 import struct
@@ -357,14 +358,14 @@ class NoOpSerializer(FramedSerializer):
         return obj
 
 
-if sys.version_info < (3, 8):
+if sys.version_info < (3, 8) or os.environ.get("PYSPARK_ENABLE_NAMEDTUPLE_PATCH") == "1":
     # Hack namedtuple, make it picklable.
     # For Python 3.8+, we use CPickle-based cloudpickle.
     # For Python 3.7 and below, we use legacy build-in CPickle which
     # requires namedtuple hack.
     # The whole hack here should be removed once we drop Python 3.7.
 
-    __cls = {}
+    __cls = {}  # type: ignore[var-annotated]
 
     def _restore(name, fields, value):
         """Restore an object of namedtuple"""
@@ -471,10 +472,10 @@ class CloudPickleSerializer(FramedSerializer):
         return cloudpickle.loads(obj, encoding=encoding)
 
 
-if sys.version_info < (3, 8):
+if sys.version_info < (3, 8) or os.environ.get("PYSPARK_ENABLE_NAMEDTUPLE_PATCH") == "1":
     CPickleSerializer = PickleSerializer
 else:
-    CPickleSerializer = CloudPickleSerializer
+    CPickleSerializer = CloudPickleSerializer  # type: ignore[misc, assignment]
 
 
 class MarshalSerializer(FramedSerializer):
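
As a rough illustration of the gate in the diff above, the sketch below restates the module-level condition as a standalone function. The helper name `namedtuple_patch_enabled` and its parameters are hypothetical and not part of PySpark; only the environment variable name and the comparison against "1" come from the diff. Since the real check sits at module level in serializers.py, it is presumably evaluated once when that module is imported, so the variable would need to be set before the import.

```python
import os
import sys


def namedtuple_patch_enabled(version_info=None, environ=None):
    """Hypothetical helper mirroring the condition added in the diff.

    Not part of PySpark; the parameters exist only so the logic can be
    exercised without touching the real interpreter or environment.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    # Same expression as in serializers.py: always on below Python 3.8,
    # otherwise on only when the variable is exactly the string "1".
    return (
        version_info < (3, 8)
        or environ.get("PYSPARK_ENABLE_NAMEDTUPLE_PATCH") == "1"
    )


# Python 3.7: the hack applies regardless of the environment.
print(namedtuple_patch_enabled((3, 7, 0), {}))
# Python 3.9 without the variable: cloudpickle path, no hack.
print(namedtuple_patch_enabled((3, 9, 0), {}))
# Python 3.9 with the variable set to "1": the hack is switched back on.
print(namedtuple_patch_enabled((3, 9, 0), {"PYSPARK_ENABLE_NAMEDTUPLE_PATCH": "1"}))
```

Note that the comparison is against the exact string "1", so values such as "true" or "yes" would leave the hack off on Python 3.8 and later.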

