You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/18 03:33:51 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

HyukjinKwon opened a new pull request, #38700:
URL: https://github.com/apache/spark/pull/38700

   ### What changes were proposed in this pull request?
   
   This PR is a followup of https://github.com/apache/spark/pull/34688 that adds a switch to turn on and off the namedtuple hack.
   
   ### Why are the changes needed?
   
   There are still behaviour differences between regular pickle and Cloudpickle e.g., bug fixes from the upstream. It's safer to have a switch to turn on and off for the time being.
   
   ### Does this PR introduce _any_ user-facing change?
   
   This remains as an internal environment so ideally no. In fact the main change itself was the internal change too.
   
   ### How was this patch tested?
   
   Manually tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack
URL: https://github.com/apache/spark/pull/38700


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sadikovi commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

Posted by GitBox <gi...@apache.org>.
sadikovi commented on code in PR #38700:
URL: https://github.com/apache/spark/pull/38700#discussion_r1025989704


##########
python/pyspark/serializers.py:
##########
@@ -357,7 +358,7 @@ def dumps(self, obj):
         return obj
 
 
-if sys.version_info < (3, 8):
+if sys.version_info < (3, 8) or os.environ.get("PYSPARK_MONKEYPATCH_NAMEDTUPLE") == "1":

Review Comment:
   nit: I was thinking about a name like this one: `PYSPARK_ENABLE_NAMEDTUPLE_PATCH` but I am open to alternatives.



##########
python/pyspark/serializers.py:
##########
@@ -54,6 +54,7 @@
 """
 
 import sys
+import os

Review Comment:
   nit: should os come before sys?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38700:
URL: https://github.com/apache/spark/pull/38700#discussion_r1026007386


##########
python/pyspark/serializers.py:
##########
@@ -54,6 +54,7 @@
 """
 
 import sys
+import os

Review Comment:
   eh, it's actually fine in Python import (per PEP 8)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #38700:
URL: https://github.com/apache/spark/pull/38700#discussion_r1026007445


##########
python/pyspark/serializers.py:
##########
@@ -357,7 +358,7 @@ def dumps(self, obj):
         return obj
 
 
-if sys.version_info < (3, 8):
+if sys.version_info < (3, 8) or os.environ.get("PYSPARK_MONKEYPATCH_NAMEDTUPLE") == "1":

Review Comment:
   ```suggestion
   if sys.version_info < (3, 8) or os.environ.get("PYSPARK_ENABLE_NAMEDTUPLE_PATCH") == "1":
   ```



##########
python/pyspark/serializers.py:
##########
@@ -471,7 +472,7 @@ def loads(self, obj, encoding="bytes"):
         return cloudpickle.loads(obj, encoding=encoding)
 
 
-if sys.version_info < (3, 8):
+if sys.version_info < (3, 8) or os.environ.get("PYSPARK_MONKEYPATCH_NAMEDTUPLE") == "1":

Review Comment:
   ```suggestion
   if sys.version_info < (3, 8) or os.environ.get("PYSPARK_ENABLE_NAMEDTUPLE_PATCH") == "1":
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #38700: [SPARK-41189][PYTHON] Add an environment to switch on and off namedtuple hack

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #38700:
URL: https://github.com/apache/spark/pull/38700#issuecomment-1319719556

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org