Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/11/09 12:02:40 UTC
[PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
grundprinzip opened a new pull request, #43732:
URL: https://github.com/apache/spark/pull/43732
### What changes were proposed in this pull request?
The Python client for Spark Connect logs the text representation of the proto message. However, for deeply nested plans this text conversion can raise a Python RecursionError even before the maximum nesting depth of the gRPC message is reached.
This patch fixes the issue by explicitly catching the RecursionError during text conversion.
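The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `safe_text` and the `Node` class are hypothetical stand-ins (the real code converts a protobuf message, here simulated with a plain Python object whose `repr` recurses).

```python
def safe_text(obj) -> str:
    # Hypothetical helper mirroring the patch's approach: attempt the
    # text conversion and fall back gracefully on RecursionError.
    try:
        return str(obj)
    except RecursionError:
        return "<RecursionError: object too deeply nested to stringify>"


# A deeply nested object whose repr recurses, standing in for a
# deeply nested Spark Connect plan message.
class Node:
    def __init__(self, child=None):
        self.child = child

    def __repr__(self):
        return f"Node({self.child!r})"


deep = None
for _ in range(10_000):  # well past the default recursion limit of 1000
    deep = Node(deep)

print(safe_text(deep))  # logs a placeholder instead of crashing
```

Without the `try`/`except`, the `str(deep)` call would propagate the RecursionError into the logging path and abort the client call.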
### Why are the changes needed?
Stability
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
UT
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398701832
##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
class SparkConnectBasicTests(SparkConnectSQLTestCase):
+ def test_recursion_handling_for_plan_logging(self):
+ """SPARK-45852 - Test that we can handle recursion in plan logging."""
+ cdf = self.connect.range(1)
+ for x in range(400):
+ cdf = cdf.withColumn(f"col_{x}", CF.lit(x))
+
+ # Accessing the schema triggers logging of the plan, which in turn triggers the
+ # conversion of the message to its protobuf text form and thus the recursion error.
+ self.assertIsNotNone(cdf.schema)
Review Comment:
Thanks!!
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43732:
URL: https://github.com/apache/spark/pull/43732#issuecomment-1817679687
I made a PR.
- https://github.com/apache/spark/pull/43885
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43732:
URL: https://github.com/apache/spark/pull/43732#issuecomment-1805276070
Merged to master.
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398284548
##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
class SparkConnectBasicTests(SparkConnectSQLTestCase):
+ def test_recursion_handling_for_plan_logging(self):
Review Comment:
Hi, @grundprinzip and @HyukjinKwon .
This test case seems to fail with Python 3.11. SPARK-45987 has been filed for the Python 3.11 failure.
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43732: [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging
URL: https://github.com/apache/spark/pull/43732
Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398284673
##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
class SparkConnectBasicTests(SparkConnectSQLTestCase):
+ def test_recursion_handling_for_plan_logging(self):
+ """SPARK-45852 - Test that we can handle recursion in plan logging."""
+ cdf = self.connect.range(1)
+ for x in range(400):
+ cdf = cdf.withColumn(f"col_{x}", CF.lit(x))
+
+ # Accessing the schema triggers logging of the plan, which in turn triggers the
+ # conversion of the message to its protobuf text form and thus the recursion error.
+ self.assertIsNotNone(cdf.schema)
Review Comment:
The failure happens here.
```
ERROR [3.874s]: test_recursion_handling_for_plan_logging (pyspark.sql.tests.connect.test_connect_basic.SparkConnectBasicTests.test_recursion_handling_for_plan_logging)
SPARK-45852 - Test that we can handle recursion in plan logging.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 171, in test_recursion_handling_for_plan_logging
self.assertIsNotNone(cdf.schema)
^^^^^^^^^^
File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1735, in schema
return self._session.client.schema(query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 924, in schema
schema = self._analyze(method="schema", plan=plan).schema
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1110, in _analyze
self._handle_error(error)
File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1499, in _handle_error
self._handle_rpc_error(error)
File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1570, in _handle_rpc_error
raise SparkConnectGrpcException(str(rpc_error)) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Exception serializing request!"
debug_error_string = "None"
```