You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/11/09 12:02:40 UTC

[PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

grundprinzip opened a new pull request, #43732:
URL: https://github.com/apache/spark/pull/43732

   ### What changes were proposed in this pull request?
   The Python client for Spark connect logs the text representation of the proto message. However, for deeply nested objects this can lead to a Python recursion error even before the maximum nested recursion limit of the GRPC message is reached.
   
   This patch fixes this issue by explicitly catching the recursion error during text conversion.
   
   
   ### Why are the changes needed?
   Stability
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   UT
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398701832


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
 
 
 class SparkConnectBasicTests(SparkConnectSQLTestCase):
+    def test_recursion_handling_for_plan_logging(self):
+        """SPARK-45852 - Test that we can handle recursion in plan logging."""
+        cdf = self.connect.range(1)
+        for x in range(400):
+            cdf = cdf.withColumn(f"col_{x}", CF.lit(x))
+
+        # Calling schema will trigger logging the message that will in turn trigger the message
+        # conversion into protobuf that will then trigger the recursion error.
+        self.assertIsNotNone(cdf.schema)

Review Comment:
   Thanks!!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43732:
URL: https://github.com/apache/spark/pull/43732#issuecomment-1817679687

   I made a PR.
   - https://github.com/apache/spark/pull/43885


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43732:
URL: https://github.com/apache/spark/pull/43732#issuecomment-1805276070

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398284548


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
 
 
 class SparkConnectBasicTests(SparkConnectSQLTestCase):
+    def test_recursion_handling_for_plan_logging(self):

Review Comment:
   Hi, @grundprinzip and @HyukjinKwon .
   
   This test case seems to fail with Python 3.11. SPARK-45987 is filed for Python 3.11 failure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43732: [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging
URL: https://github.com/apache/spark/pull/43732


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45852][CONNECT][PYTHON] Gracefully deal with recursion error during logging [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43732:
URL: https://github.com/apache/spark/pull/43732#discussion_r1398284673


##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -159,6 +159,19 @@ def spark_connect_clean_up_test_data(cls):
 
 
 class SparkConnectBasicTests(SparkConnectSQLTestCase):
+    def test_recursion_handling_for_plan_logging(self):
+        """SPARK-45852 - Test that we can handle recursion in plan logging."""
+        cdf = self.connect.range(1)
+        for x in range(400):
+            cdf = cdf.withColumn(f"col_{x}", CF.lit(x))
+
+        # Calling schema will trigger logging the message that will in turn trigger the message
+        # conversion into protobuf that will then trigger the recursion error.
+        self.assertIsNotNone(cdf.schema)

Review Comment:
   The failure happens here.
   ```
   ERROR [3.874s]: test_recursion_handling_for_plan_logging (pyspark.sql.tests.connect.test_connect_basic.SparkConnectBasicTests.test_recursion_handling_for_plan_logging)
   SPARK-45852 - Test that we can handle recursion in plan logging.
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/__w/spark/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 171, in test_recursion_handling_for_plan_logging
       self.assertIsNotNone(cdf.schema)
                            ^^^^^^^^^^
     File "/__w/spark/spark/python/pyspark/sql/connect/dataframe.py", line 1735, in schema
       return self._session.client.schema(query)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 924, in schema
       schema = self._analyze(method="schema", plan=plan).schema
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1110, in _analyze
       self._handle_error(error)
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1499, in _handle_error
       self._handle_rpc_error(error)
     File "/__w/spark/spark/python/pyspark/sql/connect/client/core.py", line 1570, in _handle_rpc_error
       raise SparkConnectGrpcException(str(rpc_error)) from None
   pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_InactiveRpcError of RPC that terminated with:
   	status = StatusCode.INTERNAL
   	details = "Exception serializing request!"
   	debug_error_string = "None"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org