Posted to reviews@spark.apache.org by "cdkrot (via GitHub)" <gi...@apache.org> on 2023/09/15 15:12:57 UTC

[GitHub] [spark] cdkrot opened a new pull request, #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

cdkrot opened a new pull request, #42949:
URL: https://github.com/apache/spark/pull/42949

   ### What changes were proposed in this pull request?
   
   Add error logging to `addArtifact` (see the example in "How was this patch tested?"). The logging code is moved into a separate file to avoid a circular dependency.
   
   ### Why are the changes needed?
   
   Currently, when `addArtifact` is executed with a file which doesn't exist, the user gets a cryptic error:
   
   ```
   grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
           status = StatusCode.UNKNOWN
           details = "Exception iterating requests!"
           debug_error_string = "None"
   >
   ```
   
   This is impossible to debug without digging deep into the subject.
   
   This happens because `addArtifact` is implemented as a client-side streaming RPC, and the actual error is raised while grpc consumes the iterator that generates the requests. Unfortunately, grpc doesn't print any debug information for the user to understand the problem.
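   
   A minimal sketch of the pattern used here (illustrative names only, not the exact patch): wrap the request generator so that any exception raised while grpc consumes it is logged before being re-raised.
   
   ```
   import logging
   from typing import Iterator
   
   logger = logging.getLogger("pyspark.sql.connect.client")
   
   def _requests_with_logging(requests: Iterator) -> Iterator:
       """Re-yield requests, logging any exception that grpc would otherwise swallow."""
       try:
           yield from requests
       except Exception as e:
           # Without this log line, grpc only reports "Exception iterating requests!".
           logger.error(f"Failed to execute addArtifact: {e}")
           raise
   ```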
   
   ### Does this PR introduce _any_ user-facing change?
   
   Additional logging, which is opt-in in the same way as before via the `SPARK_CONNECT_LOG_LEVEL` environment variable.
   
   ### How was this patch tested?
   
   ```
   >>> s.addArtifact("XYZ", file=True)
   2023-09-15 17:06:40,078 11789 ERROR _create_requests Failed to execute addArtifact: [Errno 2] No such file or directory: '/Users/alice.sayutina/apache_spark/python/XYZ'
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/alice.sayutina/apache_spark/python/pyspark/sql/connect/session.py", line 743, in addArtifacts
       self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, file=file)
   
   [....]
   
     File "/Users/alice.sayutina/oss-venv/lib/python3.11/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
       raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
           status = StatusCode.UNKNOWN
           details = "Exception iterating requests!"
           debug_error_string = "None"
   >
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1329756902


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   The problem is not that we don't raise the error below, but that it will be intercepted by `grpc`, which doesn't log or display what was thrown (I would argue that it definitely should, but it doesn't).





[GitHub] [spark] HyukjinKwon closed pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query
URL: https://github.com/apache/spark/pull/42949




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1328174821


##########
python/pyspark/sql/connect/client/logging.py:
##########
@@ -0,0 +1,43 @@
+import logging
+import os
+from typing import Optional
+
+__all__ = [
+    "logger",

Review Comment:
   ```suggestion
   ```
   
   I suspect `logger` isn't an API?





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331929198


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   > I just think we shouldn't leave users with this error message when they don't enable the Spark Connect logger:
   
   We can wrap the outer exception in some SparkException saying that AddArtifacts failed. WDYT?
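   
   A rough sketch of what such wrapping could look like (names and the exception type are illustrative, not an existing PySpark API):
   
   ```
   import grpc
   
   def call_add_artifacts(stub, requests):
       try:
           return stub.AddArtifacts(requests)
       except grpc.RpcError as e:
           # Re-raise with a message that points at addArtifacts instead of the
           # generic "Exception iterating requests!" detail string.
           raise RuntimeError(f"addArtifacts failed while streaming requests: {e}") from e
   ```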







[GitHub] [spark] HyukjinKwon commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1728565241

   Mind running `dev/reformat-python` one last time? Seems like there's a diff between the old and new black versions.




[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1729309215

   Done, thanks!






[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1329755276


##########
python/pyspark/sql/connect/client/logging.py:
##########
@@ -0,0 +1,42 @@
+import logging

Review Comment:
   Thanks :). Forgot about that. Done.





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331900167


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   > Could you be more specific on what Exception can be thrown here? From your PR description, it looks like a gRPC error.
   
   It doesn't matter; any exception thrown here would be suppressed by grpc, and only the generic "Exception iterating requests" exception will be visible.





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1328421040


##########
python/pyspark/sql/connect/client/logging.py:
##########
@@ -0,0 +1,43 @@
+import logging
+import os
+from typing import Optional
+
+__all__ = [
+    "logger",

Review Comment:
   `logger` isn't an API, yeah. I didn't import it in `__init__`, but it's probably best to drop it here too.
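   
   A tiny illustration of the semantics (hypothetical module, not the real `logging.py`): a name left out of `__all__` stays importable, it just isn't exported by `from ... import *` and reads as internal.
   
   ```
   # example.py (hypothetical)
   import logging
   
   __all__ = ["getLogLevel"]  # public surface only
   
   logger = logging.getLogger(__name__)  # internal; still reachable as example.logger
   
   def getLogLevel() -> int:
       """Return the effective level of the module-internal logger."""
       return logger.getEffectiveLevel()
   ```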





[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1330447085


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   In core.py, there is a method to handle errors from the server: `def _handle_error(error)`. It's able to handle various gRPC errors and also custom errors. Can we have something similar for addArtifacts?





[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331894962


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   Could you be more specific on what `Exception` can be thrown here? From your PR description, it looks like a gRPC error.







[GitHub] [spark] HyukjinKwon commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1730753709

   Merged to master.




[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1329547039


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   I think we should show this error message regardless of the `SPARK_CONNECT_LOG_LEVEL` setting to make it more debuggable. Can we raise a more user-friendly error below?





[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1721446450

   cc @HyukjinKwon, @nija-at 




[GitHub] [spark] HyukjinKwon commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1726777160

   Ah, the master branch and your branch should be synced to the latest master branch. Black was recently upgraded.




[GitHub] [spark] nija-at commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "nija-at (via GitHub)" <gi...@apache.org>.
nija-at commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1328319209


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifact request: {e}")

Review Comment:
   ```suggestion
               logger.error(f"Failed to submit addArtifacts request: {e}")
   ```





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1328174896


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -271,6 +276,7 @@ def add_artifacts(self, *path: str, pyfile: bool, archive: bool, file: bool) ->
         requests: Iterator[proto.AddArtifactsRequest] = self._create_requests(
             *path, pyfile=pyfile, archive=archive, file=file
         )
+

Review Comment:
   ```suggestion
   ```
   
   Since we're here, I would remove the unrelated change.





[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1727917945

   Updated fork's master




[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331897585


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   What's the behavior after your change when the SPARK_CONNECT logger is enabled?





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331901403


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   > What's the behavior after your change when the SPARK_CONNECT logger is enabled?
   
   Please see the description of the PR.
   





[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331899412


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   I just think we shouldn't leave users with this error message when they don't enable the Spark Connect logger:
   ```
           status = StatusCode.UNKNOWN
           details = "Exception iterating requests!"
           debug_error_string = "None"
   ```





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331901152


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   In my example, I'm intercepting a FileNotFoundError.





[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1331785918


##########
python/pyspark/sql/connect/client/artifact.py:
##########
@@ -243,11 +244,15 @@ def _create_requests(
         self, *path: str, pyfile: bool, archive: bool, file: bool
     ) -> Iterator[proto.AddArtifactsRequest]:
         """Separated for the testing purpose."""
-        return self._add_artifacts(
-            chain(
-                *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+        try:
+            yield from self._add_artifacts(
+                chain(
+                    *(self._parse_artifacts(p, pyfile=pyfile, archive=archive, file=file) for p in path)
+                )
             )
-        )
+        except Exception as e:
+            logger.error(f"Failed to submit addArtifacts request: {e}")

Review Comment:
   I don't think we can do something like that here. We don't have a server-side error response to process; we have a purely client-side error where an exception is thrown while grpc consumes the iterator of requests. It's not possible to extract any information outside of the grpc call, since it suppresses the error entirely.
   
   I think the only option for raising proper exception classes is to preload all the artifacts into memory and construct the corresponding requests before streaming them into grpc.
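   
   A self-contained sketch of that trade-off (hypothetical code, not part of this PR): materializing the requests eagerly surfaces errors like FileNotFoundError as a normal Python traceback before any RPC starts, at the cost of holding every artifact in memory at once.
   
   ```
   from typing import Iterator, List
   
   def make_requests(paths: List[str]) -> Iterator[bytes]:
       for p in paths:
           with open(p, "rb") as f:  # raises FileNotFoundError for a missing path
               yield f.read()
   
   paths = ["XYZ"]  # a path that does not exist
   
   try:
       requests = list(make_requests(paths))  # eager: error raised here, before grpc
   except FileNotFoundError as e:
       print(f"addArtifacts would fail before the RPC starts: {e}")
   ```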





[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "cdkrot (via GitHub)" <gi...@apache.org>.
cdkrot commented on PR #42949:
URL: https://github.com/apache/spark/pull/42949#issuecomment-1725417651

   From the PR script:
   
   ```
   Run PYTHON_EXECUTABLE=python3.9 ./dev/lint-python
   starting python compilation test...
   python compilation succeeded.
   
   starting black test...
   black checks failed:
   Oh no! 💥 💔 💥 The required version `23.9.1` does not match the running version `22.6.0`!
   Please run 'dev/reformat-python' script.
   ```
   
   Should I do something about it or is it a known problem?




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #42949:
URL: https://github.com/apache/spark/pull/42949#discussion_r1329390408


##########
python/pyspark/sql/connect/client/logging.py:
##########
@@ -0,0 +1,42 @@
+import logging

Review Comment:
   Seems like the linter complains that there's no license header :-).




