You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2023/03/30 08:00:53 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

grundprinzip opened a new pull request, #40603:
URL: https://github.com/apache/spark/pull/40603

   ### What changes were proposed in this pull request?
   Instead of just showing the Scala callsite show the abbreviate version of the proto message in the Spark UI.
   
   ### Why are the changes needed?
   Debugging
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Manual
   
   Before:
   
   ![image](https://user-images.githubusercontent.com/3421/228769858-51fbc5de-c84c-4f5e-86de-bd77c45d2847.png)
   
   After:
   
   ![image](https://user-images.githubusercontent.com/3421/228769955-e9db4737-6d29-4d51-9941-0cb650b73870.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #40603:
URL: https://github.com/apache/spark/pull/40603#issuecomment-1498771199

   Another question: will the raw data in `LocalRelation` also shown in Spark UI?   It might be sensitive


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #40603:
URL: https://github.com/apache/spark/pull/40603#discussion_r1159502572


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##########
@@ -49,6 +50,16 @@ class SparkConnectStreamHandler(responseObserver: StreamObserver[ExecutePlanResp
         .getOrCreateIsolatedSession(v.getUserContext.getUserId, v.getSessionId)
         .session
     session.withActive {
+
+      // Add debug information to the query execution so that the jobs are traceable.
+      val debugString = v.toString

Review Comment:
   I think we need a filter for objects that we don't print. My guess is that you're submitting a very large Python UDF that requires lots of heap space in the string. Do you have more details there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40603:
URL: https://github.com/apache/spark/pull/40603#issuecomment-1502562037

   Let me make a PR to redact it for now at least.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #40603:
URL: https://github.com/apache/spark/pull/40603#discussion_r1159479080


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##########
@@ -49,6 +50,16 @@ class SparkConnectStreamHandler(responseObserver: StreamObserver[ExecutePlanResp
         .getOrCreateIsolatedSession(v.getUserContext.getUserId, v.getSessionId)
         .session
     session.withActive {
+
+      // Add debug information to the query execution so that the jobs are traceable.
+      val debugString = v.toString

Review Comment:
   `v.toString` is problematic in some cases, see https://github.com/apache/spark/pull/40607/files#r1159434752
   
   I think we have to manually implement it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhengruifeng commented on a diff in pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #40603:
URL: https://github.com/apache/spark/pull/40603#discussion_r1159525799


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##########
@@ -49,6 +50,16 @@ class SparkConnectStreamHandler(responseObserver: StreamObserver[ExecutePlanResp
         .getOrCreateIsolatedSession(v.getUserContext.getUserId, v.getSessionId)
         .session
     session.withActive {
+
+      // Add debug information to the query execution so that the jobs are traceable.
+      val debugString = v.toString

Review Comment:
   > My guess is that you're submitting a very large Python UDF that requires lots of heap space in the string
   
   [this one](https://github.com/apache/spark/blob/e81c2c895a66f00888198f615a1f11b410bb7475/python/pyspark/ml/torch/tests/test_distributor.py#L57-L122) , it will be converted to a Pandas UDF, should be very large
   
   I think it is also related to java version, since it throw OOM in Java8 but works well in Java11.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #40603:
URL: https://github.com/apache/spark/pull/40603#issuecomment-1492973410

   Ping on this @HyukjinKwon or @hvanhovell?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] hvanhovell closed pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.
URL: https://github.com/apache/spark/pull/40603


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #40603:
URL: https://github.com/apache/spark/pull/40603#issuecomment-1502561858

   Actually it would also have a security concern as it exposes the local data as is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on PR #40603:
URL: https://github.com/apache/spark/pull/40603#issuecomment-1498775987

   > Another question: will the raw data in `LocalRelation` also shown in Spark UI? It might be sensitive
   
   This is similar to what we do for SQL queries we put the literal string into the description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org