You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "yabola (via GitHub)" <gi...@apache.org> on 2024/01/14 05:58:32 UTC

[PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

yabola opened a new pull request, #44721:
URL: https://github.com/apache/spark/pull/44721

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1452986785


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_END_ENABLED =
+      ConfigBuilder("spark.broadcast.cleanAfterExecutionEnd.enabled")
+        .doc("Whether to enable clean broadcast data after sql execution. If enabled, " +

Review Comment:
   @HyukjinKwon Thank you for your advice. But this parameter needs to be used in `BroadcastManager#initialize` (core module), but the core module does not import the SQL module. The core module use the parameters of the SQL module. I don’t know if they are appropriate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1452986785


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_END_ENABLED =
+      ConfigBuilder("spark.broadcast.cleanAfterExecutionEnd.enabled")
+        .doc("Whether to enable clean broadcast data after sql execution. If enabled, " +

Review Comment:
   @HyukjinKwon Thank you for your advice. But this parameter needs to be used in `BroadcastManager#initialize` (CORE module), CORE module does not import the SQL module.  I don’t know if they are appropriate(core module use the parameters of the SQL module).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1451680779


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_ENABLED =
+      ConfigBuilder("spark.broadcast.clean_after_execution.enabled")

Review Comment:
   Thank you, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on PR #44721:
URL: https://github.com/apache/spark/pull/44721#issuecomment-1895280484

   @HyukjinKwon @MaxGekk I have added new UT.  And the failed UT is about `run document build` (I don't know why...)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1452986785


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_END_ENABLED =
+      ConfigBuilder("spark.broadcast.cleanAfterExecutionEnd.enabled")
+        .doc("Whether to enable clean broadcast data after sql execution. If enabled, " +

Review Comment:
   @HyukjinKwon Thank you for your advice. But this parameter needs to be used in `BroadcastManager#initialize` (core module), but the core module does not import the SQL module.  I don’t know if they are appropriate(core module use the parameters of the SQL module).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1451822330


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_END_ENABLED =
+      ConfigBuilder("spark.broadcast.cleanAfterExecutionEnd.enabled")
+        .doc("Whether to enable clean broadcast data after sql execution. If enabled, " +

Review Comment:
   nit:
   
   ```suggestion
           .doc("Whether to enable clean broadcast data after SQL execution. If enabled, " +
   ```
   
   BTW, SQL configurations should go to `SQLConf`. If this is a static conf, it should go to `StaticSQLConf`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44721:
URL: https://github.com/apache/spark/pull/44721#discussion_r1451679661


##########
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##########
@@ -2025,6 +2025,15 @@ package object config {
       .checkValue(v => v >= 0, "The threshold should be non-negative.")
       .createWithDefault(1L * 1024 * 1024)
 
+    private[spark] val CLEAN_BROADCAST_AFTER_EXECUTION_ENABLED =
+      ConfigBuilder("spark.broadcast.clean_after_execution.enabled")

Review Comment:
   Please, use the Camel case. Look at the other configs in the file. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola commented on PR #44721:
URL: https://github.com/apache/spark/pull/44721#issuecomment-1913878193

   I tested a scenario (a large number of broadcast joins, mainly iceberg expire snapshot & remove_orphan_files SQL), and found its effect to be very noticeable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends [spark]

Posted by "yabola (via GitHub)" <gi...@apache.org>.
yabola closed pull request #44721: [SPARK-46710][SQL] Clean up the broadcast data generated when sql execution ends
URL: https://github.com/apache/spark/pull/44721


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org