You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2024/01/18 13:01:45 UTC

[PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

beliefer opened a new pull request, #44787:
URL: https://github.com/apache/spark/pull/44787

   ### What changes were proposed in this pull request?
   This PR propose to make the document of `spark.sql.adaptive.coalescePartitions.parallelismFirst` clearer.
   
   
   ### Why are the changes needed?
   The default value of `spark.sql.adaptive.coalescePartitions.parallelismFirst` is true, but the document contains the word `recommended to set this config to false and respect the configured target size`. It's very confused.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'Yes'.
   The document is more clear.
   
   
   ### How was this patch tested?
   N/A
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1920513955

   (Re run the tests)
   Is the text OK @cloud-fan or are you suggesting a modification?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen closed pull request #44787: [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer
URL: https://github.com/apache/spark/pull/44787


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.

nchammas commented on code in PR #44787:
URL: https://github.com/apache/spark/pull/44787#discussion_r1457505607


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +
-        "to false and respect the configured target size.")
+        "regression when enabling adaptive query execution. If you respect the configured " +
+        "target size, please set this config to false.")

Review Comment:
   Minor suggestion:
   
   ```suggestion
           "regressions when enabling adaptive query execution. To respect the configured " +
           "target size, please set this config to false.")
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1920449390

   This is true by default because in benchmarks we run one query at a time and it can use all the resources of the entire cluster. So parallelism is more important, as if we pick 64mb as the target size and save some CPU slots, no other tasks will use these free slots.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1925548361

   @srowen @cloud-fan Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.

nchammas commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1898573205

   Note that until we adopt some approach to automate the generation of config tables in our docs (e.g. like in #44755 or #44756), you will need to manually update the HTML here so it stays in sync with the source: https://github.com/apache/spark/blob/977f64f0904e46e72cbe5b2252f2657dde29c90c/docs/sql-performance-tuning.md#L270


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on code in PR #44787:
URL: https://github.com/apache/spark/pull/44787#discussion_r1473974633


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44787:
URL: https://github.com/apache/spark/pull/44787#discussion_r1473823018


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +

Review Comment:
   maybe just say `It's recommended to set this config to true on a busy cluster to male resource utilization more efficient (not many small tasks).`



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +

Review Comment:
   maybe just say `It's recommended to set this config to true on a busy cluster to make resource utilization more efficient (not many small tasks).`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1902560357

   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "srowen (via GitHub)" <gi...@apache.org>.

srowen commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1925352784

   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.

beliefer commented on PR #44787:
URL: https://github.com/apache/spark/pull/44787#issuecomment-1907922040

   cc @MaxGekk @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.

cloud-fan commented on code in PR #44787:
URL: https://github.com/apache/spark/pull/44787#discussion_r1464932242


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +

Review Comment:
   @maryannxue is it really recommended?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46760][SQL][DOCS] Make the document of spark.sql.adaptive.coalescePartitions.parallelismFirst clearer [spark]

Posted by "eejbyfeldt (via GitHub)" <gi...@apache.org>.

eejbyfeldt commented on code in PR #44787:
URL: https://github.com/apache/spark/pull/44787#discussion_r1517728906


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -721,8 +721,8 @@ object SQLConf {
         "shuffle partitions, but adaptively calculate the target size according to the default " +
         "parallelism of the Spark cluster. The calculated size is usually smaller than the " +
         "configured target size. This is to maximize the parallelism and avoid performance " +
-        "regression when enabling adaptive query execution. It's recommended to set this config " +

Review Comment:
   This suggestion contains a mistake right? It should be set to false in a busy cluster? https://github.com/apache/spark/pull/45437



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org