Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/13 23:21:40 UTC

[GitHub] [spark] c21 commented on a change in pull request #34270: [SPARK-37001][SQL] Disable two level of map for final hash aggregation by default

c21 commented on a change in pull request #34270:
URL: https://github.com/apache/spark/pull/34270#discussion_r728514936



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1705,11 +1705,21 @@ object SQLConf {
       .doc("Enable two-level aggregate hash map. When enabled, records will first be " +
         "inserted/looked-up at a 1st-level, small, fast map, and then fallback to a " +
         "2nd-level, larger, slower map when 1st level is full or keys cannot be found. " +
-        "When disabled, records go directly to the 2nd level.")
+        "When disabled, records go directly to the 2nd level. Enable for partial aggregate only.")
       .version("2.3.0")
       .booleanConf
       .createWithDefault(true)
 
+  val ENABLE_TWOLEVEL_FINAL_AGG_MAP =
+    buildConf("spark.sql.codegen.aggregate.final.map.twolevel.enabled")
+      .internal()
+      .doc("Enable two-level aggregate hash map for final aggregate as well. Disable by default " +
+        "because final aggregate might get more distinct keys compared to partial aggregate. " +
+        "Overhead of looking up 1st-level map might dominate when having a lot of distinct keys.")
+      .version("3.2.0")

Review comment:
       @cloud-fan - yes, updated.
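The doc string in the quoted hunk describes a two-level lookup: try a small 1st-level map first, and fall back to a larger 2nd-level map when the 1st level is full or the key is not there. Below is a minimal Scala sketch of that idea for a SUM aggregate. It assumes plain `scala.collection.mutable.HashMap`s stand in for Spark's generated 1st-level map and its 2nd-level map. The class `TwoLevelSumMap` and its `fallbackLookups` counter are hypothetical and are not Spark APIs. The counter illustrates the reasoning in the new config's doc string: once the 1st level is full, every record with a key outside it pays for a 1st-level miss before reaching the 2nd level. That is why many distinct keys, as in a final aggregate, can make the extra lookup dominate.

```scala
import scala.collection.mutable

// Hypothetical sketch of a two-level aggregation map computing SUM per key.
// Plain Scala collections stand in for Spark's codegen'd 1st-level map and
// its 2nd-level map; this is not Spark's actual implementation.
final class TwoLevelSumMap(firstLevelCapacity: Int) {
  // 1st level: small map with a hard capacity limit ("small, fast map").
  private val firstLevel = mutable.HashMap.empty[String, Long]
  // 2nd level: unbounded fallback map ("larger, slower map").
  private val secondLevel = mutable.HashMap.empty[String, Long]
  // Records that missed the 1st level and had to be looked up again.
  private var misses = 0L

  def fallbackLookups: Long = misses

  def add(key: String, value: Long): Unit =
    firstLevel.get(key) match {
      case Some(sum) =>
        firstLevel.update(key, sum + value)
      case None if firstLevel.size < firstLevelCapacity =>
        // Entries are never evicted, so while the 1st level still has room
        // the 2nd level is empty and the key cannot already live there.
        firstLevel.update(key, value)
      case None =>
        // 1st level is full: this record paid for a 1st-level miss and now
        // also pays for a 2nd-level lookup.
        misses += 1
        secondLevel.update(key, secondLevel.getOrElse(key, 0L) + value)
    }

  // Final output is the union of both levels; each key lives in exactly one.
  def result: Map[String, Long] = (firstLevel ++ secondLevel).toMap
}
```

With a capacity of 2 and input keys `a, b, a, c, c`, both `c` records fall back to the 2nd level, so `fallbackLookups` ends at 2 and `result` is `a=4, b=2, c=9` for values `1, 2, 3, 4, 5`.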

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1705,11 +1705,21 @@ object SQLConf {
       .doc("Enable two-level aggregate hash map. When enabled, records will first be " +
         "inserted/looked-up at a 1st-level, small, fast map, and then fallback to a " +
         "2nd-level, larger, slower map when 1st level is full or keys cannot be found. " +
-        "When disabled, records go directly to the 2nd level.")
+        "When disabled, records go directly to the 2nd level. Enable for partial aggregate only.")
       .version("2.3.0")
       .booleanConf
       .createWithDefault(true)
 
+  val ENABLE_TWOLEVEL_FINAL_AGG_MAP =
+    buildConf("spark.sql.codegen.aggregate.final.map.twolevel.enabled")

Review comment:
       @cloud-fan - sure, updated. Given the new meaning of the config, I also changed its default value to `true`.
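As a rough illustration of the gating described by the doc strings in the quoted revision (two-level map for partial aggregation, and for final aggregation only behind `spark.sql.codegen.aggregate.final.map.twolevel.enabled`, disabled by default), here is a small Scala sketch. The object `TwoLevelMapGate`, its parameters, and the use of a plain `Map[String, String]` as the config source are hypothetical and are not Spark's planner code. The reply notes that the config's meaning and default were changed after this revision, so the merged semantics may differ from this sketch.

```scala
// Hypothetical sketch of the gating described in the quoted doc strings.
// Not Spark's actual planner logic; the config source is a plain Map.
object TwoLevelMapGate {
  // Config key taken from the quoted diff. A missing key is treated as false,
  // following that revision's "Disable by default" doc string.
  val FinalAggKey = "spark.sql.codegen.aggregate.final.map.twolevel.enabled"

  def useTwoLevelMap(
      conf: Map[String, String],
      twoLevelEnabled: Boolean,    // the existing two-level map flag
      isFinalAggregate: Boolean): Boolean = {
    val finalAggEnabled = conf.get(FinalAggKey).exists(_.trim.toBoolean)
    // Partial aggregates only need the base flag; final aggregates also
    // need the separate final-aggregate flag.
    twoLevelEnabled && (!isFinalAggregate || finalAggEnabled)
  }
}
```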




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



For additional commands, e-mail: reviews-help@spark.apache.org