You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/11 17:22:56 UTC

[GitHub] [spark] grundprinzip opened a new pull request, #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

grundprinzip opened a new pull request, #38627:
URL: https://github.com/apache/spark/pull/38627

   ### What changes were proposed in this pull request?
   
   This is a follow-up improving the behavior and compatibility for aggregate relations using Spark Connect. Previously, Spark Connect would not retain the group by columns in the aggregation similar to very old Spark behavior that is by default not set.
   
   ### Why are the changes needed?
   Compatibility.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021019891


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   > we should support client side config?
   
   I think we should support it, I am looking into `__repr__` whose behavior depends on the config



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021122825


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -441,11 +441,14 @@ class SparkConnectPlanner(session: SparkSession) {
           case x => UnresolvedAlias(x)
         }
 
+    // Retain group columns in aggregate expressions:
+    val aggExprs =
+      groupingExprs ++ rel.getResultExpressionsList.asScala.map(transformResultExpression)

Review Comment:
   I don't have data points, but people can always add a `.select(...)` to exclude grouping columns. I'm fine to always include grouping columns here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on PR #38627:
URL: https://github.com/apache/spark/pull/38627#issuecomment-1312661012

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021124144


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -441,11 +441,14 @@ class SparkConnectPlanner(session: SparkSession) {
           case x => UnresolvedAlias(x)
         }
 
+    // Retain group columns in aggregate expressions:
+    val aggExprs =
+      groupingExprs ++ rel.getResultExpressionsList.asScala.map(transformResultExpression)
+
     logical.Aggregate(
       child = transformRelation(rel.getInput),
       groupingExpressions = groupingExprs.toSeq,

Review Comment:
   not related to this PR, but `Aggregate.groupingExpressions` is `Seq[Expression]`, not `Seq[NamedExpression]`, we can use the original grouping expressions directly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] grundprinzip commented on pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

grundprinzip commented on PR #38627:
URL: https://github.com/apache/spark/pull/38627#issuecomment-1311969221

   R: @cloud-fan @amaliujia @zhengruifeng 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021019325


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   There are two questions:
   1. Do we want to support both? If not, maybe go with a documentation?
   2. if we support both ways, by what approach? Should that be a part of proto (stateless) or we should support client side config?
   
   I am trying to get these questions discussed a bit this week to close this chapter. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan closed pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.
URL: https://github.com/apache/spark/pull/38627


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] grundprinzip commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

grundprinzip commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021117249


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -441,11 +441,14 @@ class SparkConnectPlanner(session: SparkSession) {
           case x => UnresolvedAlias(x)
         }
 
+    // Retain group columns in aggregate expressions:
+    val aggExprs =
+      groupingExprs ++ rel.getResultExpressionsList.asScala.map(transformResultExpression)

Review Comment:
   @cloud-fan from your experience how often have you seen someone to explicitly set the flag since the default was changed in Spark 2.x? I don't think it's worth it adding the full concept of session options and using this as the prime example. 
   
   Our goal should be to simplify, and the users can still override the defaults on the server.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1020987396


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   `DATAFRAME_RETAIN_GROUP_COLUMNS` will take no effect in Connect, shoud we mention it somewhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1022752267


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -441,11 +441,14 @@ class SparkConnectPlanner(session: SparkSession) {
           case x => UnresolvedAlias(x)
         }
 
+    // Retain group columns in aggregate expressions:
+    val aggExprs =
+      groupingExprs ++ rel.getResultExpressionsList.asScala.map(transformResultExpression)
+
     logical.Aggregate(
       child = transformRelation(rel.getInput),
       groupingExpressions = groupingExprs.toSeq,

Review Comment:
   cc @amaliujia to followup



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021065389


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   makes sense. Let me drive a consensus on this. The usage `__repr__` probably can vote +1 on the client side configuration.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021067557


##########
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -441,11 +441,14 @@ class SparkConnectPlanner(session: SparkSession) {
           case x => UnresolvedAlias(x)
         }
 
+    // Retain group columns in aggregate expressions:
+    val aggExprs =
+      groupingExprs ++ rel.getResultExpressionsList.asScala.map(transformResultExpression)

Review Comment:
   To match DataFrame API, I think we should allow clients to choose the behavior. How about we add a bool flag in the proto message of `Aggregate`? Clients can either use config or expose APIs directly to set this bool flag.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on PR #38627:
URL: https://github.com/apache/spark/pull/38627#issuecomment-1315270191

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] grundprinzip commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

grundprinzip commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021116483


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   There are couple of things to point out:
   
   1) The behavior can be controlled by applying the default values for the execution on the server side and I would like to hold adding knobs until we have real evidence that we need it.
   
   2) `__repr__` the behavior of REPL is defined by PySpark and we don't have to follow it to the letter. We can make independent decisions that define what the behavior on `__repr__` is. For example, I don't think we should eagerly evaluate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] amaliujia commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

amaliujia commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021019325


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   There are two questions:
   1. Do we want to support both? If not, maybe go with a documentation?
   2. if we support both ways, by what approach? Should that be a part of proto (for example, a boolean flag which keep client being stateless) or we should support client side config?
   
   I am trying to get these questions discussed a bit this week to close this chapter. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38627: [SPARK-40875] [CONNECT] [FOLLOW] Retain Group expressions in aggregate.

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on code in PR #38627:
URL: https://github.com/apache/spark/pull/38627#discussion_r1021019891


##########
connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -148,29 +147,25 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   }
 
   test("Aggregate with more than 1 grouping expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
-      val connectPlan =
-        connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
-      val sparkPlan =
-        sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
-      comparePlans(connectPlan, sparkPlan)
-    }
+    val connectPlan =
+      connectTestRelation.groupBy("id".protoAttr, "name".protoAttr)()
+    val sparkPlan =
+      sparkTestRelation.groupBy(Column("id"), Column("name")).agg(Map.empty[String, String])
+    comparePlans(connectPlan, sparkPlan)
   }
 
   test("Aggregate expressions") {
-    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {

Review Comment:
   > we should support client side config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org