You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "LuciferYang (via GitHub)" <gi...@apache.org> on 2023/09/27 17:39:29 UTC

[GitHub] [spark] LuciferYang opened a new pull request, #43155: [SPARK-45357][CONNECT][TESTS] Ignore `dataframeId` when comparing CollectMetrics in `SparkConnectProtoSuite`.

LuciferYang opened a new pull request, #43155:
URL: https://github.com/apache/spark/pull/43155

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746251416

   The GA failure is unrelated to the current pr:
   
   ```
   starting mypy annotations test...
   annotations failed mypy checks:
   /usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/tensor.py:369: error: INTERNAL ERROR -- Please try using mypy master on GitHub:
   https://mypy.readthedocs.io/en/stable/common_issues.html#using-a-development-mypy-build
   If this issue continues with mypy master, please report a bug at https://github.com/python/mypy/issues
   version: 0.982
   /usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/tensor.py:369: : note: please use --show-traceback to print a traceback when reporting a bug
   2
   Error: Process completed with exit code 2.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1738367041

   cc @cloud-fan @zhengruifeng 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng closed pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`
URL: https://github.com/apache/spark/pull/43155


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1340316526


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1068,6 +1068,10 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
     val connectAnalyzed = analyzePlan(transform(connectPlan))
-    comparePlans(connectAnalyzed, sparkPlan, false)
+    (connectAnalyzed, sparkPlan) match {

Review Comment:
   Thanks for the clarification! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1339488349


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1067,7 +1067,11 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
 
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
+    def normalizeDataframeId(plan: LogicalPlan): LogicalPlan = plan match {

Review Comment:
   shall we use transform? why only top-level `CollectMetrics`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746273704

   @zhengruifeng Moreover, this test case is in the `connect-server` module, the function used by `connectTestRelation.observe` in the test is
   
   https://github.com/apache/spark/blob/5e6986b819821c4ea5e341217b6c901690d93e90/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala#L1089-L1111
   
   It seems that there is no `planId` here and the `observe` function on the client side has not been implemented yet.
   
   
   https://github.com/apache/spark/blob/5e6986b819821c4ea5e341217b6c901690d93e90/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala#L3286-L3288
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746279579

   rebase to fix python lint


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1339473976


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1067,7 +1067,11 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
 
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
+    def normalizeDataframeId(plan: LogicalPlan): LogicalPlan = plan match {

Review Comment:
   add a new small function



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1750269782

   Thanks @zhengruifeng @cloud-fan @amaliujia ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746211815

   In `PlanGenerationTestSuite`, the `planId` was reset before each test
   
   https://github.com/apache/spark/blob/4863dec91f1cd596e60b42f1cbef9f6df327e562/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala#L119-L121
   
   IIRC, the `dataframeId` in `CollectMetrics` is also the `planId`, so is it possible to simply reset the `planId` before problematic test suites?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1339473976


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1067,7 +1067,11 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
 
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
+    def normalizeDataframeId(plan: LogicalPlan): LogicalPlan = plan match {

Review Comment:
   add a new small function, is it ok?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1739491907

   QQ: why this was not caught before? Does the CI check here run tests in `connector/connect/server` at all?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] amaliujia commented on pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1739514235

   LGTM
   
   Another way to fix is manually construct the CollectMetrics to compare with the proto generated version. But this PR's approach is fine too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1339470161


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1068,6 +1068,10 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
     val connectAnalyzed = analyzePlan(transform(connectPlan))
-    comparePlans(connectAnalyzed, sparkPlan, false)
+    (connectAnalyzed, sparkPlan) match {
+      case (l: CollectMetrics, r: CollectMetrics) =>

Review Comment:
   why not just add a small normalize function to reset df id to 0 for all `CollectMetrics` in the query plan?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1750035503

   merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746330281

   also cc @hvanhovell 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Ignore `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`.

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1338979671


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1068,6 +1068,10 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
     val connectAnalyzed = analyzePlan(transform(connectPlan))
-    comparePlans(connectAnalyzed, sparkPlan, false)
+    (connectAnalyzed, sparkPlan) match {

Review Comment:
   Since this is the first case, this pr only made a simple fix.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43155:
URL: https://github.com/apache/spark/pull/43155#discussion_r1339014960


##########
connector/connect/server/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##########
@@ -1068,6 +1068,10 @@ class SparkConnectProtoSuite extends PlanTest with SparkConnectPlanTest {
   // Compares proto plan with LogicalPlan.
   private def comparePlans(connectPlan: proto.Relation, sparkPlan: LogicalPlan): Unit = {
     val connectAnalyzed = analyzePlan(transform(connectPlan))
-    comparePlans(connectAnalyzed, sparkPlan, false)
+    (connectAnalyzed, sparkPlan) match {

Review Comment:
   In the current scenario, `connectAnalyzed` is transformed from `proto.Relation`. When it is `CollectMetrics`, the `dataframeId` is always 0.
   
   But if the `sparkPlan` is `CollectMetrics`, its `dataframeId` value is determined by its corresponding DataFrame.
   
   In the sbt test, `SparkConnectProtoSuite` is tested earlier, `sparkTestRelation` is the first created `DataFrame` with `id` as 0, thus the GA test didn't trigger the failure described in the pr.
   
   When using Maven for testing, `SparkConnectProtoSuite` is tested later, `sparkTestRelation` is not the first created `DataFrame` with `id` not being 0, thus causing the test to fail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #43155: [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite`

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1738368044

   also cc @amaliujia for double check


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45357][CONNECT][TESTS] Normalize `dataframeId` when comparing `CollectMetrics` in `SparkConnectProtoSuite` [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #43155:
URL: https://github.com/apache/spark/pull/43155#issuecomment-1746224877

   > In `PlanGenerationTestSuite`, the `planId` was reset before each test
   > 
   > https://github.com/apache/spark/blob/4863dec91f1cd596e60b42f1cbef9f6df327e562/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala#L119-L121
   > 
   > IIRC, the `dataframeId` in `CollectMetrics` is also the `planId`, so is it possible to simply reset the `planId` before problematic test suites?
   
   No, maybe it's not feasible. It's not just a matter of planId here. If we follow your argument, we would also need to add a new function for xx to reset `Dataset#curId`.
   
   
   https://github.com/apache/spark/blob/97597ba52842c6fcda5db27171439f9c02c1a782/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L70
   
   https://github.com/apache/spark/blob/97597ba52842c6fcda5db27171439f9c02c1a782/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L204
   
   https://github.com/apache/spark/blob/97597ba52842c6fcda5db27171439f9c02c1a782/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2219-L2222


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org