You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/25 16:54:58 UTC

[GitHub] [spark] tanelk opened a new pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

tanelk opened a new pull request #29871:
URL: https://github.com/apache/spark/pull/29871


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Mark the "Join Reorder" batch as idempotent.
   The CostBasedJoinReorder needed a slight modification to satisfy this rule.
   
   There is a large amount of changes in the plan stability suite. They are caused by sorting the plans in CostBasedJoinReorder - the left and right side can be switched. 
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   Idempotence check on optimizer batches is an invaluable for developers. It can avoid us bugs in the future.  
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Existing UTs cover it


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #29871:
URL: https://github.com/apache/spark/pull/29871


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812461242






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754566870


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38247/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687066


   > If there are multiple candidate plans with same cost, I think `different inputs produce different outputs` is reasonableļ¼Œso only need ensure the same behavior in Scala 2.12 and Scala 2.13
   
   With this change optimally ordered input doesn't get reordered into another plan with the same cost - this makes `CostBasedJoinReorder` idempotent, that can help catch bugs. 
   
   I had a wrong approach to solving this earlier, now it should stable. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700495615


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699076012


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/33737/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700444719






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687224






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761620279


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38731/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752777134


   **[Test build #133546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133546/testReport)** for PR 29871 at commit [`8bd1c21`](https://github.com/apache/spark/commit/8bd1c215fbdb89722f5afc9ff3276415f9dbc340).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812512468


   **[Test build #136850 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136850/testReport)** for PR 29871 at commit [`d2a6128`](https://github.com/apache/spark/commit/d2a6128e977949444432f7da6935385a54b5301f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496097705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       This was also my first idea, but I couldn't figure it out.
   
   I'll try to give two examples, where all pairwise joins have the same cost and we can join only plans with consecutive letters (AB, but bot AC)
   Keep in mind, that we try build left-deep trees:
   https://github.com/apache/spark/blob/4619acc3ce72a6df9ac849d045ecdbed56a562be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L301
   
   Let the inital join be ((AB)C) - firstly join A and B and then join with C.
   
   Firstly if we kept the first canditate 
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is single element, inner is two element):
   ((BC)A)
   Disgarded ((AB)C) because it was late
   
   Secondly if we kept the last one
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (BA)  (CB)   
   Disgarded (AB) and (BC), because they were overwritten
   
   3. iteration (outer loop is single element, inner is two element):
   ((BA)C)
   Disgarded ((CB)A), because it was overwritten
   
   Neither of these gave the original result. But lets try iterating from the largest (while keeping the first candidate)
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   The initial state
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is two element, inner is single element):
   ((AB)C)
   Disgarded ((BC)A) because it was late
   
   Now, if we would add another plan D, then in the 4. iteration it would be appended to the ((AB)C) and we would still have correct result.
   
   Note that if the input would be ((AB)(CD)), then the result would also be (((AB)C)D), but we must be stable only on second iteration and because we output left-deep trees, then we must be stable on left-deep trees, to be idempotent. 
   
   Also by left-deep we mean "as left-deep" as possible. 
   
   And finally, yes, I know that this dummy example is no rigorous proof, but we have 2 things going for us:
   1. Out of all the test cases we have, none broke this rule
   2. The idempotentsi test is used only in UTs, so there is no chance of breaking the user side by marking this as  idempotent. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk closed pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk closed pull request #29871:
URL: https://github.com/apache/spark/pull/29871


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693778


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767707166


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134516/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754448373


   **[Test build #133658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133658/testReport)** for PR 29871 at commit [`6248e4a`](https://github.com/apache/spark/commit/6248e4aefa8790d0c31aae450d0353ee9bd2d87e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699862311


   > Does this PR affect the behavior in Scala 2.13? Can you help verify it offline?
   
   I ran 
   ```
   > dev/change-scala-version.sh 2.13
   > mvn clean install -DskipTests -pl sql/catalyst -Pscala-2.13 -am
   > mvn test -pl sql/catalyst -Pscala-2.13 -fn
   > mvn clean install -DskipTests -pl sql/core -Pscala-2.13 -am
   > mvn test -pl sql/core -Pscala-2.13 -fn
   ```
   
   And the output is 
   ```
   Run completed in 10 minutes, 43 seconds.
   Total number of tests run: 4381
   Suites: completed 248, aborted 0
   Tests: succeeded 4381, failed 0, canceled 0, ignored 6, pending 0
   All tests passed.
   
   Run completed in 1 hour, 5 minutes, 8 seconds.
   Total number of tests run: 8541
   Suites: completed 359, aborted 0
   Tests: succeeded 8540, failed 1, canceled 1, ignored 52, pending 0
   *** 1 TEST FAILED ***
   ```
   
   The failing test is `SPARK-32459: UDF should not fail on WrappedArray` and does not seem to be related to my changes.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767773601


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39101/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699139100


   **[Test build #129118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129118/testReport)** for PR 29871 at commit [`7a44562`](https://github.com/apache/spark/commit/7a44562d659e0f3f2cf083d68614ad1dba1a6916).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496641418



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > But, what made it even worse is that the value for semanticHash will change every time you restart the JVM. I'm not sure if its intended or not and I could not find the source of randomness.
   
   Yea, that's a design and why do we need to keep the same `semanticHash` values between different JVMs?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699073498


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/33736/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754711658


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38274/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495116397



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -40,30 +40,35 @@ object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper {
     if (!conf.cboEnabled || !conf.joinReorderEnabled) {
       plan
     } else {
-      val result = plan transformDown {
-        // Start reordering with a joinable item, which is an InnerLike join with conditions.
-        // Avoid reordering if a join hint is present.
-        case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
-          reorder(j, j.output)
-        case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
-          if projectList.forall(_.isInstanceOf[Attribute]) =>
-          reorder(p, p.output)
-      }
+      val result = plan transformDown applyLocally
       // After reordering is finished, convert OrderedJoin back to Join.
       result transform {
         case OrderedJoin(left, right, jt, cond) => Join(left, right, jt, cond, JoinHint.NONE)
       }
     }
   }
 
+  private val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = {
+    // Start reordering with a joinable item, which is an InnerLike join with conditions.
+    // Avoid reordering if a join hint is present.
+    case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
+      reorder(j, j.output)
+    case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
+      if projectList.forall(_.isInstanceOf[Attribute]) =>
+      reorder(p, p.output)
+
+  }
+
   private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = {
     val (items, conditions) = extractInnerJoins(plan)
     val result =
       // Do reordering if the number of items is appropriate and join conditions exist.
       // We also need to check if costs of all items can be evaluated.
       if (items.size > 2 && items.size <= conf.joinReorderDPThreshold && conditions.nonEmpty &&
           items.forall(_.stats.rowCount.isDefined)) {
-        JoinReorderDP.search(conf, items, conditions, output)
+        // Transform and sort all the items to guarantee idempotence
+        val transformedItems = items.map(_ transformDown applyLocally).sortBy(_.semanticHash())
+        JoinReorderDP.search(conf, transformedItems, conditions, output)

Review comment:
       This is the most relevant change - previously we reordered the joins from top to bottom. Now we reorder the joins in the lower groups first and then reorder the ones on the top.  Also sort them before reordering joins.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496352591



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       To my surprise the `semanticHash` changes between executions, so sorting by it makes the results of the `PlanStabilitySuite` not stable. 
   In the second commit of this PR I tried the sorting, but could not get it to work: https://github.com/apache/spark/pull/29871/commits/d58966ce5434d50044e8d610978f075cbcad92b1
   
   TBH It wasn't my first idea, but it is the first one, I got to work - the other ones seemed to require too much code change and I did not go further with those.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699049150


   **[Test build #129118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129118/testReport)** for PR 29871 at commit [`7a44562`](https://github.com/apache/spark/commit/7a44562d659e0f3f2cf083d68614ad1dba1a6916).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496097705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       This was also my first idea, but I couldn't figure it out.
   
   I'll try to give two examples, where all pairwise joins have the same cost and we can join only plans with consecutive letters (AB, but bot AC)
   Keep in mind, that we try build left-deep trees:
   https://github.com/apache/spark/blob/4619acc3ce72a6df9ac849d045ecdbed56a562be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L301
   
   Let the inital join be ((AB)C) - firstly join A and B and then join with C.
   Lets look at the candidates generated at each iteration: 
   
   Firstly if we kept the first candidate (current code)
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is single element, inner is two element):
   ((BC)A)
   Disgarded ((AB)C) because it was late
   
   Secondly if we kept the last one (possible change, that I considered)
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (BA)  (CB)   
   Disgarded (AB) and (BC), because they were overwritten
   
   3. iteration (outer loop is single element, inner is two element):
   ((BA)C)
   Disgarded ((CB)A), because it was overwritten
   
   Neither of these gave the original result. But lets try iterating from the largest (while keeping the first candidate). This is my proposition.
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   The initial state
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is two element, inner is single element):
   ((AB)C)
   Disgarded ((BC)A) because it was late
   
   Now, if we would add another plan D, then in the 4. iteration it would be appended to the ((AB)C) and we would still have correct result.
   
   Note that if the input would be ((AB)(CD)), then the result would also be (((AB)C)D), but we must be stable only on second iteration and because we output left-deep trees, then we must be stable on left-deep trees, to be idempotent. 
   
   Also by left-deep we mean "as left-deep" as possible. 
   
   And finally, yes, I know that this dummy example is no rigorous proof, but we have 2 things going for us:
   1. Out of all the test cases we have, none broke this rule
   2. The idempotentsi test is used only in UTs, so there is no chance of breaking the user side by marking this as  idempotent. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699073490






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767868477


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134517/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496097705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       This was also my first idea, but I couldn't figure it out.
   
   I'll try to give two examples, where all pairwise joins have the same cost and we can join only plans with consecutive letters (AB, but bot AC)
   Keep in mind, that we try build left-deep trees:
   https://github.com/apache/spark/blob/4619acc3ce72a6df9ac849d045ecdbed56a562be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L301
   
   Let the inital join be ((AB)C)
   
   Firstly if we kept the first canditate 
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is single element, inner is two element):
   ((BC)A)
   Disgarded ((AB)C) because it was late
   
   Secondly if we kept the last one
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (BA)  (CB)   
   Disgarded (AB) and (BC), because they were overwritten
   
   3. iteration (outer loop is single element, inner is two element):
   ((BA)C)
   Disgarded ((CB)A), because it was overwritten
   
   Neither of these gave the original result. But lets try iterating from the largest (while keeping the first candidate)
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   The initial state
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is two element, inner is single element):
   ((AB)C)
   Disgarded ((BC)A) because it was late
   
   Now, if we would add another plan D, then in the 4. iteration it would be appended to the ((AB)C) and we would still have correct result.
   
   Note that if the input would be ((AB)(CD)), then the result would also be (((AB)C)D), but we must be stable only on second iteration and because we output left-deep trees, then we must be stable on left-deep trees, to be idempotent. 
   
   Also by left-deep we mean "as left-deep" as possible. 
   
   And finally, yes, I know that this dummy example is no rigorous proof, but we have 2 things going for us:
   1. Out of all the test cases we have, none broke this rule
   2. The idempotentsi test is used only in UTs, so there is no chance of breaking the user side by marking this as  idempotent. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495116397



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -40,30 +40,35 @@ object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper {
     if (!conf.cboEnabled || !conf.joinReorderEnabled) {
       plan
     } else {
-      val result = plan transformDown {
-        // Start reordering with a joinable item, which is an InnerLike join with conditions.
-        // Avoid reordering if a join hint is present.
-        case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
-          reorder(j, j.output)
-        case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
-          if projectList.forall(_.isInstanceOf[Attribute]) =>
-          reorder(p, p.output)
-      }
+      val result = plan transformDown applyLocally
       // After reordering is finished, convert OrderedJoin back to Join.
       result transform {
         case OrderedJoin(left, right, jt, cond) => Join(left, right, jt, cond, JoinHint.NONE)
       }
     }
   }
 
+  private val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = {
+    // Start reordering with a joinable item, which is an InnerLike join with conditions.
+    // Avoid reordering if a join hint is present.
+    case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
+      reorder(j, j.output)
+    case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
+      if projectList.forall(_.isInstanceOf[Attribute]) =>
+      reorder(p, p.output)
+
+  }
+
   private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = {
     val (items, conditions) = extractInnerJoins(plan)
     val result =
       // Do reordering if the number of items is appropriate and join conditions exist.
       // We also need to check if costs of all items can be evaluated.
       if (items.size > 2 && items.size <= conf.joinReorderDPThreshold && conditions.nonEmpty &&
           items.forall(_.stats.rowCount.isDefined)) {
-        JoinReorderDP.search(conf, items, conditions, output)
+        // Transform and sort all the items to guarantee idempotence
+        val transformedItems = items.map(_ transformDown applyLocally).sortBy(_.semanticHash())
+        JoinReorderDP.search(conf, transformedItems, conditions, output)

Review comment:
       This is the most relevant change - previously we reordered the joins from top to bottom. Now we reorder the joins in the lower groups first and then reorder the ones on the top.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687942


   **[Test build #129155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129155/testReport)** for PR 29871 at commit [`4619acc`](https://github.com/apache/spark/commit/4619acc3ce72a6df9ac849d045ecdbed56a562be).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700431912


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33827/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699862739


   > Have you checked actuall performance changes with sf=100? I think it would be better to check them for avoiding accidental performance regression.
   
   I have not checked this. It might take me some time, but I'll try to figure it out. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700444652


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33827/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803269636


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496647197



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > Yea, that's a design and why do we need to keep the same `semanticHash` values between different JVMs?
   
   It does not allow us sort by it and then compare results in the `PlanStbilitySuite`. 

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > Yea, that's a design and why do we need to keep the same `semanticHash` values between different JVMs?
   
   It does not allow us sort by it and then compare results in the `PlanStabilitySuite`. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812515788


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752785107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38137/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699574633


   If there are multiple candidate plans with the same cost, I think `different inputs produce different outputs` is reasonableļ¼Œso only need ensure the same behavior in Scala 2.12 and Scala 2.13


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699064901


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33737/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761616171


   **[Test build #134148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134148/testReport)** for PR 29871 at commit [`f7744db`](https://github.com/apache/spark/commit/f7744dbb007ebb9cfb4298de7f0aa814e702dc8b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767741523


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39102/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767868477


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134517/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496653446



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       Ah, I see. I only cared about the idempotence check in `RuleExecutor`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752777134


   **[Test build #133546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133546/testReport)** for PR 29871 at commit [`8bd1c21`](https://github.com/apache/spark/commit/8bd1c215fbdb89722f5afc9ff3276415f9dbc340).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767738547


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39102/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699049464


   @LuciferYang, it seems you were working on similar issue, but you closed it - #29638


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700419832


   **[Test build #129212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129212/testReport)** for PR 29871 at commit [`ee944e9`](https://github.com/apache/spark/commit/ee944e9b4df32eb96de1343787c990bc9c0c7cef).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699049150


   **[Test build #129118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129118/testReport)** for PR 29871 at commit [`7a44562`](https://github.com/apache/spark/commit/7a44562d659e0f3f2cf083d68614ad1dba1a6916).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495614536



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -40,30 +40,35 @@ object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper {
     if (!conf.cboEnabled || !conf.joinReorderEnabled) {
       plan
     } else {
-      val result = plan transformDown {
-        // Start reordering with a joinable item, which is an InnerLike join with conditions.
-        // Avoid reordering if a join hint is present.
-        case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
-          reorder(j, j.output)
-        case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
-          if projectList.forall(_.isInstanceOf[Attribute]) =>
-          reorder(p, p.output)
-      }
+      val result = plan transformDown applyLocally
       // After reordering is finished, convert OrderedJoin back to Join.
       result transform {
         case OrderedJoin(left, right, jt, cond) => Join(left, right, jt, cond, JoinHint.NONE)
       }
     }
   }
 
+  private val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = {
+    // Start reordering with a joinable item, which is an InnerLike join with conditions.
+    // Avoid reordering if a join hint is present.
+    case j @ Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE) =>
+      reorder(j, j.output)
+    case p @ Project(projectList, Join(_, _, _: InnerLike, Some(cond), JoinHint.NONE))
+      if projectList.forall(_.isInstanceOf[Attribute]) =>
+      reorder(p, p.output)
+
+  }
+
   private def reorder(plan: LogicalPlan, output: Seq[Attribute]): LogicalPlan = {
     val (items, conditions) = extractInnerJoins(plan)
     val result =
       // Do reordering if the number of items is appropriate and join conditions exist.
       // We also need to check if costs of all items can be evaluated.
       if (items.size > 2 && items.size <= conf.joinReorderDPThreshold && conditions.nonEmpty &&
           items.forall(_.stats.rowCount.isDefined)) {
-        JoinReorderDP.search(conf, items, conditions, output)
+        // Transform and sort all the items to guarantee idempotence
+        val transformedItems = items.map(_ transformDown applyLocally).sortBy(_.semanticHash())
+        JoinReorderDP.search(conf, transformedItems, conditions, output)

Review comment:
       This is outdated

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -342,7 +347,10 @@ object JoinReorderDP extends PredicateHelper with Logging {
     /** Get the cost of the root node of this plan tree. */
     def rootCost(conf: SQLConf): Cost = {
       if (itemIds.size > 1) {
-        val rootStats = plan.stats
+        val rootStats = (plan transform {
+          // The OrderedJoin does not propagate stats
+          case OrderedJoin(left, right, jt, cond) => Join(left, right, jt, cond, JoinHint.NONE)
+        }).stats

Review comment:
       This is outdated




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803297860


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136280/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752871505


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133546/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752871505


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133546/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754641813


   Now that #30965 is merged, this changes the plans a lot less.
   But there is still idempotency issue with plans with equal costs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754657405


   **[Test build #133685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133685/testReport)** for PR 29871 at commit [`f815feb`](https://github.com/apache/spark/commit/f815feb15237049724d7033e7242577adbbab7b8).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812397139


   **[Test build #136850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136850/testReport)** for PR 29871 at commit [`d2a6128`](https://github.com/apache/spark/commit/d2a6128e977949444432f7da6935385a54b5301f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495896705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev

Review comment:
       cc: @wzhfy




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803297860


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136280/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang edited a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699574633


   If there are multiple candidate plans with same cost, I think `different inputs produce different outputs` is reasonableļ¼Œso only need ensure the same behavior in Scala 2.12 and Scala 2.13


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699139514


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496097705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       This was also my first idea, but I couldn't figure it out.
   
   I'll try to give two examples, where all pairwise joins have the same cost and we can join only plans with consecutive letters (AB, but bot AC)
   Keep in mind, that we try build left-deep trees:
   https://github.com/apache/spark/blob/4619acc3ce72a6df9ac849d045ecdbed56a562be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L301
   
   Let the inital join be ((AB)C) - firstly join A and B and then join with C.
   Lets look at the candidates generated at each iteration: 
   
   Firstly if we kept the first candidate (current code)
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is single element, inner is two element):
   ((BC)A)
   Disgarded ((AB)C) because it was late
   
   Secondly if we kept the last one (possible change, that I considered)
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (BA)  (CB)   
   Disgarded (AB) and (BC), because they were overwritten
   
   3. iteration (outer loop is single element, inner is two element):
   ((BA)C)
   Disgarded ((CB)A), because it was overwritten
   
   Neither of these gave the original result. But lets try iterating from the largest (while keeping the first candidate). This is my proposition.
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   The initial state
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is two element, inner is single element):
   ((AB)C)
   Disgarded ((BC)A) because it was late
   
   Now, if we would add another plan D, then in the 4. iteration it would be appended to the ((AB)C) and we would still have correct result.
   
   Note that if the input would be ((AB)(CD)), then the result would also be (((AB)C)D), but we must be stable only on second time this is applied and because we output left-deep trees, then we must be stable on left-deep trees, to be idempotent. 
   
   Also by left-deep we mean "as left-deep" as possible. 
   
   And finally, yes, I know that this dummy example is no rigorous proof, but we have 2 things going for us:
   1. Out of all the test cases we have, none broke this rule
   2. The idempotentsi test is used only in UTs, so there is no chance of breaking the user side by marking this as  idempotent. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693778






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699723236






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496538324



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > To my surprise the semanticHash changes between executions, so sorting by it makes the results of the PlanStabilitySuite not stable.
   
   Ah, I see... This rule rewrites candidate plans after reordering plans, so IIUC `semanticHash` values might change in the second run...
   
   > TBH It wasn't my first idea, but it is the first one, I got to work - the other ones seemed to require too much code change and I did not go further with those.
   
   okay, let's wait for the comments of other reviewers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693782






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700495320


   **[Test build #129212 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129212/testReport)** for PR 29871 at commit [`ee944e9`](https://github.com/apache/spark/commit/ee944e9b4df32eb96de1343787c990bc9c0c7cef).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803275029


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699073490


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761699365


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761623784


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38731/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761698635


   **[Test build #134148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134148/testReport)** for PR 29871 at commit [`f7744db`](https://github.com/apache/spark/commit/f7744dbb007ebb9cfb4298de7f0aa814e702dc8b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699691984


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33770/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754622829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133658/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699115679


   Closing for now, because there is some issue with the golden files


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693880


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/33770/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754689968


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38274/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752865812


   **[Test build #133546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133546/testReport)** for PR 29871 at commit [`8bd1c21`](https://github.com/apache/spark/commit/8bd1c215fbdb89722f5afc9ff3276415f9dbc340).
    * This patch **fails from timeout after a configured wait of `500m`**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803294439


   **[Test build #136280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136280/testReport)** for PR 29871 at commit [`d6522d9`](https://github.com/apache/spark/commit/d6522d91c7d8c35f9f554e123ee5b44cc15dec0d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699723236






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496352591



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       To my surprise the `semanticHash` changes between executions, so sorting by it makes the results of the `PlanStabilitySuite` not stable. 
   In the second commit of this PR I tried the sorting, but could not get it to work: https://github.com/apache/spark/pull/29871/commits/d58966ce5434d50044e8d610978f075cbcad92b1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496321773



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       Btw, could you update the PR description based on the explanation above? That explanation looks useful for making others understood smoothly, I think.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754448373


   **[Test build #133658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133658/testReport)** for PR 29871 at commit [`6248e4a`](https://github.com/apache/spark/commit/6248e4aefa8790d0c31aae450d0353ee9bd2d87e).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767707166


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134516/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761644395


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38731/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767731671


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39101/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699041286


   **[Test build #129116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129116/testReport)** for PR 29871 at commit [`d58966c`](https://github.com/apache/spark/commit/d58966ce5434d50044e8d610978f075cbcad92b1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699154323


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129116/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754836103


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133685/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754657405


   **[Test build #133685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133685/testReport)** for PR 29871 at commit [`f815feb`](https://github.com/apache/spark/commit/f815feb15237049724d7033e7242577adbbab7b8).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495953275



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       Could you describe more about why we need to change this search order for making this rule idempotent? If how to update next-level join plans is a root cause, we cannot simply modify the update logic below when different plans having the same cost found?
   
   https://github.com/apache/spark/blob/d15f504a5e8bd8acfb6dc1ee138f7d92ff211396/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L227-L233




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761644395


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38731/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803262863


   **[Test build #136280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136280/testReport)** for PR 29871 at commit [`d6522d9`](https://github.com/apache/spark/commit/d6522d91c7d8c35f9f554e123ee5b44cc15dec0d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693874


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33770/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699805838


   Have you checked actuall performance changes with sf=100? I think it would be better to check them for avoiding accidental performance regression.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699722845


   **[Test build #129155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129155/testReport)** for PR 29871 at commit [`4619acc`](https://github.com/apache/spark/commit/4619acc3ce72a6df9ac849d045ecdbed56a562be).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700444694






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700419832


   **[Test build #129212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129212/testReport)** for PR 29871 at commit [`ee944e9`](https://github.com/apache/spark/commit/ee944e9b4df32eb96de1343787c990bc9c0c7cef).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699153688


   **[Test build #129116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129116/testReport)** for PR 29871 at commit [`d58966c`](https://github.com/apache/spark/commit/d58966ce5434d50044e8d610978f075cbcad92b1).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699154315


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803262863


   **[Test build #136280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136280/testReport)** for PR 29871 at commit [`d6522d9`](https://github.com/apache/spark/commit/d6522d91c7d8c35f9f554e123ee5b44cc15dec0d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754622829


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/133658/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767741523


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39102/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699076007






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700495615






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767773601


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39101/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752785107


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38137/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699075979


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33737/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752782826


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38137/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699139521


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129118/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761616171


   **[Test build #134148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134148/testReport)** for PR 29871 at commit [`f7744db`](https://github.com/apache/spark/commit/f7744dbb007ebb9cfb4298de7f0aa814e702dc8b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495117673



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -342,7 +347,10 @@ object JoinReorderDP extends PredicateHelper with Logging {
     /** Get the cost of the root node of this plan tree. */
     def rootCost(conf: SQLConf): Cost = {
       if (itemIds.size > 1) {
-        val rootStats = plan.stats
+        val rootStats = (plan transform {
+          // The OrderedJoin does not propagate stats
+          case OrderedJoin(left, right, jt, cond) => Join(left, right, jt, cond, JoinHint.NONE)
+        }).stats

Review comment:
       This is an unfortunate consequence of starting the reordering from the bottom. If anybody has suggestions on how to avoid this, I would happily hear you. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693773


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33769/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699041286


   **[Test build #129116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129116/testReport)** for PR 29871 at commit [`d58966c`](https://github.com/apache/spark/commit/d58966ce5434d50044e8d610978f075cbcad92b1).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687104


   **[Test build #129154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129154/testReport)** for PR 29871 at commit [`c9028ad`](https://github.com/apache/spark/commit/c9028ad4bf0fabfac5deeb87b13b610c1ff13904).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687222


   **[Test build #129154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129154/testReport)** for PR 29871 at commit [`c9028ad`](https://github.com/apache/spark/commit/c9028ad4bf0fabfac5deeb87b13b610c1ff13904).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754729291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38274/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754729291


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38274/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699693879






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767741485


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39102/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812515788


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496321413



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       Thanks for your detailed explanation. IMO a fix should be simple and intuitive in terms of maintainability. Another idea that I came up with after I read the description above was to simply sort input candidate plans by some values (e.g., `semanticHash`) at the beginning of `search`. It seems there are multiple options to solve this (like the four factors you described above). Any reason to choose the current approach out from them? You chose the simplest one?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754566870


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/38247/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687104


   **[Test build #129154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129154/testReport)** for PR 29871 at commit [`c9028ad`](https://github.com/apache/spark/commit/c9028ad4bf0fabfac5deeb87b13b610c1ff13904).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-754599443


   **[Test build #133658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133658/testReport)** for PR 29871 at commit [`6248e4a`](https://github.com/apache/spark/commit/6248e4aefa8790d0c31aae450d0353ee9bd2d87e).
    * This patch passes all tests.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812461275


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41428/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-761699365


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/134148/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699073468


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33736/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803275029


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496131422



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       I pondered about this a bit more. I think this is a bit more general explanaiton:
   
   After the first round of optimisation the plans are ordered: ABCD....
   Lets ignore the brackets to keep it general.
   On the second run we just have to redraw the brackets without changing the order.
   
   We know that the result is as left-deep as possible. 
   
   So, if we would be iterating from the smaller ones, then we could have (K) and (LM), that we want to join, but because it has to be left-deep, we must order it as ((LM)K).
   
   But, if we start from the bigger ones, then we hit (KL) and (M) first and joining these will not reorder the letters.
   
   For stability I think, that there are at least 4 factors, that must match with each others:
   How do we join: left-deep, right-deep....
   The iteration order in one "level": left-to-right, right-to-left.
   The order of picking levels: bigger first, smaller first.
   Which one we keep, when joins have equal cost: first, last...
   
   There might be more. Changing the order of picking levels gets us to stability in one change.
   There are most definitely more ways to achieve stability, but it seems, that it would require more than one change.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699741822


   Does this PR affect the behavior in Scala 2.13? Can you help verify it offline?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r495953275



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       Could you describe more about why we need to change this search ~order~ strategy for making this rule idempotent? If how to update next-level join plans is a root cause, we cannot simply modify the update logic below when different plans having the same cost found?
   
   https://github.com/apache/spark/blob/d15f504a5e8bd8acfb6dc1ee138f7d92ff211396/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L227-L233




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767709780


   **[Test build #134517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134517/testReport)** for PR 29871 at commit [`5fe5f3b`](https://github.com/apache/spark/commit/5fe5f3b5ca159802ed535967dd29e01a7688ceb4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687224






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699691612


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33769/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767709780


   **[Test build #134517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134517/testReport)** for PR 29871 at commit [`5fe5f3b`](https://github.com/apache/spark/commit/5fe5f3b5ca159802ed535967dd29e01a7688ceb4).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767751351


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39101/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496579193



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       > Ah, I see... This rule rewrites candidate plans after reordering plans, so IIUC `semanticHash` values might change in the second run...
   > 
   
   Yeah, that was one issue, that complicated the approach.
   
   But, what made it even worse is that the value for `semanticHash` will change every time you restart the JVM. I'm not sure if its intended or not and I could not find the source of randomness. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496322850



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       One more comment; could you add some tests based on the explanation scenario above?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699154315






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812397139


   **[Test build #136850 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136850/testReport)** for PR 29871 at commit [`d2a6128`](https://github.com/apache/spark/commit/d2a6128e977949444432f7da6935385a54b5301f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699904879


   > The failing test is SPARK-32459: UDF should not fail on WrappedArray and does not seem to be related to my changes.
   
   Ok ~ I will try to find which issue break this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-767853121


   **[Test build #134517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/134517/testReport)** for PR 29871 at commit [`5fe5f3b`](https://github.com/apache/spark/commit/5fe5f3b5ca159802ed535967dd29e01a7688ceb4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699076007


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-803270742


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40862/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699061494


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33736/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-700495620


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129212/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #29871:
URL: https://github.com/apache/spark/pull/29871#discussion_r496097705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -206,13 +206,15 @@ object JoinReorderDP extends PredicateHelper with Logging {
       filters: Option[JoinGraphInfo]): JoinPlanMap = {
 
     val nextLevel = new JoinPlanMap
-    var k = 0
     val lev = existingLevels.length - 1
+    var k = lev
     // Build plans for the next level from plans at level k (one side of the join) and level
     // lev - k (the other side of the join).
-    // For the lower level k, we only need to search from 0 to lev - k, because when building
+    // For the higher level k, we only need to search from lev to lev - k, because when building
     // a join from A and B, both A J B and B J A are handled.
-    while (k <= lev - k) {
+    // Start searching from highest level to make sure that optimally ordered input doesn't get
+    // reordered into another plan with the same cost.

Review comment:
       This was also my first idea, but I couldn't figure it out.
   
   I'll try to give two examples, where all pairwise joins have the same cost and we can join only plans with consecutive letters (AB, but bot AC)
   Keep in mind, that we try build left-deep trees:
   https://github.com/apache/spark/blob/4619acc3ce72a6df9ac849d045ecdbed56a562be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L301
   
   Let the inital join be ((AB)C) - firstly join A and B and then join with C.
   Lets look at the candidates generated at each iteration: 
   
   Firstly if we kept the first canditate 
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is single element, inner is two element):
   ((BC)A)
   Disgarded ((AB)C) because it was late
   
   Secondly if we kept the last one
   1. iteration (the initial state):
   (A)  (B)  (C)   
   
   2. iteration (nested loop over the single elements):
   (BA)  (CB)   
   Disgarded (AB) and (BC), because they were overwritten
   
   3. iteration (outer loop is single element, inner is two element):
   ((BA)C)
   Disgarded ((CB)A), because it was overwritten
   
   Neither of these gave the original result. But lets try iterating from the largest (while keeping the first candidate)
   
   1. iteration (the initial state):
   (A)  (B)  (C)   
   The initial state
   
   2. iteration (nested loop over the single elements):
   (AB)  (BC)   
   Disgarded (BA) and (CB), because they were late
   
   3. iteration (outer loop is two element, inner is single element):
   ((AB)C)
   Disgarded ((BC)A) because it was late
   
   Now, if we would add another plan D, then in the 4. iteration it would be appended to the ((AB)C) and we would still have correct result.
   
   Note that if the input would be ((AB)(CD)), then the result would also be (((AB)C)D), but we must be stable only on second iteration and because we output left-deep trees, then we must be stable on left-deep trees, to be idempotent. 
   
   Also by left-deep we mean "as left-deep" as possible. 
   
   And finally, yes, I know that this dummy example is no rigorous proof, but we have 2 things going for us:
   1. Out of all the test cases we have, none broke this rule
   2. The idempotentsi test is used only in UTs, so there is no chance of breaking the user side by marking this as  idempotent. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699139514






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-877884289


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-752783641


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38137/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-812461275


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41428/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29871: [SPARK-32995][SQL] CostBasedJoinReorder optimizer rule should be idempotent

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29871:
URL: https://github.com/apache/spark/pull/29871#issuecomment-699687942


   **[Test build #129155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129155/testReport)** for PR 29871 at commit [`4619acc`](https://github.com/apache/spark/commit/4619acc3ce72a6df9ac849d045ecdbed56a562be).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org