Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/10 16:22:39 UTC

[GitHub] [spark] LuciferYang opened a new pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result in Scala 2.12 and 2.13

LuciferYang opened a new pull request #29718:
URL: https://github.com/apache/spark/pull/29718


   ### What changes were proposed in this pull request?
   
   The optimization result of `CostBasedJoinReorder` may differ between Scala 2.12 and Scala 2.13 for the same input when more than one candidate plan has the same cost.
   
   This PR makes the optimization result as deterministic as possible for the same input across Scala 2.12 and Scala 2.13. The main changes are as follows:
   
   - Use `LinkedHashMap` instead of `Map` to store `foundPlans` in the `JoinReorderDP.search` method, so that iteration order follows insertion order; the iteration order of `Map` differs between Scala 2.12 and 2.13.
   
   - Fix `StarJoinCostBasedReorderSuite`, which is affected by the above change.
   
   - Regenerate golden files affected by the above change.
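   
   The first change above relies on `LinkedHashMap` iterating in insertion order. The following is a minimal sketch of that property, not the actual `JoinReorderDP` code: `Candidate`, `firstCheapest`, and `demo` are hypothetical names, and the tie-breaking rule (keep the first minimum-cost entry) is an assumption made for illustration only.
   
   ```scala
   import scala.collection.mutable
   
   object InsertionOrderSketch {
     // Hypothetical stand-in for a candidate plan: a set of item ids plus a cost.
     final case class Candidate(items: Set[Int], cost: BigInt)
   
     // Keeps the first minimum-cost candidate in iteration order, so ties are
     // resolved by whichever candidate the collection yields first (assumption).
     def firstCheapest(plans: Iterable[Candidate]): Candidate =
       plans.reduceLeft((best, next) => if (next.cost < best.cost) next else best)
   
     // Inserts three equal-cost candidates and returns the key order together
     // with the candidate chosen by firstCheapest.
     def demo(): (Seq[Set[Int]], Candidate) = {
       val found = mutable.LinkedHashMap.empty[Set[Int], Candidate]
       Seq(Set(0, 3), Set(0, 1), Set(0, 2)).foreach { ids =>
         found.put(ids, Candidate(ids, BigInt(100)))
       }
       (found.keys.toSeq, firstCheapest(found.values))
     }
   }
   ```
   
   Because the storage is insertion-ordered, `demo()` returns the keys in the order they were inserted and picks the `Set(0, 3)` candidate regardless of the Scala version, whereas a hash-based `Map` makes no such ordering promise.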
   
   ### Why are the changes needed?
   We need to support a Scala 2.13 build.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Scala 2.12: Passes the Jenkins or GitHub Actions build
   
   - Scala 2.13: All tests passed.
   
   To reproduce the Scala 2.13 run, do the following:
   
   ```
   dev/change-scala-version.sh 2.13
   mvn clean install -DskipTests  -pl sql/core -Pscala-2.13 -am
   mvn test -pl sql/core -Pscala-2.13
   ```
   
   **Before**
   ```
   Tests: succeeded 8485, failed 13, canceled 1, ignored 52, pending 0
   *** 13 TESTS FAILED ***
   
   ```
   
   **After**
   
   ```
   Tests: succeeded 8498, failed 0, canceled 1, ignored 52, pending 0
   All tests passed.
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690471480


   The test case named "Test 4: Star with several branches" in `StarJoinCostBasedReorderSuite` is a typical case.
   
   If the input is
   ```
   d1.join(t3).join(t4).join(f1).join(d2).join(t5).join(t6).join(d3).join(t1).join(t2)
     .where((nameToAttr("d1_c2") === nameToAttr("t3_c1")) &&
       (nameToAttr("t3_c2") === nameToAttr("t4_c2")) &&
       (nameToAttr("d1_pk") === nameToAttr("f1_fk1")) &&
       (nameToAttr("f1_fk2") === nameToAttr("d2_pk")) &&
       (nameToAttr("d2_c2") === nameToAttr("t5_c1")) &&
       (nameToAttr("t5_c2") === nameToAttr("t6_c2")) &&
       (nameToAttr("f1_fk3") === nameToAttr("d3_pk")) &&
       (nameToAttr("d3_c2") === nameToAttr("t1_c1")) &&
       (nameToAttr("t1_c2") === nameToAttr("t2_c2")))
   ```
   
   the optimization result in Scala 2.12 is
   
   ```
     f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
       .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
       .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
       .
       .
       .
   ```
   
   and the optimization result in Scala 2.13 is
   
   ```
   f1.join(d3, Inner, Some(nameToAttr("f1_fk3") === nameToAttr("d3_pk")))
       .join(d2, Inner, Some(nameToAttr("f1_fk2") === nameToAttr("d2_pk")))
       .join(d1, Inner, Some(nameToAttr("f1_fk1") === nameToAttr("d1_pk")))
       .
       .
       .
   ```




[GitHub] [spark] SparkQA commented on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690768625


   **[Test build #128533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128533/testReport)** for PR 29718 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690769240








[GitHub] [spark] SparkQA commented on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690546632


   **[Test build #128533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128533/testReport)** for PR 29718 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).




[GitHub] [spark] LuciferYang closed pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang closed pull request #29718:
URL: https://github.com/apache/spark/pull/29718


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690769240








[GitHub] [spark] AmplabJenkins commented on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690445060








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690445060








[GitHub] [spark] SparkQA removed a comment on pull request #29718: [SPARK-32848][SQL] Let CostBasedJoinReorder produce same result with same input in Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29718:
URL: https://github.com/apache/spark/pull/29718#issuecomment-690546632


   **[Test build #128533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128533/testReport)** for PR 29718 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).

