You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/10 10:01:02 UTC

[GitHub] [spark] LuciferYang opened a new pull request #29711: [SPARK-32808][SQL] Let CostBasedJoinReorder produce deterministic result with Scala 2.12 and 2.13

LuciferYang opened a new pull request #29711:
URL: https://github.com/apache/spark/pull/29711


   ### What changes were proposed in this pull request?
   
   The optimization result of CostBasedJoinReorder maybe produce different result with same input with Scala 2.12 and Scala 2.13 if there are more than one same cost candidate plans.
   
   In this pr give a way to make it deterministic as much as possible, the main change of this pr as follow:
   
   - Iteration order of `Map` behave differently under Scala 2.12 and 2.13 with same inserted order, so in this pr change to use `LinkedHashMap` instead of `Map` to store `JoinPlanMap` in `JoinReorderDP.search` method to ensure this.
   
   - Fixed `StarJoinCostBasedReorderSuite` affected by the above change
   
   - Regenerate golden files affected by the above change.
   
   ### Why are the changes needed?
   We need to support a Scala 2.13 build.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   - Scala 2.12: Pass the Jenkins or GitHub Action
   
   - Scala 2.13: All tests passed.
   
   Do the following:
   
   ```
   dev/change-scala-version.sh 2.13
   mvn clean install -DskipTests  -pl sql/core -Pscala-2.13 -am
   mvn test -pl sql/core -Pscala-2.13
   ```
   https://github.com/apache/spark/pull/29689 fix some cases affected by SPARK-32755
   
   **Before**
   ```
   Tests: succeeded 8485, failed 13, canceled 1, ignored 52, pending 0
   *** 13 TESTS FAILED ***
   
   ```
   
   **After**
   
   ```
   Tests: succeeded 8498, failed 0, canceled 1, ignored 52, pending 0
   All tests passed.
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690342776






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-694529528


   I'll merge this tomorrow if there are no more objections.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486346141



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -150,9 +150,16 @@ object JoinReorderDP extends PredicateHelper with Logging {
     // Level i maintains all found plans for i + 1 items.
     // Create the initial plans: each plan is a single item with zero cost.
     val itemIndex = items.zipWithIndex
-    val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-      case (item, id) => Set(id) -> JoinPlan(Set(id), item, ExpressionSet(), Cost(0, 0))
-    }.toMap)
+    val foundPlans = mutable.Buffer[JoinPlanMap]({
+      val idToJoinPlanSeq = itemIndex.map {
+        case (item, id) => Set(id) -> JoinPlan(Set(id), item, ExpressionSet(), Cost(0, 0))
+      }
+      // SPARK-32687: Change to use `LinkedHashMap` to make sure that items are
+      // inserted and iterated in the same order.
+      val ret = new mutable.LinkedHashMap[Set[Int], JoinPlan]()
+      idToJoinPlanSeq.foreach(v => ret.put(v._1, v._2))

Review comment:
       I think you can just construct this above, with a .foreach instead of .map? or just call .addAll on the result of the .map.
   I guess LinkedHashMap takes more memory, but that shouldn't matter much here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690762563






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486440104



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       @gatorsmile Use a separate JIRA number? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690130925


   **[Test build #128507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128507/testReport)** for PR 29711 at commit [`8a0bb43`](https://github.com/apache/spark/commit/8a0bb4321936b3b9aa54209cb0babfa5840d821b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gatorsmile commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486433621



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       > Change to use LinkedHashMap instead of Map to store foundPlans in JoinReorderDP.search method to ensure same iteration order with same insert order because iteration order of Map behave differently under Scala 2.12 and 2.13
   
   Can we make it as a separate PR? The plan changes need more reviews




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690449558


   @gatorsmile @srowen @cloud-fan already make this as a separate PR SPARK-32848 and I will close this pr first


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29711: [SPARK-32808][SQL] Let CostBasedJoinReorder produce deterministic result with Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690127761






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang edited a comment on pull request #29711: [SPARK-32808][SQL] Let CostBasedJoinReorder produce deterministic result with Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690128026


   cc @srowen , this patch fixed remaining failed cases of sql core module with Scala 2.13 and after this patch all test passed now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690419110


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128507/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690761497


   **[Test build #128530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128530/testReport)** for PR 29711 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-695869435


   thx~ @srowen @cloud-fan @gatorsmile 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r487927107



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt
##########
@@ -13,12 +13,16 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact]
                         InputAdapter
                           Exchange [ss_customer_sk] #2
                             WholeStageCodegen (4)
-                              Project [i_brand_id,i_brand,i_manufact_id,i_manufact,ss_customer_sk,ss_ext_sales_price,s_zip]
+                              Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]

Review comment:
       Would this kind of thing possibly affect the order of columns from a `select *` or is that accounted for elsewhere?

##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q7.sf100/explain.txt
##########
@@ -10,15 +10,15 @@ TakeOrderedAndProject (34)
                :     :- * Project (17)
                :     :  +- * BroadcastHashJoin Inner BuildRight (16)
                :     :     :- * Project (10)
-               :     :     :  +- * BroadcastHashJoin Inner BuildLeft (9)
-               :     :     :     :- BroadcastExchange (5)
-               :     :     :     :  +- * Project (4)
-               :     :     :     :     +- * Filter (3)
-               :     :     :     :        +- * ColumnarToRow (2)
-               :     :     :     :           +- Scan parquet default.date_dim (1)
-               :     :     :     +- * Filter (8)
-               :     :     :        +- * ColumnarToRow (7)
-               :     :     :           +- Scan parquet default.store_sales (6)
+               :     :     :  +- * BroadcastHashJoin Inner BuildRight (9)

Review comment:
       I'm skimming the changes, and this seems non-trivial? but then again I am not so sure how to read these plans expertly enough to evaluate. The plan below starts by scanning a different table, for example.
   
   We may just have to accept changes like this, if they're equivalent, to make sure they do not depend on implementation details of hash maps. But @gatorsmile @cloud-fan et al do these look like plausible equivalent plans for example?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690508779


   **[Test build #128530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128530/testReport)** for PR 29711 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690127761






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen closed pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #29711:
URL: https://github.com/apache/spark/pull/29711


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486408783



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       I think this is OK, though it's not strictly necessary to make the type def this specific




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r488543333



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt
##########
@@ -13,12 +13,16 @@ TakeOrderedAndProject [ext_price,brand,brand_id,i_manufact_id,i_manufact]
                         InputAdapter
                           Exchange [ss_customer_sk] #2
                             WholeStageCodegen (4)
-                              Project [i_brand_id,i_brand,i_manufact_id,i_manufact,ss_customer_sk,ss_ext_sales_price,s_zip]
+                              Project [ss_customer_sk,ss_ext_sales_price,i_brand_id,i_brand,i_manufact_id,i_manufact,s_zip]

Review comment:
       https://github.com/apache/spark/blob/99384d1e831b7fe82a3a80ade1da976971624ee7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala#L298-L308
   
   Project node add by above code part in `JoinReorderDP#buildJoin` method, I think the Project output order decided by join order




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690419062






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29711: [SPARK-32808][SQL] Let CostBasedJoinReorder produce deterministic result with Scala 2.12 and 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690128026


   cc @srowen , this patch fixed remaining failed cases of sql core module with Scala 2.13 and all test passed now.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486405262



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -150,9 +150,16 @@ object JoinReorderDP extends PredicateHelper with Logging {
     // Level i maintains all found plans for i + 1 items.
     // Create the initial plans: each plan is a single item with zero cost.
     val itemIndex = items.zipWithIndex
-    val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-      case (item, id) => Set(id) -> JoinPlan(Set(id), item, ExpressionSet(), Cost(0, 0))
-    }.toMap)
+    val foundPlans = mutable.Buffer[JoinPlanMap]({
+      val idToJoinPlanSeq = itemIndex.map {
+        case (item, id) => Set(id) -> JoinPlan(Set(id), item, ExpressionSet(), Cost(0, 0))
+      }
+      // SPARK-32687: Change to use `LinkedHashMap` to make sure that items are
+      // inserted and iterated in the same order.
+      val ret = new mutable.LinkedHashMap[Set[Int], JoinPlan]()
+      idToJoinPlanSeq.foreach(v => ret.put(v._1, v._2))

Review comment:
       Got it ~ Address 10d4953 fix this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-695869435


   thx~ @srowen @cloud-fan @gatorsmile 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r486436990



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       Do you mean to use `Map[Set[Int], JoinPlan]` or not to define `JoinPlanMap` type? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690130925


   **[Test build #128507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128507/testReport)** for PR 29711 at commit [`8a0bb43`](https://github.com/apache/spark/commit/8a0bb4321936b3b9aa54209cb0babfa5840d821b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690419062


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang edited a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690449558


   @gatorsmile @srowen @cloud-fan already make this as a separate PR SPARK-32848, https://github.com/apache/spark/pull/29717


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690508779


   **[Test build #128530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128530/testReport)** for PR 29711 at commit [`10d4953`](https://github.com/apache/spark/commit/10d4953e7ed70644e93f648dc3206a4d255c2d54).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang closed pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang closed pull request #29711:
URL: https://github.com/apache/spark/pull/29711


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r487615921



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       > Same JIRA I think, just a separate PR. I'm not sure it matters that much though, it's a tiny part of the change really? we just need more eyes on the plan changes.
   
   It seems that there is no way to divide 2 PR because if we change `CostBasedJoinReorder `  only,  the test cases in `sql/core` module will be failed.
   
   
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r487617202



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       So all excepted plan changes are due to the use of `LinkedHashMap` instead of `Map`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r487617202



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       So all excepted plan changes are due to `use of LinkedHashMap instead of Map`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on a change in pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29711:
URL: https://github.com/apache/spark/pull/29711#discussion_r487529102



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
##########
@@ -316,7 +323,7 @@ object JoinReorderDP extends PredicateHelper with Logging {
   }
 
   /** Map[set of item ids, join plan for these items] */
-  type JoinPlanMap = Map[Set[Int], JoinPlan]
+  type JoinPlanMap = mutable.LinkedHashMap[Set[Int], JoinPlan]

Review comment:
       Same JIRA I think, just a separate PR. I'm not sure it matters that much though, it's a tiny part of the change really? we just need more eyes on the plan changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] srowen commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
srowen commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-694939769


   Merged to master


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang edited a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690449558


   @gatorsmile @srowen @cloud-fan already make this as a separate PR SPARK-32848


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-693765204


   @srowen @gatorsmile Is there any other problem in this pr that needs to be fixed? It seems that @cloud-fan  thinks the change is safe.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690414594


   **[Test build #128507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128507/testReport)** for PR 29711 at commit [`8a0bb43`](https://github.com/apache/spark/commit/8a0bb4321936b3b9aa54209cb0babfa5840d821b).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690342776






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #29711: [SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29711:
URL: https://github.com/apache/spark/pull/29711#issuecomment-690762563






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org