You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by wzhfy <gi...@git.apache.org> on 2017/03/20 03:23:26 UTC

[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/17353

    [SPARK-17080][SQL][FOLLOWUP] Improve documentation and naming for methods/variables

    ## What changes were proposed in this pull request?
    
    Improve documentation and naming for methods/variables.
    
    ## How was this patch tested?
    
    Not related.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark reorderFollow

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17353
    
----
commit 7598eb81b782f71d9da3794a6a007c63fdf42b23
Author: wangzhenhua <wa...@huawei.com>
Date:   2017-03-20T03:18:31Z

    fxi comments

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74963/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74947/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74978/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    cc @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Could we move the checking `if (oneSidePlan.itemIds.intersect(otherSidePlan.itemIds).isEmpty)` into the `buildJoin`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106841610
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper {
       }
     
       /**
    -   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * Builds a new JoinPlan if there exists at least one join condition involving references from
    +   * both left and right.
        * @param oneJoinPlan One side JoinPlan for building a new JoinPlan.
        * @param otherJoinPlan The other side JoinPlan for building a new join node.
    --- End diff --
    
    Do you want to rename them to leftPlan and rightPlan?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74977/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74977/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Yeah. The counts can help us understand the pruning rate of the search space. When CBO join reordering is very slow, we can check the counts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106841464
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper {
       }
     
       /**
    -   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * Builds a new JoinPlan if there exists at least one join condition involving references from
    +   * both left and right.
    --- End diff --
    
    ```
    Builds a new JoinPlan when both conditions hold
    - the sets of items contained in both left and right sides do not overlap 
    - there exists at least one join condition involving references from both sides
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106841523
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper {
       }
     
       /**
    -   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * Builds a new JoinPlan if there exists at least one join condition involving references from
    +   * both left and right.
        * @param oneJoinPlan One side JoinPlan for building a new JoinPlan.
        * @param otherJoinPlan The other side JoinPlan for building a new join node.
        * @param conf SQLConf for statistics computation.
        * @param conditions The overall set of join conditions.
        * @param topOutput The output attributes of the final plan.
    -   * @return Return a new JoinPlan if the two sides can be joined with some conditions. Otherwise,
    -   *         return None.
    +   * @return Builds and returns a new JoinPlan if there exists at least one join condition
    +   *         involving references from both left and right. Otherwise, returns None.
    --- End diff --
    
    Now, we can simplify it to `Builds and returns a new JoinPlan if both conditions hold. Otherwise, returns None.`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106843609
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper {
       }
     
       /**
    -   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * Builds a new JoinPlan if there exists at least one join condition involving references from
    +   * both left and right.
        * @param oneJoinPlan One side JoinPlan for building a new JoinPlan.
        * @param otherJoinPlan The other side JoinPlan for building a new join node.
    --- End diff --
    
    left and right sides are decided inside this method. It tends to build a left deep tree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107080002
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -152,6 +156,10 @@ object JoinReorderDP extends PredicateHelper {
           foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
         }
     
    +//    val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
    +//    logDebug(s"Join reordering finished. Duration: $durationInMs ms, number of items: " +
    +//      s"${items.length}, number of plans in memo: ${foundPlans.map(_.size).sum}")
    --- End diff --
    
    oops, I forgot to recover them...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74980/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74947/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106835910
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -201,7 +201,16 @@ object JoinReorderDP extends PredicateHelper {
         nextLevel.toMap
       }
     
    -  /** Build a new join node. */
    +  /**
    +   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    --- End diff --
    
    Let us reword it too. 
    
    > Builds a new JoinPlan if there is a join predicate connecting two given sides.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74934/testReport)** for PR 17353 at commit [`2a9ba46`](https://github.com/apache/spark/commit/2a9ba4651a0d46a27e494bf9dda4c444cf1f2831).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74963/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74845/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106835632
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -201,7 +201,16 @@ object JoinReorderDP extends PredicateHelper {
         nextLevel.toMap
       }
     
    -  /** Build a new join node. */
    +  /**
    +   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * @param oneJoinPlan One side JoinPlan for building a new JoinPlan.
    +   * @param otherJoinPlan The other side JoinPlan for building a new join node.
    +   * @param conf SQLConf for statistics computation.
    +   * @param conditions The overall set of join conditions.
    +   * @param topOutput The output attributes of the final plan.
    +   * @return Return a new JoinPlan if the two sides can be joined with some conditions. Otherwise,
    --- End diff --
    
    How about? 
    > Returns a new JoinPlan if there exists at least one join condition involving references from both left and right. Otherwise, returns None.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74963/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74934/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74978/testReport)** for PR 17353 at commit [`af511b2`](https://github.com/apache/spark/commit/af511b281756da8a40af876a7c902bdf3388feb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107138307
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
    --- End diff --
    
    Do you mean join conditions are not specified? How about `A J B J C J D where A.k1 = B.k1 and B.k2 = C.k2 and C.k3 = D.k3` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74934/testReport)** for PR 17353 at commit [`2a9ba46`](https://github.com/apache/spark/commit/2a9ba4651a0d46a27e494bf9dda4c444cf1f2831).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Where should we check the count? Whom do we want to expose it to?
    How about a debug level log?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Could you also update the description of `object JoinReorderDP` based on the recent update in https://github.com/apache/spark/pull/17286?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107133614
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
      * level 0: p({A}), p({B}), p({C}), p({D})
    - * level 1: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
    - * level 2: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
    + * level 1: p({A, B}), p({B, C}), p({C, D})
    + * level 2: p({A, B, C}), p({B, C, D})
      * level 3: p({A, B, C, D})
      * where p({A, B, C, D}) is the final output plan.
      *
      * For cost evaluation, since physical costs for operators are not available currently, we use
      * cardinalities and sizes to compute costs.
      */
    -object JoinReorderDP extends PredicateHelper {
    +object JoinReorderDP extends PredicateHelper with Logging {
     
       def search(
           conf: SQLConf,
           items: Seq[LogicalPlan],
           conditions: Set[Expression],
           topOutput: AttributeSet): LogicalPlan = {
     
    +    val startTime = System.nanoTime()
    --- End diff --
    
    use `System.currentTimeMillis` if we only care about the ms level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74845/testReport)** for PR 17353 at commit [`7598eb8`](https://github.com/apache/spark/commit/7598eb81b782f71d9da3794a6a007c63fdf42b23).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class Cost(card: BigInt, size: BigInt) `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Please update PR description and title.
    
    LGTM pending Jenkins 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74864/testReport)** for PR 17353 at commit [`65b2b5b`](https://github.com/apache/spark/commit/65b2b5bb3bb73d582e935fdd1880cb64947c9f19).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Replace the following codes by using `pattern match`
    ```
                if (joinPlan.isDefined) {
                  val newJoinPlan = joinPlan.get
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r106843645
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -202,14 +201,15 @@ object JoinReorderDP extends PredicateHelper {
       }
     
       /**
    -   * Builds a new JoinPlan if the two given sides can be joined with some conditions.
    +   * Builds a new JoinPlan if there exists at least one join condition involving references from
    +   * both left and right.
    --- End diff --
    
    Great! Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107137731
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
      * level 0: p({A}), p({B}), p({C}), p({D})
    - * level 1: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
    - * level 2: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
    + * level 1: p({A, B}), p({B, C}), p({C, D})
    + * level 2: p({A, B, C}), p({B, C, D})
      * level 3: p({A, B, C, D})
      * where p({A, B, C, D}) is the final output plan.
      *
      * For cost evaluation, since physical costs for operators are not available currently, we use
      * cardinalities and sizes to compute costs.
      */
    -object JoinReorderDP extends PredicateHelper {
    +object JoinReorderDP extends PredicateHelper with Logging {
     
       def search(
           conf: SQLConf,
           items: Seq[LogicalPlan],
           conditions: Set[Expression],
           topOutput: AttributeSet): LogicalPlan = {
     
    +    val startTime = System.nanoTime()
    --- End diff --
    
    `nanoTime()` is more reliable than `currentTimeMillis()`: https://github.com/databricks/scala-style-guide#misc_currentTimeMillis_vs_nanoTime


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74980 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74980/testReport)** for PR 17353 at commit [`40af14c`](https://github.com/apache/spark/commit/40af14c423257686d9f5458561344eb50fd7c1e3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74978/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74980/testReport)** for PR 17353 at commit [`40af14c`](https://github.com/apache/spark/commit/40af14c423257686d9f5458561344eb50fd7c1e3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107137438
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
    --- End diff --
    
    why?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by wzhfy <gi...@git.apache.org>.
Github user wzhfy commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    retest this please...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74977/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74864/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74845/testReport)** for PR 17353 at commit [`7598eb8`](https://github.com/apache/spark/commit/7598eb81b782f71d9da3794a6a007c63fdf42b23).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    **[Test build #74945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74945/testReport)** for PR 17353 at commit [`71459c5`](https://github.com/apache/spark/commit/71459c5280da2360ba6c5769a88b5ea95f8a9a13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Does it make sense to introduce some counts to decide how many new JoinPlan we build?
    
    We can find out what we pruned in the search. It can easy for us to ensure no regression happened when we improve the codes in the future.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74945/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17353


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107133345
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
    --- End diff --
    
    `will be` -> `may be`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107077912
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -152,6 +156,10 @@ object JoinReorderDP extends PredicateHelper {
           foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
         }
     
    +//    val durationInMs = (System.nanoTime() - startTime) / (1000 * 1000)
    +//    logDebug(s"Join reordering finished. Duration: $durationInMs ms, number of items: " +
    +//      s"${items.length}, number of plans in memo: ${foundPlans.map(_.size).sum}")
    --- End diff --
    
    ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation, chan...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentation and n...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17353
  
    debug log SGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org