You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/02/01 06:14:29 UTC

[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/16762

    [SPARK-19419] [SPARK-19420] Fix the cross join detection

    ### What changes were proposed in this pull request?
    There are two issues in the existing detection of cartesian products. 
    
    1) When users use the outer joins where both sides of the tables are unable to be broadcasted, Spark will still select `BroadcastNestedLoopJoin`. CROSS JOIN syntax is unable to cover the scenario of outer join, but we still issue the following error message:
    ```
    Use the CROSS JOIN syntax to allow cartesian products between these relations
    ```
    
    2) The existing detection is unable to cover all the cartesian product cases. For example, 
      - Case 1) having non-equal predicates in join conditiions of an inner join.
      - Case 2) equi-join's key columns are not sortable and both sides are not small enough for broadcasting.
    
    This PR is to move the cross-join detection back to `BroadcastNestedLoopJoinExec` and `CartesianProductExec`. 
    
    ### How was this patch tested?
    Added the extra test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark crossJoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16762.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16762
    
----
commit e4e3c9b84993ca415a1d07bcb07f20393c6fd5b4
Author: gatorsmile <ga...@gmail.com>
Date:   2017-02-01T05:47:04Z

    fix.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99893570
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!sqlContext.conf.crossJoinEnabled) {
    +      joinType match {
    +        case Cross => // Do nothing
    +        case Inner =>
    +          if (condition.isEmpty) {
    +            throw new AnalysisException(
    +              s"""Detected cartesian product for INNER join between logical plans
    +                 |${left.treeString(false).trim}
    +                 |and
    +                 |${right.treeString(false).trim}
    +                 |Join condition is missing or trivial.
    +                 |Use the CROSS JOIN syntax to allow cartesian products between these relations.
    +               """.stripMargin)
    +          }
    +        case _ if !withinBroadcastThreshold || condition.isEmpty =>
    +          throw new AnalysisException(
    +            s"""Both sides of this join are outside the broadcasting threshold and
    +               |computing it could be prohibitively expensive. To explicitly enable it,
    +               |Please set ${SQLConf.CROSS_JOINS_ENABLED.key} = true.
    --- End diff --
    
    : ) This is also my question. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72421/testReport)** for PR 16762 at commit [`d02baee`](https://github.com/apache/spark/commit/d02baee13bd91b4ad3aace8946d19e08abad7a24).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99975571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!sqlContext.conf.crossJoinEnabled) {
    +      joinType match {
    +        case Cross => // Do nothing
    +        case Inner =>
    +          if (condition.isEmpty) {
    +            throw new AnalysisException(
    +              s"""Detected cartesian product for INNER join between logical plans
    +                 |${left.treeString(false).trim}
    +                 |and
    +                 |${right.treeString(false).trim}
    +                 |Join condition is missing or trivial.
    +                 |Use the CROSS JOIN syntax to allow cartesian products between these relations.
    +               """.stripMargin)
    +          }
    +        case _ if !withinBroadcastThreshold || condition.isEmpty =>
    +          throw new AnalysisException(
    +            s"""Both sides of this join are outside the broadcasting threshold and
    +               |computing it could be prohibitively expensive. To explicitly enable it,
    +               |Please set ${SQLConf.CROSS_JOINS_ENABLED.key} = true.
    --- End diff --
    
    cc @sameeragarwal @srinathshankar 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/16762


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73921/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72421/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r98834480
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/cross-join.sql ---
    @@ -33,3 +33,5 @@ create temporary view D(d, vd) as select * from nt1;
     -- Allowed since cross join with C is explicit
     select * from ((A join B on (a = b)) cross join C) join D on (a = d);
     
    +-- Cross joins with non-equal predicates
    +SELECT * FROM nt1 CROSS JOIN nt2 ON (nt1.k > nt2.k);
    --- End diff --
    
    So far, the SQL syntax allows users to specify the join condition. 
    
    The Dataset API does not allow users to do it, but users still can do it by using the filter, which will be pushed into the cross join.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72399/testReport)** for PR 16762 at commit [`88a385c`](https://github.com/apache/spark/commit/88a385c31aaac5fcf514c2dd058c3f1628003c5b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72289/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72399/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72289/testReport)** for PR 16762 at commit [`671a361`](https://github.com/apache/spark/commit/671a361978b621b542410f57bbcecd11e9499afc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    cc @srinathshankar @sameeragarwal @hvanhovell @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #73922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73922/testReport)** for PR 16762 at commit [`98cd60c`](https://github.com/apache/spark/commit/98cd60cb627274d18f8198e99d724470676c399e).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72235/testReport)** for PR 16762 at commit [`e4e3c9b`](https://github.com/apache/spark/commit/e4e3c9b84993ca415a1d07bcb07f20393c6fd5b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73922/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72289/testReport)** for PR 16762 at commit [`671a361`](https://github.com/apache/spark/commit/671a361978b621b542410f57bbcecd11e9499afc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72421/testReport)** for PR 16762 at commit [`d02baee`](https://github.com/apache/spark/commit/d02baee13bd91b4ad3aace8946d19e08abad7a24).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99217621
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,18 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!withinBroadcastThreshold && !sqlContext.conf.crossJoinEnabled) {
    +      throw new AnalysisException(
    +        s"""
    +           |Both sides of this join are outside the broadcasting threshold and
    --- End diff --
    
    I am not sure whether increasing `broadcast threshold ` is a good option to users. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #73922 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73922/testReport)** for PR 16762 at commit [`98cd60c`](https://github.com/apache/spark/commit/98cd60cb627274d18f8198e99d724470676c399e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99178232
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,18 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!withinBroadcastThreshold && !sqlContext.conf.crossJoinEnabled) {
    +      throw new AnalysisException(
    +        s"""
    +           |Both sides of this join are outside the broadcasting threshold and
    --- End diff --
    
    should it
    - say a cross join is being considered (if it is)
    - reference the broadcast threshold to allow the user to increase it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72399/testReport)** for PR 16762 at commit [`88a385c`](https://github.com/apache/spark/commit/88a385c31aaac5fcf514c2dd058c3f1628003c5b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72405/testReport)** for PR 16762 at commit [`d02baee`](https://github.com/apache/spark/commit/d02baee13bd91b4ad3aace8946d19e08abad7a24).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99893611
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!sqlContext.conf.crossJoinEnabled) {
    +      joinType match {
    +        case Cross => // Do nothing
    +        case Inner =>
    --- End diff --
    
    Sure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99894093
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -213,7 +213,12 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
               }
             // This join could be very slow or OOM
             joins.BroadcastNestedLoopJoinExec(
    --- End diff --
    
    That depends on how we defined it. See the original PR: https://github.com/apache/spark/pull/13209


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r98833830
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -584,24 +602,37 @@ class JoinSuite extends QueryTest with SharedSQLContext {
         val cartesianQueries = Seq(
           /** The following should error out since there is no explicit cross join */
           "SELECT * FROM testData inner join testData2",
    -      "SELECT * FROM testData left outer join testData2",
    -      "SELECT * FROM testData right outer join testData2",
    -      "SELECT * FROM testData full outer join testData2",
    --- End diff --
    
    These three queries are examples for outer joins that are unable to be replaced by cross join.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Cross join is a physical concept. In Spark 2.0, we detected it like what this PR did. In Spark 2.1, we moved it to Optimizer. Basically, this PR is to change it back. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99748730
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!sqlContext.conf.crossJoinEnabled) {
    +      joinType match {
    +        case Cross => // Do nothing
    +        case Inner =>
    +          if (condition.isEmpty) {
    +            throw new AnalysisException(
    +              s"""Detected cartesian product for INNER join between logical plans
    +                 |${left.treeString(false).trim}
    +                 |and
    +                 |${right.treeString(false).trim}
    +                 |Join condition is missing or trivial.
    +                 |Use the CROSS JOIN syntax to allow cartesian products between these relations.
    +               """.stripMargin)
    +          }
    +        case _ if !withinBroadcastThreshold || condition.isEmpty =>
    +          throw new AnalysisException(
    +            s"""Both sides of this join are outside the broadcasting threshold and
    +               |computing it could be prohibitively expensive. To explicitly enable it,
    +               |Please set ${SQLConf.CROSS_JOINS_ENABLED.key} = true.
    --- End diff --
    
    shall we support `CROSS OUTER JOIN`, `CROSS LEFT JOIN`, etc?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99748912
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala ---
    @@ -339,6 +340,33 @@ case class BroadcastNestedLoopJoinExec(
         )
       }
     
    +  protected override def doPrepare(): Unit = {
    +    if (!sqlContext.conf.crossJoinEnabled) {
    +      joinType match {
    +        case Cross => // Do nothing
    +        case Inner =>
    --- End diff --
    
    can you add comments to say when we will hit this branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Let me first push the new changes to resolve the SparkR issue. However, I still need time to add the corresponding test cases to Scala. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    is CROSS JOIN a logical or physical concept?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72235/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #73921 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73921/testReport)** for PR 16762 at commit [`efdf04e`](https://github.com/apache/spark/commit/efdf04ee00c68c4914dd52e8262bda8dfef476da).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72395/testReport)** for PR 16762 at commit [`88a385c`](https://github.com/apache/spark/commit/88a385c31aaac5fcf514c2dd058c3f1628003c5b).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72395/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99474050
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -215,18 +215,61 @@ class JoinSuite extends QueryTest with SharedSQLContext {
               Row(1, null, 2, 2) ::
               Row(2, 2, 1, null) ::
               Row(2, 2, 2, 2) :: Nil)
    +
    +      checkAnswer(
    +        testData3.as("x").join(testData3.as("y"), $"x.a" > $"y.a"),
    +        Row(2, 2, 1, null) :: Nil)
    +    }
    +    withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
    +      val msg = "Detected cartesian product for INNER join between logical plans"
    +      var e = intercept[AnalysisException](testData3.join(testData3).collect())
    +      assert(e.getMessage.contains(msg))
    +      e = intercept[AnalysisException] {
    +        testData3.as("x").join(testData3.as("y"), $"x.a" > $"y.a").collect()
    +      }
    +      assert(e.getMessage.contains(msg))
    +    }
    +  }
    +
    +  test("subquery converted to cartesian product join") {
    --- End diff --
    
    This is another typical example for cartesian product, which is converted from subquery.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r104320606
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -584,24 +602,39 @@ class JoinSuite extends QueryTest with SharedSQLContext {
         val cartesianQueries = Seq(
           /** The following should error out since there is no explicit cross join */
           "SELECT * FROM testData inner join testData2",
    -      "SELECT * FROM testData left outer join testData2",
    -      "SELECT * FROM testData right outer join testData2",
    -      "SELECT * FROM testData full outer join testData2",
    +
           "SELECT * FROM testData, testData2",
           "SELECT * FROM testData, testData2 where testData.key = 1 and testData2.a = 22",
    +      "SELECT * FROM testData, testData2 where testData.key > testData2.a",
    +      "SELECT * FROM testData INNER JOIN testData2 ON testData.key + testData2.a = testData2.b",
           /** The following should fail because after reordering there are cartesian products */
           "select * from (A join B on (A.key = B.key)) join D on (A.key=D.a) join C",
           "select * from ((A join B on (A.key = B.key)) join C) join D on (A.key = D.a)",
           /** Cartesian product involving C, which is not involved in a CROSS join */
    -      "select * from ((A join B on (A.key = B.key)) cross join D) join C on (A.key = D.a)");
    +      "select * from ((A join B on (A.key = B.key)) cross join D) join C on (A.key = D.a)")
    --- End diff --
    
    The estimated size of `testData`, `testData2`, and `testData3` are much larger than the broadcast threshold. Thus, it does not consider the cases when the tables are small enough to broadcast. Thus, when the tables are small enough to broadcast, should we still block users? In that case, the join is not expensive.
    
    It affects the whole detection logics rewriting. Thanks! @rxin @hvanhovell @cloud-fan @sameeragarwal  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72405/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r99749089
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -213,7 +213,12 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
               }
             // This join could be very slow or OOM
             joins.BroadcastNestedLoopJoinExec(
    --- End diff --
    
    when we broadcast, is it still a cross join?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r98833748
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -215,18 +215,36 @@ class JoinSuite extends QueryTest with SharedSQLContext {
               Row(1, null, 2, 2) ::
               Row(2, 2, 1, null) ::
               Row(2, 2, 2, 2) :: Nil)
    +
    +      checkAnswer(
    +        testData3.as("x").join(testData3.as("y"), $"x.a" > $"y.a"),
    --- End diff --
    
    This is a typical example of cartesian product, but our current detection is unable to find. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72235/testReport)** for PR 16762 at commit [`e4e3c9b`](https://github.com/apache/spark/commit/e4e3c9b84993ca415a1d07bcb07f20393c6fd5b4).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16762
  
    **[Test build #72395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72395/testReport)** for PR 16762 at commit [`88a385c`](https://github.com/apache/spark/commit/88a385c31aaac5fcf514c2dd058c3f1628003c5b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16762: [SPARK-19419] [SPARK-19420] Fix the cross join de...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16762#discussion_r104306401
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
    @@ -584,24 +602,39 @@ class JoinSuite extends QueryTest with SharedSQLContext {
         val cartesianQueries = Seq(
           /** The following should error out since there is no explicit cross join */
           "SELECT * FROM testData inner join testData2",
    -      "SELECT * FROM testData left outer join testData2",
    -      "SELECT * FROM testData right outer join testData2",
    -      "SELECT * FROM testData full outer join testData2",
    +
           "SELECT * FROM testData, testData2",
           "SELECT * FROM testData, testData2 where testData.key = 1 and testData2.a = 22",
    +      "SELECT * FROM testData, testData2 where testData.key > testData2.a",
    +      "SELECT * FROM testData INNER JOIN testData2 ON testData.key + testData2.a = testData2.b",
           /** The following should fail because after reordering there are cartesian products */
           "select * from (A join B on (A.key = B.key)) join D on (A.key=D.a) join C",
           "select * from ((A join B on (A.key = B.key)) join C) join D on (A.key = D.a)",
           /** Cartesian product involving C, which is not involved in a CROSS join */
    -      "select * from ((A join B on (A.key = B.key)) cross join D) join C on (A.key = D.a)");
    +      "select * from ((A join B on (A.key = B.key)) cross join D) join C on (A.key = D.a)")
     
    -     def checkCartesianDetection(query: String): Unit = {
    -      val e = intercept[Exception] {
    -        checkAnswer(sql(query), Nil);
    -      }
    -      assert(e.getMessage.contains("Detected cartesian product"))
    +    def checkCartesianDetection(query: String): Unit = {
    +      val e = intercept[AnalysisException](sql(query).collect()).getMessage
    +      assert(e.contains("Detected cartesian product"))
         }
     
         cartesianQueries.foreach(checkCartesianDetection)
    +
    +    val broadcastNestedLoopJoinOutOfBroadcastThresholdQueries = Seq(
    +      /** The following should error out since the flag spark.sql.crossJoin.enabled is not on */
    +      "SELECT * FROM testData left outer join testData2",
    +      "SELECT * FROM testData right outer join testData2",
    +      "SELECT * FROM testData full outer join testData2",
    +      "SELECT * FROM testData full outer join testData2 on testData.key > testData2.a",
    +      "SELECT b FROM testData2 WHERE EXISTS(SELECT a FROM testData3)"
    --- End diff --
    
    This is a subquery that will use `BroadcastNestedLoopJoin`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org