You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2016/01/31 03:00:56 UTC

[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/10997

    [SPARK-13105] Reject NATURAL JOIN queries rather than returning wrong answers

    In Spark 1.6 and earlier, Spark SQL does not support `NATURAL JOIN` queries. However, its SQL parser does not consider `NATURAL` to be a reserved word, which causes natural joins to be parsed as regular joins where the left table has been aliased. For instance,
    
    ```
    SELECT * FROM foo NATURAL JOIN bar
    ```
    
    gets interpreted as `foo JOIN bar` where `foo` is aliased to `natural`.
    
    Rather than doing this, which leads to confusing / wrong results for users who expect NATURAL JOIN behavior, Spark should immediately reject these queries at analysis time and should provide an informative error message.
    
    As a result, this patch is targeted at Spark 1.6 and earlier.
    
    I chose to implement this check entirely within the parser in order to minimize the scope of the changes and to not introduce any new classes into the logical plan layer. I considered introducing a new `NaturalJoin` join type, parsing the query, then detecting and throwing an error from the analyzer but ended up rejecting this approach because I was concerned that adding a new class to a sealed trait would break compilation for third-party code which pattern-matches on Catalyst join types.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-13105

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10997
    
----
commit fc15a91de0e163c0ea5b7f7ca031aa62317ed301
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-01-31T01:57:36Z

    Add regression test for SPARK-13105

commit 21e42d893283a754f3d25758562864ceb57043d4
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-01-31T01:57:44Z

    Fix SPARK-13105 at the parser level.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177358778
  
    **[Test build #50453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50453/consoleFull)** for PR 10997 at commit [`21e42d8`](https://github.com/apache/spark/commit/21e42d893283a754f3d25758562864ceb57043d4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177376511
  
    **[Test build #50453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50453/consoleFull)** for PR 10997 at commit [`21e42d8`](https://github.com/apache/spark/commit/21e42d893283a754f3d25758562864ceb57043d4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177419308
  
    hm i thought about this more -- i actually think it is too risky to change the behavior here for 1.6.x. For example, there might be users whose queries are depending on the fact that natural is not a reserved keyword.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen closed the pull request at:

    https://github.com/apache/spark/pull/10997


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177376532
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177604554
  
    Argh, that's annoying. I guess I'll go ahead and close this PR and resolve the JIRA issue as "Not a Problem".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177357659
  
    /cc @cloud-fan for review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177376534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50453/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177381245
  
    how about hive context? Should we update `HiveQl.scala` too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13105] Reject NATURAL JOIN queries rath...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10997#issuecomment-177442161
  
    Hive does not have a ```NATURAL``` keyword. It behaves the same as the old SqlParser: 
    
    - It will either treat ```NATURAL``` the alias of the previous table, eg: ```select * from db.a natural join db.b```
    - It does not recognize the input at ```NATURAL``` and throws a ```ParseException```, eg: ```select * from db.a as x natural join db.b```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org