Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/07/26 09:09:48 UTC

[GitHub] spark pull request #14367: [SPARK-16733][SQL] use antlr to parse column name...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/14367

    [SPARK-16733][SQL] use antlr to parse column name instead of implementing it ourselves

    ## What changes were proposed in this pull request?
    
    It's odd to hand-write code to parse column names when we already have the ANTLR framework. This PR removes that special-case code and uses ANTLR to parse column names instead.
    
    
    ## How was this patch tested?
    
    Existing tests.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark column-name

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14367.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14367
    
----
commit a1c5b0d1a6a46c19e4af9191b51f8a47b6219187
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-07-26T09:06:52Z

    use antlr to parse column name instead of implementing it ourselves

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    I think the proper fix is to introduce, in the future, a config flag that specifies the parse mode for column names:
    
    1. Consistent with Spark 1.x and 2.0.
    2. Consistent with SQL.
    3. Treat as string literals.
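
To make the three proposed modes concrete, here is a minimal Python sketch of how such a flag might dispatch. Everything in it is an assumption for illustration: the enum `ColumnNameParseMode`, its member names, and the function `parse_column_name` are hypothetical and do not exist in Spark. The SQL mode is approximated by a rule that accepts only letters, digits, and underscores, and backtick quoting is ignored entirely.

```python
import re
from enum import Enum


class ColumnNameParseMode(Enum):
    """Hypothetical values for the proposed config flag (names are assumptions)."""
    LEGACY = "legacy"    # 1. consistent with Spark 1.x and 2.0
    SQL = "sql"          # 2. consistent with SQL
    LITERAL = "literal"  # 3. treat as string literals


# Assumed stand-in for the SQL identifier rule: letters, digits, underscores.
_SQL_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def parse_column_name(name, mode):
    """Split a column name into name parts according to the chosen mode."""
    if mode is ColumnNameParseMode.LITERAL:
        # The whole string is one column name; dots are not separators.
        return [name]
    # Both remaining modes treat dots as separators (backticks not handled here).
    parts = name.split(".")
    if mode is ColumnNameParseMode.SQL:
        # Each part must be a plain identifier under the assumed rule.
        for part in parts:
            if not _SQL_IDENTIFIER.fullmatch(part):
                raise ValueError(f"not a valid SQL identifier: {part!r}")
    return parts
```

With this sketch, the same input string yields two name parts in the legacy mode, a single part in the literal mode, and an error in the SQL mode whenever a part contains characters outside the assumed identifier rule.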




[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    cc @hvanhovell @rxin, this is actually a breaking change: we previously supported `df.select("a $%  z. xy")`, which finds the column `a $%  z` and gets its inner field ` xy`.
    
    Personally, I think we should not support it, and should instead make the Dataset API more consistent with the SQL API.
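
The old behavior described above can be illustrated with a small Python sketch of a hand-written splitter: it splits on dots that are not inside backticks and otherwise accepts any character, including spaces and symbols. This is a simplified illustration and not Spark's actual code; the function name `parse_attribute_name` and the error messages are assumptions.

```python
def parse_attribute_name(name):
    """Split a column name on dots that are not inside backticks."""
    parts = []
    current = []
    in_backtick = False
    for i, ch in enumerate(name):
        if in_backtick:
            if ch == "`":
                in_backtick = False
                # A closing backtick must be followed by a dot or end the name.
                if i + 1 < len(name) and name[i + 1] != ".":
                    raise ValueError(f"syntax error in attribute name: {name}")
            else:
                # Inside backticks every character, including '.', is literal.
                current.append(ch)
        elif ch == "`":
            # An opening backtick is only allowed at the start of a part.
            if current:
                raise ValueError(f"syntax error in attribute name: {name}")
            in_backtick = True
        elif ch == ".":
            # Reject leading, trailing, and doubled dots.
            if i == 0 or name[i - 1] == "." or i == len(name) - 1:
                raise ValueError(f"syntax error in attribute name: {name}")
            parts.append("".join(current))
            current = []
        else:
            # Any other character (spaces, '$', '%', ...) is part of the name.
            current.append(ch)
    if in_backtick:
        raise ValueError(f"syntax error in attribute name: {name}")
    parts.append("".join(current))
    return parts
```

Under this sketch, `parse_attribute_name("a $%  z. xy")` returns the two parts `a $%  z` and ` xy` (with its leading space), matching the example in the comment. A grammar that accepts only plain identifiers would reject the same string unless each part were quoted.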



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62878/



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    Hmmm... The rule used by the ANTLR parser only allows underscores and alphanumeric characters. The old rule supported more than that (names containing `$`, `-`, ...). With the ANTLR parser, such names can still be supported by surrounding them with backticks.
    
    I am wondering how much we need to fix to get this working.
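
As a rough illustration of the backtick workaround, the Python sketch below quotes only those name parts that fall outside the rule described in this comment (underscores and alphanumeric characters). The helper names `quote_if_needed` and `quote_name_parts` are hypothetical, the regular expression is an assumed approximation of the parser's identifier rule, and names that themselves contain a backtick are deliberately left out of scope.

```python
import re

# Assumed approximation of the identifier rule: letters, digits, underscores.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def quote_if_needed(name):
    """Wrap a single name part in backticks unless it is a plain identifier."""
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    if "`" in name:
        # Escaping embedded backticks is not covered by this sketch.
        raise ValueError(f"cannot quote a name containing a backtick: {name!r}")
    return "`" + name + "`"


def quote_name_parts(parts):
    """Join name parts with dots, quoting each part only where required."""
    return ".".join(quote_if_needed(part) for part in parts)
```

For example, the parts `a $%  z` and ` xy` would be joined as two backtick-quoted segments separated by a dot, while a name such as `my_col1` would pass through unchanged.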



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    **[Test build #62878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62878/consoleFull)** for PR 14367 at commit [`a1c5b0d`](https://github.com/apache/spark/commit/a1c5b0d1a6a46c19e4af9191b51f8a47b6219187).



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    **[Test build #62878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62878/consoleFull)** for PR 14367 at commit [`a1c5b0d`](https://github.com/apache/spark/commit/a1c5b0d1a6a46c19e4af9191b51f8a47b6219187).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    @cloud-fan can you update the pull request description to explain the behavior change? It is not clear to me without looking at the code what the change is.




[GitHub] spark issue #14367: [SPARK-16733][SQL] use antlr to parse column name instea...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14367
  
    Closing this PR; I'll work on it again once we have a plan for the column name parsing mode.



[GitHub] spark pull request #14367: [SPARK-16733][SQL] use antlr to parse column name...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan closed the pull request at:

    https://github.com/apache/spark/pull/14367

