You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by aokolnychyi <gi...@git.apache.org> on 2017/07/26 20:13:24 UTC

[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

GitHub user aokolnychyi opened a pull request:

    https://github.com/apache/spark/pull/18740

    [SPARK-21538][SQL] Attribute resolution inconsistency in the Dataset API

    ## What changes were proposed in this pull request?
    
    This PR contains a tiny update that removes an attribute resolution inconsistency in the Dataset API. The following example is taken from the ticket description:
    
    ```
    spark.range(1).withColumnRenamed("id", "x").sort(col("id"))  // works
    spark.range(1).withColumnRenamed("id", "x").sort($"id")  // works
    spark.range(1).withColumnRenamed("id", "x").sort('id) // works
    spark.range(1).withColumnRenamed("id", "x").sort("id") // fails with:
    org.apache.spark.sql.AnalysisException: Cannot resolve column name "id" among (x);
    ```
    The above `AnalysisException` happens because the last case calls `Dataset.apply()` to convert strings into columns, which triggers attribute resolution. To make the API consistent between overloaded methods, this PR delays the resolution and constructs columns directly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aokolnychyi/spark spark-21538

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18740.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18740
    
----
commit b0fbd5ee98de0ed725d96e63943da934150f6c3e
Author: aokolnychyi <an...@sap.com>
Date:   2017-07-26T19:56:44Z

    [SPARK-21538][SQL] Fix attribute resolution inconsistency in the Dataset API

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #80007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80007/testReport)** for PR 18740 at commit [`309cb8f`](https://github.com/apache/spark/commit/309cb8f09f5af81f230911574274c0ca9eb65f34).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #79973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79973/testReport)** for PR 18740 at commit [`b0fbd5e`](https://github.com/apache/spark/commit/b0fbd5ee98de0ed725d96e63943da934150f6c3e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #79973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79973/testReport)** for PR 18740 at commit [`b0fbd5e`](https://github.com/apache/spark/commit/b0fbd5ee98de0ed725d96e63943da934150f6c3e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Thanks! Merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80009/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    cc @cloud-fan @gatorsmile 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Please add a test like #18744 did. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79973/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18740#discussion_r129906375
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
           assert(rlike3.count() == 0)
         }
       }
    +
    +  test("SPARK-21538: Attribute resolution inconsistency in Dataset API") {
    +    val df = spark.range(1).withColumnRenamed("id", "x")
    +    checkAnswer(df.sort(col("id")), df.sort("id"))
    +    checkAnswer(df.sort($"id"), df.sort("id"))
    +    checkAnswer(df.sort('id), df.sort("id"))
    +    checkAnswer(df.orderBy('id), df.sort("id"))
    +    checkAnswer(df.orderBy("id"), df.sort("id"))
    --- End diff --
    
    Since it sounds like you are very interested in Spark open source contribution, let me explain how we write the test cases. Normally, we do not want to duplicate the codes and also want to verify the results. Thus, maybe you can change the code like
    
    ```Scala
        val df = spark.range(3).withColumnRenamed("id", "x")
        val expected = Row(0) :: Row(1) :: Row (2) :: Nil
        checkAnswer(df.sort("id"), expected)
        checkAnswer(df.sort(col("id")), expected)
        checkAnswer(df.sort($"id"), expected)
        checkAnswer(df.sort('id), expected)
        checkAnswer(df.orderBy('id), expected)
        checkAnswer(df.orderBy("id"), expected)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #80009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #80009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

Posted by aokolnychyi <gi...@git.apache.org>.
Github user aokolnychyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18740#discussion_r129911780
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
           assert(rlike3.count() == 0)
         }
       }
    +
    +  test("SPARK-21538: Attribute resolution inconsistency in Dataset API") {
    +    val df = spark.range(1).withColumnRenamed("id", "x")
    +    checkAnswer(df.sort(col("id")), df.sort("id"))
    +    checkAnswer(df.sort($"id"), df.sort("id"))
    +    checkAnswer(df.sort('id), df.sort("id"))
    +    checkAnswer(df.orderBy('id), df.sort("id"))
    +    checkAnswer(df.orderBy("id"), df.sort("id"))
    --- End diff --
    
    Indeed, looks much better. I appreciate the explanation and will take this into account in the future. I will update the test in a minute, thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Thanks for fixing it! Please remember to add a test case whenever you fix anything else in the future. We really need to improve our test case coverage. BTW, welcome to contributing more test-only PRs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    LGTM except for test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18740


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80007/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    LGTM pending Jenkins 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18740
  
    **[Test build #80007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80007/testReport)** for PR 18740 at commit [`309cb8f`](https://github.com/apache/spark/commit/309cb8f09f5af81f230911574274c0ca9eb65f34).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org