You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by rerngvit <gi...@git.apache.org> on 2016/07/21 22:30:37 UTC

[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

GitHub user rerngvit opened a pull request:

    https://github.com/apache/spark/pull/14309

    [SPARK-11977][SQL] Support accessing a column contains "." without backticks

    ## What changes were proposed in this pull request?
    - Add support for accessing a dataframe column that contains "." in its name without backticks
    - Add a testcase for this in DataFrameSuite
    
    ## How was this patch tested?
    Spark SQL unit test and manual testing

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rerngvit/spark SPARK-11977

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14309
    
----
commit b5046851e46ceccf745677d2610e3c57f3e60f16
Author: Rerngvit Yanggratoke <re...@combient.com>
Date:   2016-07-21T22:26:28Z

    [SPARK-11977][SQL] Support accessing a DataFrame column using its name without backticks if the name contains '.'
    
    ## What changes were proposed in this pull request?
    - Add support for accessing dataframe column that contains "." in its name without backticks
    - Add a testcase for this in DataFrameSuite
    
    ## How was this patch tested?
    Spark unit tests and manual testing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r71986608
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row(key, value, key + 1)
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
    --- End diff --
    
    @jaceklaskowski: I agree with you that the code can be made shorter and more elegant in the ways that you pointed out. Nonetheless, the improvements: (1) removing () in toDF() and (2) replacing the schema map with fieldNames, are actually applicable to other places in the file "DataFrameSuite.scala" as well. In other words, it is not directly related to this PR. I think the proper way to handle this is rather in another JIRA issue. Feel free to fire it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r71982269
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row(key, value, key + 1)
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
    +
    +    // Renaming to a column that contains "." character
    +    val df2 = testData.toDF().withColumnRenamed("value", "value.Renamed")
    +    assert(df2.schema.map(_.name) === Seq("key", "value.Renamed"))
    --- End diff --
    
    Please add more test cases that columns whose name has '.' can be accessed without backticks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    @cloud-fan, could you help to review it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    I'm not sure I am good reviewer for this as I dont fully understand the consequences inside SQL for this change. cc @liancheng @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    @shivaram 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    **[Test build #62831 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62831/consoleFull)** for PR 14309 at commit [`0e47cfd`](https://github.com/apache/spark/commit/0e47cfd47774a8d75e9b45bbeb5a19fddb40494e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    **[Test build #62831 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62831/consoleFull)** for PR 14309 at commit [`0e47cfd`](https://github.com/apache/spark/commit/0e47cfd47774a8d75e9b45bbeb5a19fddb40494e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r73325142
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -896,6 +896,19 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         checkError(df("`a.b`.c.`d"))
       }
     
    +  test("SPARK-11977: Support accessing a column contains '.' without backticks") {
    +    val df = sparkContext.parallelize(
    +      (1, 2) :: (3, 4) ::  Nil).toDF("test.column1", "test.column.2")
    +    checkAnswer(df.select("test.column1"), Seq(Row(1), Row(3)))
    --- End diff --
    
    Any follow up on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r71888979
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row(key, value, key + 1)
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
    --- End diff --
    
    I'd fix that line, too with `fieldNames`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit closed the pull request at:

    https://github.com/apache/spark/pull/14309


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    @sun-rui 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    I have not got any replies on this. I presume that this is a soft reject for this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r72127680
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -896,6 +896,19 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         checkError(df("`a.b`.c.`d"))
       }
     
    +  test("SPARK-11977: Support accessing a column contains '.' without backticks") {
    +    val df = sparkContext.parallelize(
    --- End diff --
    
    could you add some test cases with a DataFrame created from `iris`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by sun-rui <gi...@git.apache.org>.
Github user sun-rui commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Does this solution co-operate with the access pattern of "table.column"?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r71888896
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row(key, value, key + 1)
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
    +
    +    // Renaming to a column that contains "." character
    +    val df2 = testData.toDF().withColumnRenamed("value", "value.Renamed")
    --- End diff --
    
    No need for `()` in `toDF()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r71888837
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
             Row(key, value, key + 1)
           }.toSeq)
         assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
    +
    +    // Renaming to a column that contains "." character
    +    val df2 = testData.toDF().withColumnRenamed("value", "value.Renamed")
    +    assert(df2.schema.map(_.name) === Seq("key", "value.Renamed"))
    --- End diff --
    
    `df2.schema.fieldNames`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62831/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14309#discussion_r72180296
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -896,6 +896,19 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         checkError(df("`a.b`.c.`d"))
       }
     
    +  test("SPARK-11977: Support accessing a column contains '.' without backticks") {
    +    val df = sparkContext.parallelize(
    +      (1, 2) :: (3, 4) ::  Nil).toDF("test.column1", "test.column.2")
    +    checkAnswer(df.select("test.column1"), Seq(Row(1), Row(3)))
    --- End diff --
    
    This is a behaviour change, users may expect to see an analyse failure here. cc @rxin Do we want this feature?
    
    Personally I'd like to have a new function(similar to `col`, but resolves the column by the given name directly) instead of changing the semantic of `select`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by rerngvit <gi...@git.apache.org>.
Github user rerngvit commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    @sun-rui I added more testcases for this. Please have a look. It would be great, if you could enable the Jenkins CI testing. I don't have a permission to do so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the issue:

    https://github.com/apache/spark/pull/14309
  
    Jenkins, ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org