Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2016/01/27 08:07:31 UTC

[GitHub] spark pull request: [SPARK-12988] Can't drop columns that contain ...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/10943

    [SPARK-12988] Can't drop columns that contain dots

    Neither of these works:
    val df = Seq((1, 1)).toDF("a_b", "a.c")
    df.drop("a.c").collect()
    
    df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int]
    val df = Seq((1, 1)).toDF("a_b", "a.c")
    df.drop("`a.c`").collect()
    
    df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int]
    Given that you can't use drop to drop subfields, it seems to me that we should treat the column name literally (i.e. as though it is wrapped in backticks).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark spark-12988

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10943.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10943
    
----
commit e7f30a40165b0e5c2cf86341bc5ce1b1079afe6e
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2016-01-27T07:05:39Z

    [SPARK-12988] Can't drop columns that contain dots

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175835435
  
    **[Test build #50215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50215/consoleFull)** for PR 10943 at commit [`e7f30a4`](https://github.com/apache/spark/commit/e7f30a40165b0e5c2cf86341bc5ce1b1079afe6e).




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175917517
  
    @cloud-fan Thank you Wenchen.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-187547692
  
    Sorry for the delay. We are discussing this design choice and should reach an agreement this week or next. Thanks for working on it, and sorry for making you wait :)




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175826781
  
    ok to test




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178310067
  
    **[Test build #50522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50522/consoleFull)** for PR 10943 at commit [`dfaa13b`](https://github.com/apache/spark/commit/dfaa13be459f49a616bad2b6180b19292ccccabc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175906396
  
    `select` is the API that is supposed to take a `column path`, but APIs like `withColumn`, `drop`, etc. are supposed to take a `column name`. So what I suggest is: rename `resolve` to `resolvePath` and add a new method `resolveName` that abstracts the column name resolution logic from [`withColumn`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1178-L1180), then we can use `resolveName` in `drop` and other APIs that need a `column name` instead of a `column path`.
    
    FYI, the PR that fixed a similar problem for `withColumn`: https://github.com/apache/spark/pull/10500
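
To make the suggestion above concrete, here is a minimal plain-Scala sketch of what a name-based lookup could look like. It is an illustration only: columns are modeled as a `Seq[String]`, the `resolveName` signature is hypothetical rather than the one in the PR, and the resolver is assumed to be case-insensitive for this example.

```scala
object NameResolutionSketch {
  // A resolver decides whether two identifiers match.
  // Assumption for this sketch: case-insensitive comparison.
  val resolver: (String, String) => Boolean = (a, b) => a.equalsIgnoreCase(b)

  // Hypothetical resolveName: the whole string is treated as one literal
  // top-level column name, so dots and backticks get no special meaning.
  def resolveName(columns: Seq[String], name: String): Option[Int] = {
    val index = columns.indexWhere(c => resolver(c, name))
    if (index >= 0) Some(index) else None
  }
}
```

With columns `a_b` and `a.c`, `NameResolutionSketch.resolveName(columns, "a.c")` finds the second column directly with no quoting, which is the behavior suggested here for `drop`. A backtick-quoted argument would not match, because in this model the backticks are taken as part of the name.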




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178350378
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50525/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178761945
  
    @cloud-fan I have incorporated your suggestions except the comment about allowing surrounding backticks in the column name. Once we have a decision, I can remove it. Please let me know.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-187549125
  
    @cloud-fan No issues. Thanks for your reply :-)




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177815884
  
    @cloud-fan Thank you Wenchen. I will try your suggestion and get back.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175899786
  
    @cloud-fan Thank you Wenchen for your comments. In my understanding, users need to quote a column name with backticks if they want it to be treated as a column name as opposed to a column path. I tried the following example:
    
    val df = Seq((1, 2, 3)).toDF("a_b", "a.c", "b.c")
    df.select("a.c")    // fails to resolve
    df.select("`a.c`")  // works fine
    
    Is this not how it is supposed to work? Can you please elaborate with a small
    example? Thanks in advance.
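
As a rough illustration of the behavior described in this comment, the following sketch shows how a column path might be split on dots unless a segment is quoted with backticks. `parsePath` is a hypothetical helper written for this example, not Spark's parser, and it ignores escaped backticks and malformed input.

```scala
object ColumnPathSketch {
  // Hypothetical helper: splits a column path into segments on unquoted dots.
  // Backticks toggle quoting and are not kept in the resulting segments.
  def parsePath(path: String): Seq[String] = {
    val parts = Seq.newBuilder[String]
    val current = new StringBuilder
    var inBackticks = false
    for (ch <- path) {
      ch match {
        case '`' =>
          inBackticks = !inBackticks   // enter or leave a quoted segment
        case '.' if !inBackticks =>
          parts += current.toString    // an unquoted dot ends the segment
          current.clear()
        case other =>
          current.append(other)        // any other character is part of the name
      }
    }
    parts += current.toString
    parts.result()
  }
}
```

Under this model the unquoted string "a.c" yields two segments (`a`, then `c`), so it is read as field `c` of a column `a`, while the backtick-quoted form yields the single name `a.c`. That is consistent with the `select` results reported above.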




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-217461632
  
    ping @cloud-fan 




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178299378
  
    @cloud-fan Thanks a lot. I have implemented as per your input.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178218207
  
    @cloud-fan Hi Wenchen, I couldn't get the code snippet to compile, so I made a change that looks like the following.
    
    def withColumn(colName: String, col: Column): DataFrame = {
      val output = queryExecution.analyzed.output
      indexOf(colName).map { index =>
        val columns = output.zipWithIndex.map {
          case (a, i) => if (i == index) col.as(colName) else Column(a)
        }
        select(columns: _*)
      }.getOrElse {
        select(Column("*"), col.as(colName))
      }
    }
    
    Does this look okay to you? Please let me know.
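
The replace-or-append logic in the snippet above can be tried without Spark using a small stand-in model. This is only a sketch: columns are modeled as `(name, value)` pairs, and the `withColumn` signature here is hypothetical and much simpler than the `DataFrame` method.

```scala
object WithColumnSketch {
  // Columns modeled as (name, value) pairs, a stand-in for real attributes.
  def withColumn(
      columns: Seq[(String, Int)],
      colName: String,
      value: Int): Seq[(String, Int)] = {
    // Look the name up literally, so a dot in the name has no special meaning.
    val index = columns.indexWhere(_._1 == colName)
    if (index >= 0) {
      // Existing column: replace it in place and keep its position.
      columns.updated(index, (colName, value))
    } else {
      // New column: append it after all existing columns.
      columns :+ ((colName, value))
    }
  }
}
```

Looking the column up by position means an existing column such as `a.c` is replaced where it stands, while an unknown name falls through to the append branch, mirroring the `getOrElse` in the snippet.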




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175860380
  
    **[Test build #50215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50215/consoleFull)** for PR 10943 at commit [`e7f30a4`](https://github.com/apache/spark/commit/e7f30a40165b0e5c2cf86341bc5ce1b1079afe6e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178314893
  
    test this please




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178767852
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175875161
  
    Actually, this kind of problem has come up many times. I think we should distinguish `column name` from ```column path (which respects "." and "`")```, and have 2 methods that parse `column name` and `column path` respectively. Currently we only have a `resolve` method that parses `column path`; we should add one for `column name` and go through all `DataFrame` APIs to fix the places that should take a `column name` but handle it as a `column path`.
    
    cc @rxin




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178821593
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50582/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51619029
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1274,16 +1259,20 @@ class DataFrame private[sql](
        * @since 1.4.1
        */
       def drop(col: Column): DataFrame = {
    --- End diff --
    
    Why do we have this method?
    We can only drop top-level columns, so allowing users to pass in a `Column` doesn't make sense.
    
    cc @rxin @marmbrus 




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175861346
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51608477
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1255,9 +1240,9 @@ class DataFrame private[sql](
        */
       @scala.annotation.varargs
       def drop(colNames: String*): DataFrame = {
    -    val resolver = sqlContext.analyzer.resolver
    -    val remainingCols =
    -      schema.filter(f => colNames.forall(n => !resolver(f.name, n))).map(f => Column(f.name))
    +    val output = queryExecution.analyzed.output
    +    val droppedAttrs = colNames.map(n => resolveToIndex(n)).flatten.map(output)
    +    val remainingCols = output.filterNot(droppedAttrs.contains).map(Column(_))
    --- End diff --
    
    An easier approach is to use the indexes:
    ```
    val indexesToDrop = colNames.map(indexOf).flatten
    if (indexesToDrop.isEmpty) {
      this
    } else {
      val output = queryExecution.analyzed.output
      val remainingCols = (0 until output.length).diff(indexesToDrop).map(index => Column(output(index)))
      select(remainingCols: _*)
    }
    ```
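
The index-based approach can be exercised outside Spark with a simplified model. The sketch below assumes columns are plain strings and uses a hypothetical `indexOf` that matches the full name exactly; the real helper in the PR works on the analyzed plan's output and may differ.

```scala
object DropByIndexSketch {
  // Hypothetical stand-in for the PR's indexOf helper: exact match on the full name.
  def indexOf(columns: Seq[String], name: String): Option[Int] = {
    val i = columns.indexOf(name)
    if (i >= 0) Some(i) else None
  }

  // Drops the named columns by position; names that do not resolve are ignored.
  def drop(columns: Seq[String], colNames: String*): Seq[String] = {
    val indexesToDrop = colNames.flatMap(n => indexOf(columns, n))
    if (indexesToDrop.isEmpty) {
      columns // nothing matched, return the input unchanged
    } else {
      // Keep every position that is not in indexesToDrop, preserving order.
      columns.indices.diff(indexesToDrop).map(columns)
    }
  }
}
```

Working with positions rather than names avoids re-resolving the remaining columns by name, which is where a dot in a name such as `a.c` would otherwise be reinterpreted as a path.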




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51606881
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -150,6 +153,14 @@ class DataFrame private[sql](
         }
       }
     
    +  private[sql] def resolveToIndex(colName: String): Option[Int] = {
    +    val resolver = sqlContext.analyzer.resolver
    +    // First remove any user supplied quotes.
    +    val unquotedColName = colName.stripPrefix("`").stripSuffix("`")
    --- End diff --
    
    For example, what if a column is named ``` `a`a` ```? The user should be able to just pass in ``` `a`a` ```, and we shouldn't strip the "`".
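
The concern can be reproduced with the same two string calls used in the diff. This is a small stand-alone sketch; `unquote` is a name chosen for the example, not a method in the PR.

```scala
object BacktickStripSketch {
  // Same stripping as in the diff: remove one leading and one trailing backtick.
  def unquote(colName: String): String =
    colName.stripPrefix("`").stripSuffix("`")
}
```

For a quoted name such as "`a.c`" this returns `a.c` as intended. For a column whose literal name begins and ends with a backtick, as in the example in this comment, the result loses both outer backticks and no longer equals the actual column name, so the lookup would miss it.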




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177363765
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178776399
  
    retest this please




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175455877
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178350376
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178820916
  
    **[Test build #50582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50582/consoleFull)** for PR 10943 at commit [`d2b373f`](https://github.com/apache/spark/commit/d2b373fe97c4b46fac9a03edbb9feca438352aa7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177352984
  
    **[Test build #50452 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50452/consoleFull)** for PR 10943 at commit [`8201994`](https://github.com/apache/spark/commit/82019947e9777a93ac4d137aed52e09a6434b56e).




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178310098
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-217603519
  
    cc @rxin , looks like we missed this one...


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-187018074
  
    @cloud-fan Hi Wenchen, can you please advise on the next step for this PR? I think it may require more discussion to decide whether we need to keep or remove the df.drop(Column) interface.
    What do you think?


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-175861351
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50215/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51612126
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1255,9 +1240,9 @@ class DataFrame private[sql](
        */
       @scala.annotation.varargs
       def drop(colNames: String*): DataFrame = {
    -    val resolver = sqlContext.analyzer.resolver
    -    val remainingCols =
    -      schema.filter(f => colNames.forall(n => !resolver(f.name, n))).map(f => Column(f.name))
    +    val output = queryExecution.analyzed.output
    +    val droppedAttrs = colNames.map(n => resolveToIndex(n)).flatten.map(output)
    +    val remainingCols = output.filterNot(droppedAttrs.contains).map(Column(_))
    --- End diff --
    
    @cloud-fan Thanks. Will do.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178783451
  
    **[Test build #50582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50582/consoleFull)** for PR 10943 at commit [`d2b373f`](https://github.com/apache/spark/commit/d2b373fe97c4b46fac9a03edbb9feca438352aa7).


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178775551
  
    @cloud-fan Can we retest, please?


[GitHub] spark pull request #10943: [SPARK-12988][SQL] Can't drop columns that contai...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10943


[GitHub] spark issue #10943: [SPARK-12988][SQL] Can't drop columns that contain dots

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the issue:

    https://github.com/apache/spark/pull/10943
  
    How about we close this PR, since https://github.com/apache/spark/pull/13306 has been merged?


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177351900
  
    @cloud-fan Hi Wenchen, please let me know whether I have interpreted your suggestion correctly, or if something is amiss. df.resolve() has many callers, so I have not changed its name but have added a comment. Let me know if you want me to refactor it. Thanks.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51609364
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -150,6 +153,14 @@ class DataFrame private[sql](
         }
       }
     
    +  private[sql] def resolveToIndex(colName: String): Option[Int] = {
    +    val resolver = sqlContext.analyzer.resolver
    +    // First remove any user supplied quotes.
    +    val unquotedColName = colName.stripPrefix("`").stripSuffix("`")
    --- End diff --
    
    @cloud-fan Hi Wenchen,
    
    Can you please go through the following comment?
    
     https://issues.apache.org/jira/browse/SPARK-12988?focusedCommentId=15118433&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15118433
    
    I was trying to address the third bullet in the list. As for your second question: per the first bullet, this should be disallowed? Please let me know.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177363640
  
    **[Test build #50452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50452/consoleFull)** for PR 10943 at commit [`8201994`](https://github.com/apache/spark/commit/82019947e9777a93ac4d137aed52e09a6434b56e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51607335
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1220,19 +1213,11 @@ class DataFrame private[sql](
        * @since 1.3.0
        */
       def withColumnRenamed(existingName: String, newName: String): DataFrame = {
    -    val resolver = sqlContext.analyzer.resolver
         val output = queryExecution.analyzed.output
    -    val shouldRename = output.exists(f => resolver(f.name, existingName))
    -    if (shouldRename) {
    -      val columns = output.map { col =>
    -        if (resolver(col.name, existingName)) {
    -          Column(col).as(newName)
    -        } else {
    -          Column(col)
    -        }
    -      }
    -      select(columns : _*)
    -    } else {
    +    resolveToIndex(existingName).map {index =>
    +      select(output.map(attr =>
    +        Column(attr)).updated(index, Column(output(index)).as(newName)) : _*)
    --- End diff --
    
    We can define a `val renamed = Column(output(index)).as(newName)` first, which makes this line short enough to fit on one line.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178350183
  
    **[Test build #50525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50525/consoleFull)** for PR 10943 at commit [`dfaa13b`](https://github.com/apache/spark/commit/dfaa13be459f49a616bad2b6180b19292ccccabc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178319137
  
    **[Test build #50525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50525/consoleFull)** for PR 10943 at commit [`dfaa13b`](https://github.com/apache/spark/commit/dfaa13be459f49a616bad2b6180b19292ccccabc).


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-177363766
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50452/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178767857
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50576/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51355492
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -150,6 +153,17 @@ class DataFrame private[sql](
         }
       }
     
    +  /**
    +   * Resolves a column name. This is called when it is required to resolve a column by its
    +   * name only and not as a column path..
    +   */
    +  private[sql] def resolveColName(colName: String, userSuppliedName: String): Boolean = {
    --- End diff --
    
    how about
    ```
    private[sql] def indexOf(colName: String): Option[Int] = {
      val resolver = sqlContext.analyzer.resolver
      val index = queryExecution.analyzed.output.indexWhere(f => resolver(f.name, colName))
      if (index >= 0) Some(index) else None
    }
    ```
    
    then we can rewrite `withColumn` to:
    ```
    indexOf(colName).map { index =>
      select(output.updated(index, col.as(colName)).map(Column(_)) : _*)
    }.getOrElse {
      select(Column("*"), col.as(colName))
    }
    ```
    
    There may be a better name for this, like `resolveToIndex`.
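
The lookup-then-replace-or-append pattern suggested above can be tried without a Spark build. The following is only a minimal sketch in plain Scala: column names stand in for the analyzed output attributes, and `IndexOfSketch`, `Resolver`, and `caseInsensitive` are hypothetical names chosen for illustration (the case-insensitive comparison is an assumption about how a resolver might behave, not Spark's actual resolver).

```scala
object IndexOfSketch {
  // Hypothetical stand-in for the analyzer's resolver: decides whether two names match.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Position of the first column whose name resolves to colName, if any.
  def indexOf(columns: Seq[String], colName: String, resolver: Resolver): Option[Int] = {
    val index = columns.indexWhere(name => resolver(name, colName))
    if (index >= 0) Some(index) else None
  }

  // Replace the matching column in place, or append a new one when nothing matches.
  def withColumn(columns: Seq[String], colName: String, resolver: Resolver): Seq[String] =
    indexOf(columns, colName, resolver)
      .map(index => columns.updated(index, colName))
      .getOrElse(columns :+ colName)

  def main(args: Array[String]): Unit = {
    val cols = Seq("a_b", "a.c")
    println(indexOf(cols, "A.C", caseInsensitive))
    println(withColumn(cols, "d", caseInsensitive))
  }
}
```

Returning an `Option[Int]` rather than a `Boolean` lets the caller reuse the position for the in-place update, which is what removes the second pass over the output in the original code.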


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178821588
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178310099
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50522/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178223443
  
    Ah, changing `select(output.updated(index, col.as(colName)).map(Column(_)) : _*)` to `select(output.map(Column(_)).updated(index, col.as(colName)): _*)` should work.
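
The order of `map` and `updated` matters here presumably because of element types: `updated` on a sequence of attributes, given a `Column`, widens the element type to a common supertype, after which `Column(_)` no longer applies. A minimal sketch in plain Scala, where `Attr` and `Col` are hypothetical stand-ins for the attribute and `Column` types (they are not the Spark classes):

```scala
object UpdatedOrderSketch {
  // Hypothetical stand-ins: Attr for an analyzed output attribute, Col for a Column.
  final case class Attr(name: String)
  final case class Col(expr: String)

  // Wraps an attribute in a column, playing the role of Column(_).
  def toCol(attr: Attr): Col = Col(attr.name)

  // Convert every attribute first, so that `updated` operates on Seq[Col]
  // and the replacement already has the right element type.
  def replaceAt(output: Seq[Attr], index: Int, replacement: Col): Seq[Col] =
    output.map(toCol).updated(index, replacement)

  // The reverse order, output.updated(index, replacement).map(toCol), would not
  // compile: the updated sequence is no longer a Seq[Attr], so toCol cannot be applied.

  def main(args: Array[String]): Unit = {
    val output = Seq(Attr("a_b"), Attr("a.c"))
    println(replaceAt(output, 1, Col("renamed")))
  }
}
```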


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10943#issuecomment-178300442
  
    **[Test build #50522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50522/consoleFull)** for PR 10943 at commit [`dfaa13b`](https://github.com/apache/spark/commit/dfaa13be459f49a616bad2b6180b19292ccccabc).


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51606542
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -150,6 +153,14 @@ class DataFrame private[sql](
         }
       }
     
    +  private[sql] def resolveToIndex(colName: String): Option[Int] = {
    +    val resolver = sqlContext.analyzer.resolver
    +    // First remove any user supplied quotes.
    +    val unquotedColName = colName.stripPrefix("`").stripSuffix("`")
    --- End diff --
    
    Do we need to do this? I think that for methods that require a column name, the user should just pass in an exact column name string, and we don't need to do any extra parsing here, i.e. no resolver and no stripping of "`".
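
To make the "exact column name" option concrete, here is a minimal sketch in plain Scala of a drop that compares names literally. `ExactNameSketch` and its `drop` are hypothetical illustrations over a list of names, not the DataFrame method; under this reading a dotted name is dropped as written, while a backtick-quoted argument is treated as a different literal name and matches nothing.

```scala
object ExactNameSketch {
  // Keeps every column whose name is not exactly equal to one of colNames.
  // No parsing, no resolver, no stripping of backticks.
  def drop(schema: Seq[String], colNames: String*): Seq[String] =
    schema.filterNot(name => colNames.contains(name))

  def main(args: Array[String]): Unit = {
    val schema = Seq("a_b", "a.c")
    println(drop(schema, "a.c"))   // the dotted name matches literally
    println(drop(schema, "`a.c`")) // the quoted form is a different string
  }
}
```

Whether quoted input should also be accepted is exactly the open question in this thread; the sketch only shows what the strict interpretation would do.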


[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10943#discussion_r51611035
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
    @@ -1220,19 +1213,11 @@ class DataFrame private[sql](
        * @since 1.3.0
        */
       def withColumnRenamed(existingName: String, newName: String): DataFrame = {
    -    val resolver = sqlContext.analyzer.resolver
         val output = queryExecution.analyzed.output
    -    val shouldRename = output.exists(f => resolver(f.name, existingName))
    -    if (shouldRename) {
    -      val columns = output.map { col =>
    -        if (resolver(col.name, existingName)) {
    -          Column(col).as(newName)
    -        } else {
    -          Column(col)
    -        }
    -      }
    -      select(columns : _*)
    -    } else {
    +    resolveToIndex(existingName).map {index =>
    +      select(output.map(attr =>
    +        Column(attr)).updated(index, Column(output(index)).as(newName)) : _*)
    --- End diff --
    
    @cloud-fan Sure. Will do.

