You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liyichao <gi...@git.apache.org> on 2017/05/30 14:39:03 UTC

[GitHub] spark pull request #18144: [SPARK-20912][SQL] Allow column name in map funct...

GitHub user liyichao opened a pull request:

    https://github.com/apache/spark/pull/18144

    [SPARK-20912][SQL] Allow column name in map functions.

    ## What changes were proposed in this pull request?
    
    `map` function only accepts Column values only. It'd be very helpful to have a variant that accepts String for columns just like what `array` or `struct`.
    
    ## How was this patch tested?
    
    Added a test in `DataFrameFunctionsSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyichao/spark SPARK-20912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18144.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18144
    
----
commit 64a98306c6e1e6e9f79a976044c6895779afb24c
Author: Li Yichao <ly...@zhihu.com>
Date:   2017-05-30T14:35:28Z

    Allow column name in map functions.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    Thanks for your work. But, I'm not 100% sure we need to make this signature consistent with `array` and `struct`. IIUC we preferentially add these functions in `FunctionRegistry` (users can use them via `selectExpr`) and do not touch this file. cc: @gatorsmile @cloud-fan  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18144: [SPARK-20912][SQL] Allow column name in map funct...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao closed the pull request at:

    https://github.com/apache/spark/pull/18144


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    @jaceklaskowski  This brought a lot of pain in the past(see `functions.atan2`) and is ambiguous when a function parameter accepts both column and string literal. If we can go back we should never add functions with string parameter as column name. Maybe we should deprecate them for consistency...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    As the idea is not that good, this is closed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    IMHO, I think we have many similar cases in `functions.scala` or other APIs and probably we should avoid adding APIs just for consistency ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18144: [SPARK-20912][SQL] Allow column name in map funct...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18144#discussion_r119124578
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -1007,6 +1007,17 @@ object functions {
       def map(cols: Column*): Column = withExpr { CreateMap(cols.map(_.expr)) }
     
       /**
    +   * Creates a new map column. The input columns must be grouped as key-value pairs, e.g.
    +   * (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't
    +   * be null. The value columns must all have the same data type.
    +   *
    +   * @group normal_funcs
    +   * @since 2.3
    +   */
    +  @scala.annotation.varargs
    +  def map(colName: String, colNames: String*): Column = map((colName +: colNames).map(col) : _*)
    --- End diff --
    
    nit: need bracket:
    ```
    def map(colName: String, colNames: String*): Column = {
      map((colName +: colNames).map(col) : _*)
    }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    I'm hesitating to add more methods with string parameters, it will blow up the number of methods quickly...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    @cloud-fan I don't understand why would that be an issue...ever. The API is not consistent and I often run into it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

Posted by jaceklaskowski <gi...@git.apache.org>.
Github user jaceklaskowski commented on the issue:

    https://github.com/apache/spark/pull/18144
  
    @cloud-fan If consistency is to remove (not add) I'm fine. Either way consistency is the ultimate goal (as I myself am running into this discrepancy far too often).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org