Posted to reviews@spark.apache.org by codeatri <gi...@git.apache.org> on 2018/08/08 19:35:58 UTC

[GitHub] spark pull request #22045: [SPARK-23939][SQL] Add transform_values SQL funct...

GitHub user codeatri opened a pull request:

    https://github.com/apache/spark/pull/22045

    [SPARK-23939][SQL] Add transform_values SQL function

    ## What changes were proposed in this pull request?
    This PR adds the `transform_values` function, which applies a given function to each entry of a map and transforms the values.
    ```sql
    > SELECT transform_values(map(array(1, 2, 3), array(1, 2, 3)), (k,v) -> v + 1);
           map(1->2, 2->3, 3->4)
    
    > SELECT transform_values(map(array(1, 2, 3), array(1, 2, 3)), (k,v) -> k + v);
           map(1->2, 2->4, 3->6)
    ```
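
    The semantics of the two examples can be sketched in plain Scala without any Spark dependency. `TransformValuesSketch` and its `transformValues` helper are hypothetical names used only for illustration; this is not the Catalyst expression added by the PR. The sketch assumes the lambda receives each key and value and that keys are left unchanged.

    ```scala
    object TransformValuesSketch {
      // Hypothetical stand-in for the SQL function, built on ordinary Scala maps.
      // Keys are kept as they are; each value is replaced by f(key, value).
      def transformValues[K, V, R](m: Map[K, V])(f: (K, V) => R): Map[K, R] =
        m.map { case (k, v) => k -> f(k, v) }

      val input: Map[Int, Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)

      // Mirrors the lambda (k, v) -> v + 1
      val plusOne: Map[Int, Int] = transformValues(input)((_, v) => v + 1)

      // Mirrors the lambda (k, v) -> k + v
      val keyPlusValue: Map[Int, Int] = transformValues(input)((k, v) => k + v)
    }
    ```

    Under these assumptions `plusOne` holds `1->2, 2->3, 3->4` and `keyPlusValue` holds `1->2, 2->4, 3->6`, matching the SQL output shown above.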
    ## How was this patch tested?
    New tests were added to:
    `DataFrameFunctionsSuite`
    `HigherOrderFunctionsSuite`
    `SQLQueryTestSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/codeatri/spark SPARK-23940

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22045
    
----
commit 68392e31d86f26663fbb8e5badac82b356081f47
Author: codeatri <ne...@...>
Date:   2018-08-08T18:42:36Z

    Added transform_values function

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22045


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #22045: [SPARK-23939][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Can one of the admins verify this patch?


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210469472
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,53 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    --- End diff --
    
    nit: indent


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94783/
    Test FAILed.


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94843/testReport)** for PR 22045 at commit [`56d08ef`](https://github.com/apache/spark/commit/56d08ef37531f8e25ae2c7fe3996cf7657384a80).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94864/testReport)** for PR 22045 at commit [`3382e1a`](https://github.com/apache/spark/commit/3382e1a5396c8e5a94802d92a7106eacf627617c).


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Thanks! merging to master.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165373
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
    @@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
           15)
       }
     
    +  test("TransformValues") {
    +    val ai0 = Literal.create(
    +      Map(1 -> 1, 2 -> 2, 3 -> 3),
    +      MapType(IntegerType, IntegerType))
    +    val ai1 = Literal.create(
    +      Map(1 -> 1, 2 -> null, 3 -> 3),
    +      MapType(IntegerType, IntegerType))
    +    val ain = Literal.create(
    +      Map.empty[Int, Int],
    +      MapType(IntegerType, IntegerType))
    --- End diff --
    
    Can you add tests for `Literal.create(null, MapType(IntegerType, IntegerType))`?
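
    The requested case differs from the empty-map literal `ain` in the diff: a NULL map would be expected to yield NULL rather than an empty map. The plain-Scala sketch below models that distinction with `Option`, assuming the expression propagates a null argument (consistent with `nullable = argument.nullable` elsewhere in this PR). `NullMapSketch` and `transformValues` are hypothetical names, not Spark APIs.

    ```scala
    object NullMapSketch {
      // None models a SQL NULL map; Some(m) models a non-null map.
      def transformValues[K, V, R](m: Option[Map[K, V]])(f: (K, V) => R): Option[Map[K, R]] =
        m.map(_.map { case (k, v) => k -> f(k, v) })

      val nullMap: Option[Map[Int, Int]] = None
      val emptyMap: Option[Map[Int, Int]] = Some(Map.empty[Int, Int])

      // A NULL input stays NULL; the lambda is never invoked.
      val fromNull: Option[Map[Int, Int]] = transformValues(nullMap)((k, v) => k + v)

      // An empty input yields an empty, non-null map.
      val fromEmpty: Option[Map[Int, Int]] = transformValues(emptyMap)((k, v) => k + v)
    }
    ```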


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94827/testReport)** for PR 22045 at commit [`daf7935`](https://github.com/apache/spark/commit/daf793599a6da5c11dbc4a6bd6e5dea3e0d47afd).


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    @codeatri Could you fix the conflicts please? Thanks!


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210561102
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,53 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +      > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +  since = "2.4.0")
    +case class TransformValues(
    +    argument: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = argument.nullable
    +
    +  @transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType
    +
    +  override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull)
    --- End diff --
    
    Shouldn't the `dataType` be defined as `MapType(keyType, function.dataType, function.nullable)`?
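
    The point can be illustrated with a small plain-Scala sketch: whether the output values can be null depends on the lambda, not on the input map. Here `Option` stands in for a nullable value, and `ValueNullabilitySketch` is a hypothetical name used only for this illustration; nothing below is Spark code.

    ```scala
    object ValueNullabilitySketch {
      // The input map has no null values, so its valueContainsNull would be false.
      val input: Map[Int, Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)
      val inputHasNulls: Boolean = false

      // A lambda that may produce null, roughly (k, v) -> IF(k = 2, NULL, v).
      val f: (Int, Int) => Option[Int] = (k, v) => if (k == 2) None else Some(v)

      val result: Map[Int, Option[Int]] = input.map { case (k, v) => k -> f(k, v) }

      // The output contains a null value even though the input did not.
      val resultHasNulls: Boolean = result.values.exists(_.isEmpty)
    }
    ```

    Reusing the input's `valueContainsNull` for the result type would therefore understate nullability in a case like this one, which is what deriving it from the function's nullability avoids.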


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210164879
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,60 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
    --- End diff --
    
    nit: we need one more right parenthesis after the second `array(1, 2, 3)`?


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210470513
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,53 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    argument: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = argument.nullable
    +
    +  @transient lazy val MapType(keyType, valueType, valueContainsNull) = argument.dataType
    +
    +  override def dataType: DataType = MapType(keyType, function.dataType, valueContainsNull)
    +
    +  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction)
    +  : TransformValues = {
    +    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
    +  }
    +
    +  @transient lazy val LambdaFunction(
    +  _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
    --- End diff --
    
    nit: indent


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210469510
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,53 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    --- End diff --
    
    ditto.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210469494
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,53 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    --- End diff --
    
    ditto.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by codeatri <gi...@git.apache.org>.

Github user codeatri commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210402976
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,60 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
    --- End diff --
    
    @ueshin Thanks for the review! Yes, I agree; I made the same mistakes in both PRs. I was waiting for the transform_keys PR to converge so that I could make the same changes here.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165448
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -2302,6 +2302,210 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
         assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
       }
     
    +  test("transform values function - test various primitive data types combinations") {
    --- End diff --
    
    We don't need so many cases here. We only need to verify that the API works end to end.
    Evaluation checks of the function should be in `HigherOrderFunctionsSuite`.


---

[GitHub] spark issue #22045: [SPARK-23939][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Can one of the admins verify this patch?


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210471011
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
    @@ -2302,6 +2302,177 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
         assert(ex5.getMessage.contains("function map_zip_with does not support ordering on type map"))
       }
     
    +  test("transform values function - test primitive data types") {
    +    val dfExample1 = Seq(
    +      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
    +    ).toDF("i")
    +
    +    val dfExample2 = Seq(
    +      Map[Boolean, String](false -> "abc", true -> "def")
    +    ).toDF("x")
    +
    +    val dfExample3 = Seq(
    +      Map[String, Int]("a" -> 1, "b" -> 2, "c" -> 3)
    +    ).toDF("y")
    +
    +    val dfExample4 = Seq(
    +      Map[Int, Double](1 -> 1.0, 2 -> 1.40, 3 -> 1.70)
    +    ).toDF("z")
    +
    +    val dfExample5 = Seq(
    +      Map[Int, Array[Int]](1 -> Array(1, 2))
    +    ).toDF("c")
    +
    +    def testMapOfPrimitiveTypesCombination(): Unit = {
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k + v)"),
    +        Seq(Row(Map(1 -> 2, 9 -> 18, 8 -> 16, 7 -> 14))))
    +
    +      checkAnswer(dfExample2.selectExpr(
    +        "transform_values(x, (k, v) -> if(k, v, CAST(k AS String)))"),
    +        Seq(Row(Map(false -> "false", true -> "def"))))
    +
    +      checkAnswer(dfExample2.selectExpr("transform_values(x, (k, v) -> NOT k AND v = 'abc')"),
    +        Seq(Row(Map(false -> true, true -> false))))
    +
    +      checkAnswer(dfExample3.selectExpr("transform_values(y, (k, v) -> v * v)"),
    +        Seq(Row(Map("a" -> 1, "b" -> 4, "c" -> 9))))
    +
    +      checkAnswer(dfExample3.selectExpr(
    +        "transform_values(y, (k, v) -> k || ':' || CAST(v as String))"),
    +        Seq(Row(Map("a" -> "a:1", "b" -> "b:2", "c" -> "c:3"))))
    +
    +      checkAnswer(
    +        dfExample3.selectExpr("transform_values(y, (k, v) -> concat(k, cast(v as String)))"),
    +        Seq(Row(Map("a" -> "a1", "b" -> "b2", "c" -> "c3"))))
    +
    +      checkAnswer(
    +        dfExample4.selectExpr(
    +          "transform_values(" +
    +            "z,(k, v) -> map_from_arrays(ARRAY(1, 2, 3), " +
    +            "ARRAY('one', 'two', 'three'))[k] || '_' || CAST(v AS String))"),
    +        Seq(Row(Map(1 -> "one_1.0", 2 -> "two_1.4", 3 ->"three_1.7"))))
    +
    +      checkAnswer(
    +        dfExample4.selectExpr("transform_values(z, (k, v) -> k-v)"),
    +        Seq(Row(Map(1 -> 0.0, 2 -> 0.6000000000000001, 3 -> 1.3))))
    +
    +      checkAnswer(
    +        dfExample5.selectExpr("transform_values(c, (k, v) -> k + cardinality(v))"),
    +        Seq(Row(Map(1 -> 3))))
    +    }
    +
    +    // Test with local relation, the Project will be evaluated without codegen
    +    testMapOfPrimitiveTypesCombination()
    +    dfExample1.cache()
    +    dfExample2.cache()
    +    dfExample3.cache()
    +    dfExample4.cache()
    +    dfExample5.cache()
    +    // Test with cached relation, the Project will be evaluated with codegen
    +    testMapOfPrimitiveTypesCombination()
    +  }
    +
    +  test("transform values function - test empty") {
    +    val dfExample1 = Seq(
    +      Map.empty[Integer, Integer]
    +    ).toDF("i")
    +
    +    val dfExample2 = Seq(
    +      Map.empty[BigInt, String]
    +    ).toDF("j")
    +
    +    def testEmpty(): Unit = {
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> NULL)"),
    +        Seq(Row(Map.empty[Integer, Integer])))
    +
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> k)"),
    +        Seq(Row(Map.empty[Integer, Integer])))
    +
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> v)"),
    +        Seq(Row(Map.empty[Integer, Integer])))
    +
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 0)"),
    +        Seq(Row(Map.empty[Integer, Integer])))
    +
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> 'value')"),
    +        Seq(Row(Map.empty[Integer, String])))
    +
    +      checkAnswer(dfExample1.selectExpr("transform_values(i, (k, v) -> true)"),
    +        Seq(Row(Map.empty[Integer, Boolean])))
    +
    +      checkAnswer(dfExample2.selectExpr("transform_values(j, (k, v) -> k + cast(v as BIGINT))"),
    +        Seq(Row(Map.empty[BigInt, BigInt])))
    +    }
    +
    +    testEmpty()
    +    dfExample1.cache()
    +    dfExample2.cache()
    +    testEmpty()
    +  }
    +
    +  test("transform values function - test null values") {
    +    val dfExample1 = Seq(
    +      Map[Int, Integer](1 -> 1, 2 -> 2, 3 -> 3, 4 -> 4)
    +    ).toDF("a")
    +
    +    val dfExample2 = Seq(
    +      Map[Int, String](1 -> "a", 2 -> "b", 3 -> null)
    +    ).toDF("b")
    +
    +    def testNullValue(): Unit = {
    +      checkAnswer(dfExample1.selectExpr("transform_values(a, (k, v) -> null)"),
    +        Seq(Row(Map[Int, Integer](1 -> null, 2 -> null, 3 -> null, 4 -> null))))
    +
    +      checkAnswer(dfExample2.selectExpr(
    +        "transform_values(b, (k, v) -> IF(v IS NULL, k + 1, k + 2))"),
    +        Seq(Row(Map(1 -> 3, 2 -> 4, 3 -> 4))))
    +    }
    +
    +    testNullValue()
    +    dfExample1.cache()
    +    dfExample2.cache()
    +    testNullValue()
    +  }
    +
    +  test("transform values function - test invalid functions") {
    +    val dfExample1 = Seq(
    +      Map[Int, Int](1 -> 1, 9 -> 9, 8 -> 8, 7 -> 7)
    +    ).toDF("i")
    +
    +    val dfExample2 = Seq(
    +      Map[String, String]("a" -> "b")
    +    ).toDF("j")
    +
    +    val dfExample3 = Seq(
    +      Seq(1, 2, 3, 4)
    +    ).toDF("x")
    +
    +    def testInvalidLambdaFunctions(): Unit = {
    +
    +      val ex1 = intercept[AnalysisException] {
    +        dfExample1.selectExpr("transform_values(i, k -> k )")
    --- End diff --
    
    nit: remove an extra space after `k -> k`.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r208746631
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,61 @@ case class ArrayAggregate(
     
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform Values for every entry of the map by applying transform_values function.
    + * Returns map wth transformed values
    --- End diff --
    
    typos: "Transforms values"; "with".
    Maybe you can think of a better comment?


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210164955
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,60 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    argument: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = argument.nullable
    +
    +  override def dataType: DataType = {
    +    val map = argument.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, function.nullable)
    +  }
    +
    +  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
    --- End diff --
    
    `lazy val`?
    Could you add a test for the case where the argument is not a map to the invalid cases in `DataFrameFunctionsSuite`?
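
    One plausible reading of the `lazy val` suggestion is that an eager pattern-matching `val` runs its match in the constructor, so a non-map argument would fail with a `MatchError` as soon as the expression is built, before any type check can report a cleaner error. The sketch below shows that Scala behavior with simplified, hypothetical stand-ins for `DataType` and `MapType`; it is not the Catalyst code and assumes Scala 2 semantics for pattern-matching `val` definitions.

    ```scala
    import scala.util.Try

    object LazyValSketch {
      // Simplified, hypothetical stand-ins for the Catalyst types.
      sealed trait DataType
      case object IntegerType extends DataType
      case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
        extends DataType

      // Eager pattern-matching val: the match runs inside the constructor.
      class Eager(dataType: DataType) {
        val MapType(keyType, valueType, valueContainsNull) = dataType
      }

      // Lazy variant: the match is deferred until a field is first read.
      class Deferred(dataType: DataType) {
        lazy val MapType(keyType, valueType, valueContainsNull) = dataType
      }

      // Constructing the eager version with a non-map type throws a MatchError.
      val eagerFails: Boolean = Try(new Eager(IntegerType)).isFailure

      // The lazy version can be constructed; it fails only when a field is accessed.
      val deferredConstructs: Boolean = Try(new Deferred(IntegerType)).isSuccess
      val deferredAccessFails: Boolean = Try(new Deferred(IntegerType).keyType).isFailure
    }
    ```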


---

[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    ok to test.


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165194
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
    @@ -95,6 +95,12 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
         aggregate(expr, zero, merge, identity)
       }
     
    +  def transformValues(expr: Expression, f: (Expression, Expression) => Expression): Expression = {
    +    val valueType = expr.dataType.asInstanceOf[MapType].valueType
    +    val keyType = expr.dataType.asInstanceOf[MapType].keyType
    +    TransformValues(expr, createLambda(keyType, false, valueType, true, f))
    --- End diff --
    
    Should we use `valueContainsNull` instead of `true`?


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165225
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
    @@ -283,6 +289,61 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
           15)
       }
     
    +  test("TransformValues") {
    +    val ai0 = Literal.create(
    +      Map(1 -> 1, 2 -> 2, 3 -> 3),
    +      MapType(IntegerType, IntegerType))
    --- End diff --
    
    Can you add `valueContainsNull` explicitly?


---

[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r208751629
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,61 @@ case class ArrayAggregate(
     
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform Values for every entry of the map by applying transform_values function.
    + * Returns map wth transformed values
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    input: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = input.nullable
    +
    +  override def dataType: DataType = {
    +    val map = input.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, map.valueContainsNull)
    +  }
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType, expectingFunctionType)
    +
    +  @transient val (keyType, valueType, valueContainsNull) =
    +    HigherOrderFunction.mapKeyValueArgumentType(input.dataType)
    +
    +  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction):
    --- End diff --
    
    nit: formatting


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r208747953
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,61 @@ case class ArrayAggregate(
     
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform Values for every entry of the map by applying transform_values function.
    + * Returns map wth transformed values
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1);
    --- End diff --
    
    nit: ```(k, v)```, and maybe I would use ```v + 1``` instead of ```k + 1```.
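
With the sample map every key equals its value, so `k + 1` and `v + 1` return the same result and the example cannot show which lambda argument is the value. A minimal sketch of that point using plain Scala collections rather than Spark; `transformValues` below is a hypothetical helper that only mimics the semantics described in the usage string:

```scala
object TransformValuesDemo {
  // Hypothetical helper mimicking transform_values: keeps the keys and
  // replaces each value with f(key, value). Not Spark's implementation.
  def transformValues[K, V, R](m: Map[K, V])(f: (K, V) => R): Map[K, R] =
    m.map { case (k, v) => k -> f(k, v) }

  def main(args: Array[String]): Unit = {
    val same = Map(1 -> 1, 2 -> 2, 3 -> 3)
    // Keys equal values here, so both lambdas build the same map.
    println(transformValues(same)((k, _) => k + 1) == transformValues(same)((_, v) => v + 1))

    val distinct = Map(1 -> 10, 2 -> 20, 3 -> 30)
    // With distinct values the two lambdas no longer agree.
    println(transformValues(distinct)((k, _) => k + 1) == transformValues(distinct)((_, v) => v + 1))
  }
}
```

Using `v + 1` in the documented example keeps it correct even if the sample data is later changed so that keys and values differ.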


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94864/testReport)** for PR 22045 at commit [`3382e1a`](https://github.com/apache/spark/commit/3382e1a5396c8e5a94802d92a7106eacf627617c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94827/testReport)** for PR 22045 at commit [`daf7935`](https://github.com/apache/spark/commit/daf793599a6da5c11dbc4a6bd6e5dea3e0d47afd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165543
  
    --- Diff: sql/core/src/test/resources/sql-tests/inputs/higher-order-functions.sql ---
    @@ -51,3 +51,17 @@ select exists(ys, y -> y > 30) as v from nested;
     
     -- Check for element existence in a null array
     select exists(cast(null as array<int>), y -> y > 30) as v;
    +                                                                         
    +create or replace temporary view nested as values
    +  (1, map(1,1,2,2,3,3)),
    +  (2, map(4,4,5,5,6,6))
    --- End diff --
    
    nit:
    
    ```
      (1, map(1, 1, 2, 2, 3, 3)),
      (2, map(4, 4, 5, 5, 6, 6))
    ```


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94843/testReport)** for PR 22045 at commit [`56d08ef`](https://github.com/apache/spark/commit/56d08ef37531f8e25ae2c7fe3996cf7657384a80).


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94843/
    Test FAILed.


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Can one of the admins verify this patch?


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r208750446
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,61 @@ case class ArrayAggregate(
     
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform Values for every entry of the map by applying transform_values function.
    + * Returns map wth transformed values
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    input: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = input.nullable
    +
    +  override def dataType: DataType = {
    +    val map = input.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, map.valueContainsNull)
    +  }
    +
    +  override def inputTypes: Seq[AbstractDataType] = Seq(MapType, expectingFunctionType)
    --- End diff --
    
    This is already specified by ```MapBasedSimpleHigherOrderFunction```.


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    **[Test build #94783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94783/testReport)** for PR 22045 at commit [`b73106d`](https://github.com/apache/spark/commit/b73106d43000972ab9adae3d3b463a0dada2b9cc).


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Merged build finished. Test FAILed.


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210164976
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,60 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    argument: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = argument.nullable
    +
    +  override def dataType: DataType = {
    +    val map = argument.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, function.nullable)
    --- End diff --
    
    Can we use `keyType` from the following val?


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22045: [SPARK-23939][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94827/
    Test FAILed.


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22045
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94864/
    Test PASSed.


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by mn-mikke <gi...@git.apache.org>.

Github user mn-mikke commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r208749197
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -442,3 +442,61 @@ case class ArrayAggregate(
     
       override def prettyName: String = "aggregate"
     }
    +
    +/**
    + * Transform Values for every entry of the map by applying transform_values function.
    + * Returns map wth transformed values
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k,v) -> k + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    input: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = input.nullable
    +
    +  override def dataType: DataType = {
    +    val map = input.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, map.valueContainsNull)
    --- End diff --
    
    ```map.valueContainsNull``` -> ```function.nullable```?
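
The values of the result map come from the lambda, not from the input map, so whether they can be null depends on the lambda's nullability. A sketch of that reasoning in plain Scala, with `Option` standing in for SQL nullability; the helper and all names are illustrative assumptions, not Catalyst code:

```scala
object ValueNullabilityDemo {
  // Hypothetical helper: applies f to every entry and keeps the keys.
  def transformValues[K, V, R](m: Map[K, V])(f: (K, V) => R): Map[K, R] =
    m.map { case (k, v) => k -> f(k, v) }

  def main(args: Array[String]): Unit = {
    // The input has no missing values (think valueContainsNull = false).
    val input: Map[Int, Int] = Map(1 -> 1, 2 -> 2, 3 -> 3)

    // The lambda can still yield a missing value; None plays the role of NULL.
    val result = transformValues[Int, Int, Option[Int]](input) { (_, v) =>
      if (v % 2 == 0) None else Some(v)
    }

    // The output's "contains null" flag therefore has to follow the lambda,
    // not the input map's flag.
    println(result.values.exists(_.isEmpty))
  }
}
```

The converse holds as well: an input map with null values passed through a lambda that never returns null yields a map without null values.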


---


[GitHub] spark pull request #22045: [SPARK-23940][SQL] Add transform_values SQL funct...

Posted by ueshin <gi...@git.apache.org>.

Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22045#discussion_r210165102
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala ---
    @@ -497,6 +497,60 @@ case class ArrayAggregate(
       override def prettyName: String = "aggregate"
     }
     
    +/**
    + * Returns a map that applies the function to each value of the map.
    + */
    +@ExpressionDescription(
    +usage = "_FUNC_(expr, func) - Transforms values in the map using the function.",
    +examples = """
    +    Examples:
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> v + 1);
    +        map(array(1, 2, 3), array(2, 3, 4))
    +       > SELECT _FUNC_(map(array(1, 2, 3), array(1, 2, 3), (k, v) -> k + v);
    +        map(array(1, 2, 3), array(2, 4, 6))
    +  """,
    +since = "2.4.0")
    +case class TransformValues(
    +    argument: Expression,
    +    function: Expression)
    +  extends MapBasedSimpleHigherOrderFunction with CodegenFallback {
    +
    +  override def nullable: Boolean = argument.nullable
    +
    +  override def dataType: DataType = {
    +    val map = argument.dataType.asInstanceOf[MapType]
    +    MapType(map.keyType, function.dataType, function.nullable)
    +  }
    +
    +  @transient val MapType(keyType, valueType, valueContainsNull) = argument.dataType
    +
    +  override def bind(f: (Expression, Seq[(DataType, Boolean)]) => LambdaFunction)
    +  : TransformValues = {
    +    copy(function = f(function, (keyType, false) :: (valueType, valueContainsNull) :: Nil))
    +  }
    +
    +  @transient lazy val (keyVar, valueVar) = {
    +    val LambdaFunction(
    +    _, (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
    +    (keyVar, valueVar)
    +  }
    --- End diff --
    
    nit: how about:
    
    ```scala
    @transient lazy val LambdaFunction(_,
      (keyVar: NamedLambdaVariable) :: (valueVar: NamedLambdaVariable) :: Nil, _) = function
    ```
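
The suggestion relies on Scala allowing a pattern on the left-hand side of a `lazy val`, which binds both variables directly without the intermediate tuple. A minimal sketch comparing the two forms in Scala 2 syntax; `Expr`, `Var`, and `Lambda` are hypothetical stand-ins for Catalyst's `Expression`, `NamedLambdaVariable`, and `LambdaFunction`, not the real classes:

```scala
// Hypothetical stand-ins for the Catalyst classes; not Spark's API.
sealed trait Expr
final case class Var(name: String) extends Expr
final case class Lambda(body: Expr, arguments: List[Expr], hidden: Boolean) extends Expr

object PatternValDemo {
  val function: Expr = Lambda(Var("v"), List(Var("k"), Var("v")), hidden = false)

  // Tuple-returning form, shaped like the code in the diff.
  lazy val (keyVar1, valueVar1) = {
    val Lambda(_, (k: Var) :: (v: Var) :: Nil, _) = function
    (k, v)
  }

  // Direct pattern definition, shaped like the reviewer's suggestion.
  lazy val Lambda(_, (keyVar2: Var) :: (valueVar2: Var) :: Nil, _) = function

  def main(args: Array[String]): Unit = {
    // Both forms bind the same two variables.
    assert(keyVar1 == keyVar2 && valueVar1 == valueVar2)
    println(s"$keyVar2, $valueVar2")
  }
}
```

Both forms throw a `MatchError` on first access if `function` does not have the expected shape; the direct form only removes the extra block and tuple.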


---
