You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/28 03:38:34 UTC

[GitHub] [spark] nvander1 opened a new pull request #24232: [SPARK-27297] [SQL] Add higher order functions to scala API

nvander1 opened a new pull request #24232: [SPARK-27297] [SQL] Add higher order functions to scala API
URL: https://github.com/apache/spark/pull/24232
 
 
   ## What changes were proposed in this pull request?
   
   There is currently no existing Scala API equivalent for the higher order functions introduced in Spark 2.4.0.
    * transform
    * aggregate
    * filter
    * exists
    * zip_with
    * map_zip_with
    * map_filter
    * transform_values
    * transform_keys
   
   Equivalent column based functions should be added to the Scala API for org.apache.spark.sql.functions with the following signatures:
   
    
   ```scala
   def transform(column: Column, f: Column => Column): Column = ???
   
   def transform(column: Column, f: (Column, Column) => Column): Column = ???
   
   def exists(column: Column, f: Column => Column): Column = ???
   
   def filter(column: Column, f: Column => Column): Column = ???
   
   def aggregate(
   expr: Column,
   zero: Column,
   merge: (Column, Column) => Column,
   finish: Column => Column): Column = ???
   
   def aggregate(
   expr: Column,
   zero: Column,
   merge: (Column, Column) => Column): Column = ???
   
   def zip_with(
   left: Column,
   right: Column,
   f: (Column, Column) => Column): Column = ???
   
   def transform_keys(expr: Column, f: (Column, Column) => Column): Column = ???
   
   def transform_values(expr: Column, f: (Column, Column) => Column): Column = ???
   
   def map_filter(expr: Column, f: (Column, Column) => Column): Column = ???
   
   def map_zip_with(left: Column, right: Column, f: (Column, Column, Column) => Column): Column = ???
   ```
   
   ## How was this patch tested?
   
   I've mimicked the existing tests for the higher order functions in `org.apache.spark.sql.DataFrameFunctionsSuite` that use `expr` to test the higher order functions.
   
   As an example of an existing test: 
   ```scala
     test("map_zip_with function - map of primitive types") {
       val df = Seq(
         (Map(8 -> 6L, 3 -> 5L, 6 -> 2L), Map[Integer, Integer]((6, 4), (8, 2), (3, 2))),
         (Map(10 -> 6L, 8 -> 3L), Map[Integer, Integer]((8, 4), (4, null))),
         (Map.empty[Int, Long], Map[Integer, Integer]((5, 1))),
         (Map(5 -> 1L), null)
       ).toDF("m1", "m2")
   
       checkAnswer(df.selectExpr("map_zip_with(m1, m2, (k, v1, v2) -> k == v1 + v2)"),
         Seq(
           Row(Map(8 -> true, 3 -> false, 6 -> true)),
           Row(Map(10 -> null, 8 -> false, 4 -> null)),
           Row(Map(5 -> null)),
           Row(null)))
   }
   ```
   
   I've added this test that performs the same logic, but with the new column based API I've added.
   ```scala
       checkAnswer(df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k === v1 + v2)),
         Seq(
           Row(Map(8 -> true, 3 -> false, 6 -> true)),
           Row(Map(10 -> null, 8 -> false, 4 -> null)),
           Row(Map(5 -> null)),
           Row(null)))
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org