You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Pavel Chernikov (Jira)" <ji...@apache.org> on 2021/03/23 00:02:00 UTC

[jira] [Commented] (SPARK-34830) Some UDF calls inside transform are broken

    [ https://issues.apache.org/jira/browse/SPARK-34830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306655#comment-17306655 ] 

Pavel Chernikov commented on SPARK-34830:
-----------------------------------------

[~dsolow1], I've recently bumped into another issue with UDF calls and {{transform}} functionality. See, https://issues.apache.org/jira/browse/SPARK-34829

> Some UDF calls inside transform are broken
> ------------------------------------------
>
>                 Key: SPARK-34830
>                 URL: https://issues.apache.org/jira/browse/SPARK-34830
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: Daniel Solow
>            Priority: Major
>
> Let's say I want to create a UDF to do a simple lookup on a string:
> {code:java}
> import org.apache.spark.sql.{functions => f}
> val M = Map("a" -> "abc", "b" -> "defg")
> val BM = spark.sparkContext.broadcast(M)
> val LOOKUP = f.udf((s: String) => BM.value.get(s))
> {code}
> Now if I have the following dataframe:
> {code:java}
> val df = Seq(
>     Tuple1(Seq("a", "b"))
> ).toDF("arr")
> {code}
> and I want to run this UDF over each element in the array, I can do:
> {code:java}
> df.select(f.transform($"arr", i => LOOKUP(i)).as("arr")).show(false)
> {code}
> This should show:
> {code:java}
> +-----------+
> |arr        |
> +-----------+
> |[abc, defg]|
> +-----------+
> {code}
> However it actually shows:
> {code:java}
> +-----------+
> |arr        |
> +-----------+
> |[def, defg]|
> +-----------+
> {code}
> It's also broken for SQL (even without DSL). This gives the same result:
> {code:java}
> spark.udf.register("LOOKUP",(s: String) => BM.value.get(s))
> df.selectExpr("TRANSFORM(arr, a -> LOOKUP(a)) AS arr").show(false)
> {code}
> Note that "def" is not even in the map I'm using.
> This is a big problem because it breaks existing code/UDFs. I noticed this because the job I ported from 2.4.5 to 3.1.1 seemed to be working, but was actually producing broken data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org