You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/03/07 06:04:00 UTC

[jira] [Commented] (SPARK-34583) typed udf fails when it refers to type member in abstract class

    [ https://issues.apache.org/jira/browse/SPARK-34583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296751#comment-17296751 ] 

Hyukjin Kwon commented on SPARK-34583:
--------------------------------------

cc [~Ngone51] [~cloud_fan] FYI

> typed udf fails when it refers to type member in abstract class
> ---------------------------------------------------------------
>
>                 Key: SPARK-34583
>                 URL: https://issues.apache.org/jira/browse/SPARK-34583
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: kondziolka9ld
>            Priority: Minor
>
>  Please consider a following scenario: 
> {code:java}
> scala> abstract class SomeAbstractClass {
>      |   type SomeTypeMember
>      | }
> defined class SomeAbstractClassscala> class SomeSpecificClass extends SomeAbstractClass {
>      |   override type SomeTypeMember = Int
>      | }
> defined class SomeSpecificClassscala> def someFunction(someInstance: SomeAbstractClass): Any = {
>      |   udf { _: someInstance.SomeTypeMember => 42 }
>      | }
> someFunction: (someInstance: SomeAbstractClass)Any
> scala> someFunction(new SomeSpecificClass)
> java.lang.NoClassDefFoundError: no Java class corresponding to someInstance.SomeTypeMember found
>   at scala.reflect.runtime.JavaMirrors$JavaMirror.typeToJavaClass(JavaMirrors.scala:1354)
>   at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:227)
>   at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:68)
>   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:56)
>   at org.apache.spark.sql.functions$.$anonfun$udf$1(functions.scala:4509)
>   at scala.util.Try$.apply(Try.scala:213)
>   at org.apache.spark.sql.functions$.udf(functions.scala:4509)
>   at someFunction(<console>:25)
>   ... 47 elided
> {code}
> On *spark-2.4.7* it works. I guess that it is related to:
> {code:java}
> In Spark 3.0, using org.apache.spark.sql.functions.udf(AnyRef, DataType) is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set spark.sql.legacy.allowUntypedScalaUDF to true to keep using it. In Spark version 2.4 and below, if org.apache.spark.sql.functions.udf(AnyRef, DataType) gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, val f = udf((x: Int) => x, IntegerType), f($"x") returns null in Spark 2.4 and below if column x is null, and return 0 in Spark 3.0. This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default.
> {code}
> [https://spark.apache.org/docs/latest/sql-migration-guide.html#udfs-and-built-in-functions]
> Does spark try to do some type inferation? When it refers to `SomeAbstractClass.SomeTypeMember` it really does not exist.
> Some workaround could be runtime type casting, something like: 
> {code:java}
> udf { param: Any => {
>      ...
>      param.asInstanceOf[someInstance.SomeTypeMember]
>      ...  
> } {code}
> ----
>  I classified it as bug since on previous versions of spark it worked. However, I believe that it can work as designed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org