
Posted to issues@spark.apache.org by "Mark Hamilton (Jira)" <ji...@apache.org> on 2021/01/05 05:55:00 UTC

[jira] [Created] (SPARK-34002) Broken UDF behavior

Mark Hamilton created SPARK-34002:
-------------------------------------

             Summary: Broken UDF behavior
                 Key: SPARK-34002
                 URL: https://issues.apache.org/jira/browse/SPARK-34002
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.0.1
            Reporter: Mark Hamilton


UDFs can behave differently depending on whether a DataFrame is cached, even though the DataFrame's contents are identical in both cases.

 

Repro:

 
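{code:scala}
// Assumes an active SparkSession named `spark`, as in spark-shell.
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

case class Bar(a: Int)

// Always returns None, regardless of the input.
def f1(bar: Bar): Option[Bar] = {
  None
}

// Wraps its argument in an Option.
def f2(bar: Bar): Option[Bar] = {
  Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Uncommenting .cache() makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0")//.cache()
val newDf = df
  .withColumn("c1", udf1(col("c0")))
  .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}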
 


