Posted to issues@spark.apache.org by "Mark Hamilton (Jira)" <ji...@apache.org> on 2021/01/05 05:55:00 UTC
[jira] [Created] (SPARK-34002) Broken UDF behavior
Mark Hamilton created SPARK-34002:
-------------------------------------
Summary: Broken UDF behavior
Key: SPARK-34002
URL: https://issues.apache.org/jira/browse/SPARK-34002
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Reporter: Mark Hamilton
UDFs can behave differently depending on whether a DataFrame is cached, even though the DataFrame's contents are identical in both cases.
Repro:
{code:java}
case class Bar(a: Int)

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

// Always returns None.
def f1(bar: Bar): Option[Bar] = {
  None
}

// Wraps its input in an Option (None if the input is null).
def f2(bar: Bar): Option[Bar] = {
  Option(bar)
}

val udf1: UserDefinedFunction = udf(f1 _)
val udf2: UserDefinedFunction = udf(f2 _)

// Uncommenting .cache() makes this example work
val df = (1 to 10).map(i => Tuple1(Bar(1))).toDF("c0") //.cache()
val newDf = df
  .withColumn("c1", udf1(col("c0")))
  .withColumn("c2", udf2(col("c1")))
newDf.show()
{code}
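To see the two behaviors side by side, one possible variant of the repro runs the same pipeline on an uncached and a cached copy of the input. Each run is wrapped in Try because this report does not state how the failure manifests (an exception or a different result). The object name CachedVsUncached, the helpers run and input, and the local SparkSession settings are assumptions made for illustration, not part of the original repro.

```scala
import scala.util.Try

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// Top-level case class so Spark can derive the schema for the UDFs.
case class Bar(a: Int)

// Hypothetical wrapper object; the name is not from the original report.
object CachedVsUncached {
  def f1(bar: Bar): Option[Bar] = None
  def f2(bar: Bar): Option[Bar] = Option(bar)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to reproduce the behavior.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-34002-repro")
      .getOrCreate()
    import spark.implicits._

    val udf1: UserDefinedFunction = udf(f1 _)
    val udf2: UserDefinedFunction = udf(f2 _)

    // Builds the same input DataFrame as the original repro.
    def input(): DataFrame = (1 to 10).map(_ => Tuple1(Bar(1))).toDF("c0")

    // Applies both UDFs and collects the rows; any non-fatal failure
    // is captured in the Try instead of aborting the comparison.
    def run(df: DataFrame): Try[Seq[String]] = Try {
      df.withColumn("c1", udf1(col("c0")))
        .withColumn("c2", udf2(col("c1")))
        .collect()
        .map(_.toString)
        .toSeq
    }

    println(s"uncached: ${run(input())}")
    println(s"cached:   ${run(input().cache())}")

    spark.stop()
  }
}
```

If the two printed lines differ, the only change between the runs is the call to cache(), which matches the behavior described above.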
--
This message was sent by Atlassian Jira
(v8.3.4#803005)