Posted to issues@spark.apache.org by "Andrés Doncel Ramírez (JIRA)" <ji...@apache.org> on 2019/02/13 12:04:00 UTC
[jira] [Created] (SPARK-26869) UDF with struct requires to have _1 and _2 as struct field names
Andrés Doncel Ramírez created SPARK-26869:
---------------------------------------------
Summary: UDF with struct requires to have _1 and _2 as struct field names
Key: SPARK-26869
URL: https://issues.apache.org/jira/browse/SPARK-26869
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0, 2.3.0
Environment: Ubuntu 18.04.1 LTS
Reporter: Andrés Doncel Ramírez
When a UDF takes a Seq of tuples as input, the field names of the struct column passed to it must be exactly "_1" and "_2", even though the field types already match. The following code illustrates this.
{code:java}
// Run in spark-shell, where `sc` and `spark` are predefined
import org.apache.spark.sql.functions._
import spark.implicits._

val df = sc.parallelize(Array(
  ("1", 3.0),
  ("2", 4.5),
  ("5", 2.0)
)).toDF("c1", "c2")

// Struct fields keep the source column names: c1, c2
val df1 = df.agg(collect_list(struct("c1", "c2")).as("c3"))
// Changing column names to _1 and _2 when creating the struct
val df2 = df.agg(collect_list(struct(col("c1").as("_1"), col("c2").as("_2"))).as("c3"))

// The UDF's expected input type is array<struct<_1:string,_2:double>>
val takeUDF = udf { (xs: Seq[(String, Double)]) =>
  xs.take(2)
}

df1.printSchema
df2.printSchema
df1.withColumn("c4", takeUDF(col("c3"))).show() // this fails
df2.withColumn("c4", takeUDF(col("c3"))).show() // this works
{code}
The first call fails with the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(c3)' due to data type mismatch: argument 1 requires array<struct<_1:string,_2:double>> type, however, '`c3`' is of array<struct<c1:string,c2:double>> type.;;
The second call works as expected and prints the result.
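Until this is resolved, two workarounds appear to avoid the mismatch without renaming the columns at struct creation time. The sketch below assumes Spark 2.3/2.4 on the classpath; the session setup and the names `viaCast`, `takeRowsUDF` and `viaRow` are illustrative and not part of the original report. It also shows where `_1` and `_2` come from: they are the field names Spark derives for the Scala tuple type.

```scala
// Sketch, assuming Spark 2.3/2.4 on the classpath. In spark-shell the
// builder simply returns the already-running session.
import org.apache.spark.sql.{Encoders, Row, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("spark-26869-workarounds") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// The tuple type (String, Double) maps to a struct with fields _1 and _2.
val tupleSchema = Encoders.product[(String, Double)].schema
println(tupleSchema.simpleString) // struct<_1:string,_2:double>

val df = Seq(("1", 3.0), ("2", 4.5), ("5", 2.0)).toDF("c1", "c2")
val df1 = df.agg(collect_list(struct("c1", "c2")).as("c3"))
val takeUDF = udf { (xs: Seq[(String, Double)]) => xs.take(2) }

// Workaround 1: cast to the element type the UDF expects. A struct-to-struct
// cast matches fields by position, so c1/c2 are renamed to _1/_2.
val viaCast = df1.withColumn(
  "c4", takeUDF(col("c3").cast("array<struct<_1:string,_2:double>>")))
viaCast.show(false)

// Workaround 2 (assumption: relies on Spark 2.x behaviour): take Seq[Row]
// and read fields by position. Spark cannot derive a schema for Row, so
// this UDF declares no input type for the analyzer to check.
val takeRowsUDF = udf { (xs: Seq[Row]) =>
  xs.take(2).map(r => (r.getString(0), r.getDouble(1)))
}
val viaRow = df1.withColumn("c4", takeRowsUDF(col("c3")))
viaRow.show(false)
```

The cast keeps the original typed UDF unchanged, while the `Seq[Row]` variant trades compile-time field types for independence from the struct's field names.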
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)