Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/09/10 03:06:00 UTC

[jira] [Resolved] (SPARK-29009) Returning pojo from udf not working

     [ https://issues.apache.org/jira/browse/SPARK-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-29009.
----------------------------------
    Resolution: Invalid

Resolving due to no feedback from its author.

> Returning pojo from udf not working
> -----------------------------------
>
>                 Key: SPARK-29009
>                 URL: https://issues.apache.org/jira/browse/SPARK-29009
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Tomasz Belina
>            Priority: Major
>
> It looks like Spark is unable to construct a Row from a POJO returned from a UDF.
> Given the POJO:
> {code:java}
> public class SegmentStub {
>     private int id;
>     private Date statusDateTime;
>     private int healthPointRatio;
> }
> {code}
> Registration of the UDF:
> {code:java}
> public class ParseResultsUdf {
>     public String registerUdf(SparkSession sparkSession) {
>         Encoder<SegmentStub> encoder = Encoders.bean(SegmentStub.class);
>         final StructType schema = encoder.schema();
>         sparkSession.udf().register(UDF_NAME,
>                 (UDF2<String, String, SegmentStub>) (s, s2) -> new SegmentStub(1, Date.valueOf(LocalDate.now()), 2),
>                 schema
>         );
>         return UDF_NAME;
>     }
> }
> {code}
> Test code:
> {code:java}
>         List<String[]> strings = Arrays.asList(new String[]{"one", "two"},new String[]{"3", "4"});
>         JavaRDD<Row> rowJavaRDD = sparkContext.parallelize(strings).map(RowFactory::create);
>         StructType schema = DataTypes
>                 .createStructType(new StructField[] { DataTypes.createStructField("foe1", DataTypes.StringType, false),
>                         DataTypes.createStructField("foe2", DataTypes.StringType, false) });
>         Dataset<Row> dataFrame = sparkSession.sqlContext().createDataFrame(rowJavaRDD, schema);
>         Seq<Column> columnSeq = new Set.Set2<>(col("foe1"), col("foe2")).toSeq();
>         dataFrame.select(callUDF(udfName, columnSeq)).show();
> {code}
> This throws the following exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: The value (SegmentStub(id=1, statusDateTime=2019-09-06, healthPointRatio=2)) of the type (udf.SegmentStub) cannot be converted to struct<healthPointRatio:int,id:int,statusDateTime:date>
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:262)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:238)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
> 	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:396)
> 	... 21 more
> {code}
>  
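
A note on the trace above: the exception comes from CatalystTypeConverters$StructConverter, which, as far as I can tell for 2.4.x, only accepts a Row (or a Scala Product) as the value of a struct-typed UDF result, so a plain bean is rejected even when the schema was derived from it with Encoders.bean. A possible workaround is to have the UDF return a Row whose values follow the schema's field order, which in the error message is alphabetical (healthPointRatio, id, statusDateTime) rather than declaration order. The sketch below is only the plain-Java half of that idea and has no Spark dependency: BeanToRowValues and toOrderedValues are hypothetical names, and the constructor and getters on SegmentStub are assumed, since the report does not show them.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.sql.Date;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BeanToRowValues {

    // Stand-in for the reporter's POJO; constructor and getters are assumed.
    public static class SegmentStub {
        private final int id;
        private final Date statusDateTime;
        private final int healthPointRatio;

        public SegmentStub(int id, Date statusDateTime, int healthPointRatio) {
            this.id = id;
            this.statusDateTime = statusDateTime;
            this.healthPointRatio = healthPointRatio;
        }

        public int getId() { return id; }
        public Date getStatusDateTime() { return statusDateTime; }
        public int getHealthPointRatio() { return healthPointRatio; }
    }

    // Hypothetical helper: reads every readable bean property except "class"
    // and returns the values sorted by property name, matching the field
    // order shown in the struct<...> of the error message.
    public static Object[] toOrderedValues(Object bean) {
        try {
            PropertyDescriptor[] props =
                    Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors().clone();
            Arrays.sort(props, Comparator.comparing(PropertyDescriptor::getName));
            List<Object> values = new ArrayList<>();
            for (PropertyDescriptor p : props) {
                if (p.getReadMethod() == null || "class".equals(p.getName())) {
                    continue;
                }
                values.add(p.getReadMethod().invoke(bean));
            }
            return values.toArray();
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read bean properties", e);
        }
    }

    public static void main(String[] args) {
        Object[] values = toOrderedValues(new SegmentStub(1, Date.valueOf("2019-09-06"), 2));
        // Inside the UDF one would return RowFactory.create(values) instead of
        // the bean itself (Spark API, deliberately not imported in this sketch).
        System.out.println(Arrays.toString(values)); // prints [2, 1, 2019-09-06]
    }
}
```

With this in place the registered lambda would be typed as UDF2<String, String, Row> and keep the same schema argument; that part is untested here and should be checked against the Spark version in use.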


