Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/07/04 05:34:00 UTC

[jira] [Resolved] (SPARK-21277) Spark is invoking an incorrect serializer after UDAF completion

     [ https://issues.apache.org/jira/browse/SPARK-21277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21277.
----------------------------------
    Resolution: Not A Bug

> Spark is invoking an incorrect serializer after UDAF completion
> ---------------------------------------------------------------
>
>                 Key: SPARK-21277
>                 URL: https://issues.apache.org/jira/browse/SPARK-21277
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Erik Erlandson
>
> I'm writing a UDAF that also requires some custom UDT implementations.  The UDAF (and UDT) logic appears to execute properly up through the final UDAF call to the {{evaluate}} method. After {{evaluate}} completes, however, the UDT {{deserialize}} method is called one more time, and this time it is invoked on data that was not produced by my corresponding {{serialize}} method, so it crashes.  The following REPL output shows the execution and completion of {{evaluate}}, followed by another call to {{deserialize}} that receives some kind of {{UnsafeArrayData}} object my serialization does not produce, and so the method fails:
> {code}entering evaluate
> a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
> leaving evaluate
> a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
> java.lang.RuntimeException: Error while decoding: java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
> createexternalrow(newInstance(class org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize, StructField(tdigestmlvecudaf(features),TDigestArrayUDT,true))
> {code}
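> For context on the exception above: at decode time Spark may hand the UDT's {{deserialize}} an {{UnsafeArrayData}}, which implements the typed {{ArrayData}} accessors ({{getDouble}}, {{toDoubleArray}}, ...) but throws exactly this {{UnsupportedOperationException}} from the generic ones such as {{array}}. A minimal sketch of a {{deserialize}} written only against the {{ArrayData}} interface; the {{DoublesUDT}} name and package are hypothetical, not taken from isarn-sketches-spark:
> {code}
> // Hypothetical sketch: the UDT API is private[spark] in Spark 2.x, so a
> // custom UDT has to live under an org.apache.spark.* package (as the
> // reporter's TDigestArrayUDT does).
> package org.apache.spark.example.udt
> 
> import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
> import org.apache.spark.sql.types._
> 
> class DoublesUDT extends UserDefinedType[Array[Double]] {
>   def sqlType: DataType = ArrayType(DoubleType, containsNull = false)
> 
>   def serialize(obj: Array[Double]): Any = new GenericArrayData(obj)
> 
>   // Read through the ArrayData interface: toDoubleArray() works for both
>   // GenericArrayData and UnsafeArrayData, whereas calling a.array (or
>   // matching only on GenericArrayData) fails with
>   // "Not supported on UnsafeArrayData."
>   def deserialize(datum: Any): Array[Double] = datum match {
>     case a: ArrayData => a.toDoubleArray()
>   }
> 
>   def userClass: Class[Array[Double]] = classOf[Array[Double]]
> }
> {code}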
> To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:
> https://github.com/erikerlandson/isarn-sketches-spark/tree/first-cut
> Then invoke {{xsbt console}} to get a REPL with a Spark session.  In the REPL, execute:
> {code}
> Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
> Type in expressions for evaluation. Or try :help.
> scala> val training = spark.createDataFrame(Seq((1.0, Vectors.dense(0.0, 1.1, 0.1)),(0.0, Vectors.dense(2.0, 1.0, -1.0)),(0.0, Vectors.dense(2.0, 1.3, 1.0)),(1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")
> training: org.apache.spark.sql.DataFrame = [label: double, features: vector]
> scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
> featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]
> scala> featTD.first
> {code}
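> The post-{{evaluate}} {{deserialize}} call comes from the normal decode path: {{featTD.first}} materializes the aggregate result on the driver, and that decode step ({{createexternalrow}} in the output above) converts the internal, {{UnsafeArrayData}}-backed value back into the external class via the UDT. A minimal UDAF skeleton with hypothetical names (not from isarn-sketches-spark), summing a Double column into the {{DoublesUDT}} sketched above, just to show where the second call comes from:
> {code}
> // Hypothetical skeleton: evaluate returns the external object; Spark then
> // serializes it with the UDT, and deserializes it again when the driver
> // materializes the result row (e.g. via .first), after evaluate has returned.
> import org.apache.spark.example.udt.DoublesUDT
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
> import org.apache.spark.sql.types._
> 
> class SumAsDoublesUDAF extends UserDefinedAggregateFunction {
>   def inputSchema: StructType = StructType(StructField("x", DoubleType) :: Nil)
>   def bufferSchema: StructType = StructType(StructField("sum", DoubleType) :: Nil)
>   def dataType: DataType = new DoublesUDT   // result column uses the UDT
>   def deterministic: Boolean = true
> 
>   def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0 }
> 
>   def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
>     if (!input.isNullAt(0)) buffer(0) = buffer.getDouble(0) + input.getDouble(0)
>   }
> 
>   def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
>     b1(0) = b1.getDouble(0) + b2.getDouble(0)
>   }
> 
>   // Returns the external Array[Double]; Spark serializes it via the UDT here,
>   // and calls the UDT's deserialize later when decoding the result row.
>   def evaluate(buffer: Row): Any = Array(buffer.getDouble(0))
> }
> {code}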



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org