Posted to dev@zeppelin.apache.org by "Dave Knoester (JIRA)" <ji...@apache.org> on 2017/04/28 17:08:04 UTC

[jira] [Created] (ZEPPELIN-2474) ClassCast exception when interpreting UDFs from a String

Dave Knoester created ZEPPELIN-2474:
---------------------------------------

             Summary: ClassCast exception when interpreting UDFs from a String
                 Key: ZEPPELIN-2474
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2474
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
         Environment: OS X 10.11.6, spark-2.1.0-bin-hadoop2.7, Scala version 2.11.8 (bundled w/ Spark), Java 1.8.0_121
            Reporter: Dave Knoester
            Priority: Blocker


Hi Zeppelin team,

I'm cross-posting this issue from https://issues.apache.org/jira/browse/SPARK-20525 in the hope that someone here can help, since Zeppelin has already solved it.

I'm trying to interpret a string containing Scala code from inside a Spark session. Everything works fine except for anything that ships a user-defined closure to the executors (UDFs, map, flatMap, etc.).

For example, this code works in Zeppelin:

        import org.apache.spark.sql._
        import org.apache.spark.sql.functions._
        import spark.implicits._

        val upper: String => String = _.toUpperCase
        val upperUDF = udf(upper)
        val df = spark.sparkContext.parallelize(Seq("foo","bar")).toDF.withColumn("UPPER", upperUDF($"value"))
        df.show()

However, the following code, which evaluates the same snippet from a String via a nested IMain interpreter, fails when run in a spark-shell:

        import scala.tools.nsc.GenericRunnerSettings
        import scala.tools.nsc.interpreter.IMain

        val settings = new GenericRunnerSettings(println _)
        settings.usejavacp.value = true
        val interpreter = new IMain(settings, new java.io.PrintWriter(System.out))
        interpreter.bind("spark", spark)
        interpreter.interpret("""
          import org.apache.spark.sql.functions._
          import spark.implicits._
          val upper: String => String = _.toUpperCase
          val upperUDF = udf(upper)
          spark.sparkContext.parallelize(Seq("foo", "bar")).toDF.withColumn("UPPER", upperUDF($"value")).show
        """)

Exception:

Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2237)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
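
For what it's worth, Zeppelin's SparkInterpreter appears to avoid this by compiling REPL-generated classes to a real directory on disk and setting spark.repl.class.outputDir to that directory before the SparkSession is created, so the executors' ExecutorClassLoader can fetch the interpreted closure classes from the driver instead of falling back to an incompatible class. A minimal sketch of that approach (based on Spark 2.x behavior; the temp-directory name and local[*] master are illustrative, not from the original report):

        import java.nio.file.Files
        import scala.tools.nsc.GenericRunnerSettings
        import scala.tools.nsc.interpreter.IMain
        import org.apache.spark.sql.SparkSession

        // Compile classes generated by the interpreter to a real directory.
        val outputDir = Files.createTempDirectory("spark-repl-classes").toFile

        val settings = new GenericRunnerSettings(println _)
        settings.usejavacp.value = true
        settings.outputDirs.setSingleOutput(outputDir.getAbsolutePath)

        // Create the session AFTER pointing spark.repl.class.outputDir at the
        // same directory; the driver then serves it to executors, whose
        // ExecutorClassLoader can load the REPL-compiled closure classes.
        val spark = SparkSession.builder()
          .master("local[*]")
          .config("spark.repl.class.outputDir", outputDir.getAbsolutePath)
          .getOrCreate()

        val interpreter = new IMain(settings, new java.io.PrintWriter(System.out))
        interpreter.bind("spark", spark)

With the output directory shared this way, closures compiled by the nested interpreter should deserialize cleanly on the executors.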

Any help is appreciated!


