You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Dominik Hübner <co...@dhuebner.com> on 2015/07/03 08:44:55 UTC

Kryo fails to serialise output

I have a rather simple avro schema to serialize Tweets (message, username, timestamp).
Kryo and twitter chill are used to do so.

For my dev environment the Spark context is configured as below

val conf: SparkConf = new SparkConf()
conf.setAppName("kryo_test")
conf.setMaster(“local[4]")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator”)

Serialization is setup with

override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet])
}

(This method gets called)


Using this configuration to persist some object fails with java.io.NotSerializableException: co.feeb.avro.Tweet 
(which seems to be ok as this class is not Serializable)

I used the following code:

val ctx: SparkContext = new SparkContext(conf)
    val tweets: RDD[Tweet] = ctx.parallelize(List(
        new Tweet("a", "b", 1L),
        new Tweet("c", "d", 2L),
        new Tweet("e", "f", 3L)
      )
    )

tweets.saveAsObjectFile("file:///tmp/spark”)

Using saveAsTextFile works, but persisted files are not binary but JSON

cat /tmp/spark/part-00000
{"username": "a", "text": "b", "timestamp": 1}
{"username": "c", "text": "d", "timestamp": 2}
{"username": "e", "text": "f", "timestamp": 3}

Is this intended behaviour, a configuration issue, avro serialisation not working in local mode or something else?





---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org