Posted to user@spark.apache.org by anny9699 <an...@gmail.com> on 2014/03/28 17:37:09 UTC

Do all classes involving RDD operation need to be registered?

Hi,

I am sorry if this has been asked before. I found that if I wrap some
methods in a class with parameters, Spark throws a "Task not serializable"
exception; however, if I wrap them in an object or a case class without
parameters, it works fine. Is it true that all classes involved in RDD
operations should be registered so that SparkContext can recognize them?

Thanks a lot!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Do-all-classes-involving-RDD-operation-need-to-be-registered-tp3439.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Do all classes involving RDD operation need to be registered?

Posted by anny9699 <an...@gmail.com>.
Thanks so much Sonal! I am much clearer now.




Re: Do all classes involving RDD operation need to be registered?

Posted by Sonal Goyal <so...@gmail.com>.
From my limited knowledge, all classes involved in RDD operations
should extend Serializable if you want Java serialization (the default).
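
To make the Serializable point concrete, here is a minimal sketch in plain Java, with no Spark involved. It assumes the original error came from a function that reads a field of its enclosing class and therefore drags the whole instance along when it is serialized. The names ClosureSerializationDemo, Scaler, SerializableScaler and canSerialize are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureSerializationDemo {

    // A function type that Java serialization accepts; it stands in for a task closure.
    public interface SerializableFn extends Function<Integer, Integer>, Serializable {}

    // Hypothetical helper class with a constructor parameter. It is NOT Serializable.
    public static class Scaler {
        private final int factor;

        public Scaler(int factor) {
            this.factor = factor;
        }

        // Reads the field inside the lambda, so the lambda captures `this`.
        public SerializableFn capturingThis() {
            return x -> x * factor;
        }

        // Copies the field into a local first, so only an int is captured.
        public SerializableFn capturingLocal() {
            final int f = factor;
            return x -> x * f;
        }
    }

    // The same class, but marked Serializable.
    public static class SerializableScaler implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int factor;

        public SerializableScaler(int factor) {
            this.factor = factor;
        }

        public SerializableFn capturingThis() {
            return x -> x * factor;
        }
    }

    // Returns true if Java serialization can write the object, false if it
    // (or anything it references) is not Serializable.
    public static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Scaler(3).capturingThis()));             // false
        System.out.println(canSerialize(new Scaler(3).capturingLocal()));            // true
        System.out.println(canSerialize(new SerializableScaler(3).capturingThis())); // true
    }
}
```

The first call fails because the function references a Scaler that Java serialization cannot write. Either making the class Serializable or copying the needed value into a local variable avoids the problem.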

However, if you want Kryo serialization, you can use
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
If you also want custom serialization, for example to set or compute
some variables differently during deserialization, you would create a
custom registrator, register your classes with it, and call
conf.set("spark.kryo.registrator", "mypkg.MyKryoRegistrator");
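
As a rough sketch of what such a registrator could look like in Java: it assumes Spark and its bundled Kryo are on the classpath, and the Point class, the buildConf helper and the application name are hypothetical. Only the two configuration keys and the mypkg.MyKryoRegistrator name come from the message above.

```java
package mypkg;

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.SparkConf;
import org.apache.spark.serializer.KryoRegistrator;

// Sketch only: needs Spark (and the Kryo it ships with) on the classpath.
public class MyKryoRegistrator implements KryoRegistrator {

    // Hypothetical record type that would be stored in RDDs.
    public static class Point {
        public double x;
        public double y;
    }

    // Spark calls this on the Kryo instances it creates, so the
    // registered classes are known before any data is serialized.
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(Point.class);
    }

    // Hypothetical helper that builds a configuration using Kryo
    // together with this registrator.
    public static SparkConf buildConf() {
        return new SparkConf()
            .setAppName("kryo-example") // assumed name, for illustration only
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .set("spark.kryo.registrator", "mypkg.MyKryoRegistrator");
    }
}
```

As far as I know, this setting affects how data records are serialized, while the functions passed to RDD operations are still shipped with Java serialization. Registering a class with Kryo would therefore not by itself remove a "Task not serializable" error.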

If I am missing something, please feel free to correct me.

Best Regards,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Sat, Mar 29, 2014 at 1:40 AM, anny9699 <an...@gmail.com> wrote:

> Thanks a lot Ognen!
>
> It's not a fancy class that I wrote, and I now realize that I neither
> extended Serializable nor registered it with Kryo, and that's why it is
> not working.

Re: Do all classes involving RDD operation need to be registered?

Posted by anny9699 <an...@gmail.com>.
Thanks a lot Ognen!

It's not a fancy class that I wrote, and I now realize that I neither
extended Serializable nor registered it with Kryo, and that's why it is
not working.




Re: Do all classes involving RDD operation need to be registered?

Posted by Ognen Duzlevski <og...@plainvanillagames.com>.
There is also this quote from the Tuning guide
(http://spark.incubator.apache.org/docs/latest/tuning.html):
"Finally, if you don't register your classes, Kryo will still work, but
it will have to store the full class name with each object, which is
wasteful."

It implies that you don't really have to register your classes with 
Kryo. However, what kind of waste are we talking about? :)
Ognen

On 3/28/14, 12:10 PM, Debasish Das wrote:
>
> Classes are serialized and sent to all the workers as Akka messages.
>
> I am not sure whether singletons and case classes are Java-serialized
> or Kryo-serialized by default.
>
> But your own classes will definitely be much more efficient if
> serialized by Kryo. Matei did a comparison of all the serialization
> options, and Kryo was the fastest at that time.


Re: Do all classes involving RDD operation need to be registered?

Posted by Debasish Das <de...@gmail.com>.
Classes are serialized and sent to all the workers as Akka messages.

I am not sure whether singletons and case classes are Java-serialized or
Kryo-serialized by default.

But your own classes will definitely be much more efficient if serialized
by Kryo. Matei did a comparison of all the serialization options, and
Kryo was the fastest at that time.