Posted to user@spark.apache.org by amit tewari <am...@gmail.com> on 2016/01/27 12:13:55 UTC

spark.kryo.classesToRegister

This is what I have added in my code:



rdd.persist(StorageLevel.MEMORY_ONLY_SER())

conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");



Do I compulsorily need to do anything via spark.kryo.classesToRegister?

Or is the above code sufficient to achieve a performance gain using Kryo
serialization?



Thanks

Amit

Re: spark.kryo.classesToRegister

Posted by Jagrut Sharma <ja...@gmail.com>.
I have run into this issue (
https://issues.apache.org/jira/browse/SPARK-10251) with Kryo on Spark
version 1.4.1.
Just something to be aware of when setting spark.kryo.registrationRequired to 'true'.
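
For context, a minimal sketch of the combination being referred to, assuming (per the quoted reply below) that the setting in question is spark.kryo.registrationRequired:

import org.apache.spark.SparkConf;

// Sketch only: Kryo with mandatory registration. On Spark 1.4.x this is the
// combination that can surface SPARK-10251 when unregistered (including
// internal Spark) classes end up being serialized.
SparkConf conf = new SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrationRequired", "true");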

Thanks.
--
Jagrut


On Thu, Jan 28, 2016 at 6:32 AM, Jim Lohse <sp...@megalearningllc.com>
wrote:

> You are only required to add classes to Kryo (compulsorily) if you use a
> specific setting:
>
> // require registration of all classes with Kryo
> .set("spark.kryo.registrationRequired", "true")
>
> Here's an example of my setup; I think this is the best approach because
> it forces me to really think about what I am serializing:
>
> // the Kryo serializer wants all classes that will be serialized registered up front
> Class[] kryoClassArray = new Class[]{DropResult.class, DropEvaluation.class, PrintHetSharing.class};
> SparkConf sparkConf = new SparkConf()
>     .setAppName("MyAppName")
>     .setMaster("spark://ipaddress:7077")
>     // now for the Kryo stuff
>     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>     // require registration of all classes with Kryo
>     .set("spark.kryo.registrationRequired", "true")
>     // don't forget to register ALL classes or you will get an error
>     .registerKryoClasses(kryoClassArray);
>
> On 01/27/2016 12:58 PM, Shixiong(Ryan) Zhu wrote:
>
> It depends. The default Kryo serializer cannot handle all cases. If you
> encounter any issue, you can follow the Kryo doc to set up a custom
> serializer: https://github.com/EsotericSoftware/kryo/blob/master/README.md
>
> On Wed, Jan 27, 2016 at 3:13 AM, amit tewari <am...@gmail.com>
> wrote:
>>
>> This is what I have added in my code:
>>
>>
>>
>> rdd.persist(StorageLevel.MEMORY_ONLY_SER())
>>
>> conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
>>
>>
>>
>> Do I compulsorily need to do anything via spark.kryo.classesToRegister?
>>
>> Or is the above code sufficient to achieve a performance gain using Kryo
>> serialization?
>>
>>
>>
>> Thanks
>>
>> Amit
>>
>

Re: spark.kryo.classesToRegister

Posted by Jim Lohse <sp...@megalearningllc.com>.
You are only required to add classes to Kryo (compulsorily) if you use a 
specific setting:

// require registration of all classes with Kryo
.set("spark.kryo.registrationRequired", "true")

Here's an example of my setup; I think this is the best approach because
it forces me to really think about what I am serializing:

// the Kryo serializer wants all classes that will be serialized registered up front
Class[] kryoClassArray = new Class[]{DropResult.class, DropEvaluation.class, PrintHetSharing.class};
SparkConf sparkConf = new SparkConf()
    .setAppName("MyAppName")
    .setMaster("spark://ipaddress:7077")
    // now for the Kryo stuff
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // require registration of all classes with Kryo
    .set("spark.kryo.registrationRequired", "true")
    // don't forget to register ALL classes or you will get an error
    .registerKryoClasses(kryoClassArray);
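
If the goal is the configuration-property route the original question asks about, a rough equivalent sketch (the com.example package names below are placeholders, not classes from this thread):

import org.apache.spark.SparkConf;

// Same registration expressed through spark.kryo.classesToRegister: a
// comma-separated list of fully-qualified class names.
SparkConf sparkConf = new SparkConf()
    .setAppName("MyAppName")
    .setMaster("spark://ipaddress:7077")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.classesToRegister",
        "com.example.DropResult,com.example.DropEvaluation,com.example.PrintHetSharing")
    .set("spark.kryo.registrationRequired", "true");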




On 01/27/2016 12:58 PM, Shixiong(Ryan) Zhu wrote:
> It depends. The default Kryo serializer cannot handle all cases. If
> you encounter any issue, you can follow the Kryo doc to set up a custom
> serializer:
> https://github.com/EsotericSoftware/kryo/blob/master/README.md
> On Wed, Jan 27, 2016 at 3:13 AM, amit tewari <amittewari.5@gmail.com> wrote:
>
>     This is what I have added in my code:
>
>     rdd.persist(StorageLevel.MEMORY_ONLY_SER())
>
>     conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
>
>     Do I compulsorily need to do anything via
>     spark.kryo.classesToRegister?
>
>     Or is the above code sufficient to achieve a performance gain
>     using Kryo serialization?
>
>     Thanks
>
>     Amit
>

Re: spark.kryo.classesToRegister

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
It depends. The default Kryo serializer cannot handle all cases. If you
encounter any issue, you can follow the Kryo doc to set up a custom
serializer: https://github.com/EsotericSoftware/kryo/blob/master/README.md
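
A rough sketch of what that can look like when wired into Spark, assuming a hypothetical SomeRecord class (class and package names are placeholders); the registrator would then be enabled with conf.set("spark.kryo.registrator", "com.example.MyKryoRegistrator"):

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.spark.serializer.KryoRegistrator;

// Hypothetical record type used only for illustration.
class SomeRecord {
    final String name;
    SomeRecord(String name) { this.name = name; }
}

// Custom Kryo serializer for SomeRecord, following the Kryo README.
class SomeRecordSerializer extends Serializer<SomeRecord> {
    @Override
    public void write(Kryo kryo, Output output, SomeRecord record) {
        output.writeString(record.name);
    }

    @Override
    public SomeRecord read(Kryo kryo, Input input, Class<SomeRecord> type) {
        return new SomeRecord(input.readString());
    }
}

// Registrator that Spark instantiates on each executor when
// spark.kryo.registrator points at it.
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(SomeRecord.class, new SomeRecordSerializer());
    }
}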

On Wed, Jan 27, 2016 at 3:13 AM, amit tewari <am...@gmail.com> wrote:

> This is what I have added in my code:
>
>
>
> rdd.persist(StorageLevel.MEMORY_ONLY_SER())
>
> conf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer");
>
>
>
> Do I compulsorily need to do anything via : spark.kryo.classesToRegister?
>
> Or the above code sufficient to achieve performance gain using Kryo
> serialization?
>
>
>
> Thanks
>
> Amit
>