Posted to user@spark.apache.org by Mayuresh Kunjir <ma...@gmail.com> on 2013/11/26 01:25:28 UTC

Kryo serialization for shuffles

Hi Spark users,

This has probably been answered before, but I could not find it. I
understand from the tuning guide that using Kryo serialization for shuffles
improves performance. I would like to know how to register the Kryo
serializer. Apart from the shuffles, my standalone application needs to
store and retrieve a few object files as well. I would really appreciate
any pointers on registering the Kryo serializer for both of these
serialization tasks.

Thanks and regards,
~Mayuresh

Re: Kryo serialization for shuffles

Posted by Mayuresh Kunjir <ma...@gmail.com>.
Actually, when I set the spark.serializer property for my standalone
application, I got the following error, which led me to believe that the
Java serializer was still being used by certain tasks. But it turned out to
be an error in my configuration. Sorry about that.

Thanks for your help. It's working like a charm now.

~Mayuresh

java.io.FileNotFoundException: http://152.3.144.217:42211/broadcast-6
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1623)
        at java.net.URL.openStream(URL.java:1037)
        at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:143)
        at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:435)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1004)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
        at org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:67)
        at org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:124)
        at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1835)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1794)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)




-----------------------------
Mayuresh Kunjir
PhD Student (Computer Science)
Duke University
http://www.cs.duke.edu/~mayuresh



On Mon, Nov 25, 2013 at 5:10 PM, Andrew Ash <an...@andrewash.com> wrote:

> Hi Matei, I've clarified the documentation to include this information in
> this pull request.  Can you take a look?
>
> https://github.com/apache/incubator-spark/pull/206
>
>
> On Mon, Nov 25, 2013 at 5:03 PM, Matei Zaharia <ma...@gmail.com> wrote:
>
>> Yeah, if you just set spark.serializer to Kryo, it will use it for all
>> these things.
>>
>> Matei
>>
>> On Nov 25, 2013, at 4:59 PM, Andrew Ash <an...@andrewash.com> wrote:
>>
>> How do you know Spark doesn't also use Kryo for shuffled files?  Are
>> there metrics or logs somewhere that make you believe it's normal Java
>> serialization?
>>
>>
>> On Mon, Nov 25, 2013 at 4:46 PM, Mayuresh Kunjir <
>> mayuresh.kunjir@gmail.com> wrote:
>>
>>> This shows how to serialize user classes. I wanted Spark to serialize
>>> all shuffle files and object files using Kryo. How can I specify that? Or
>>> would that be done by default if I just set spark.serializer to kryo?
>>>
>>>
>>>
>>>
>>> On Mon, Nov 25, 2013 at 7:42 PM, Matei Zaharia <ma...@gmail.com> wrote:
>>>
>>>> Did you look through
>>>> http://spark.incubator.apache.org/docs/latest/tuning.html#data-serialization?
>>>> It shows an example of how to register classes with Kryo. In particular, in
>>>> your Registrator, you can use kryo.register(yourClass, new YourSerializer)
>>>> to pass a custom serializer too.
>>>>
>>>> Matei
>>>>
>>>> On Nov 25, 2013, at 4:25 PM, Mayuresh Kunjir <ma...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Spark users,
>>>>
>>>> This has probably been answered before, but I could not locate it. I
>>>> understand from the tuning guide that using Kryo serialization for shuffles
>>>> improves the performance. I would like to know how to register the Kryo
>>>> serializer. Apart from the shuffles, my standalone application needs to
>>>> store and retrieve a few object files as well. I would really appreciate
>>>> any pointers on registering Kryo serializer for both these serialization
>>>> tasks.
>>>>
>>>> Thanks and regards,
>>>> ~Mayuresh
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>

Re: Kryo serialization for shuffles

Posted by Andrew Ash <an...@andrewash.com>.
Hi Matei, I've clarified the documentation to include this information in
this pull request.  Can you take a look?

https://github.com/apache/incubator-spark/pull/206



Re: Kryo serialization for shuffles

Posted by Matei Zaharia <ma...@gmail.com>.
Yeah, if you just set spark.serializer to Kryo, it will use it for all these things.

Matei
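
For reference, here is a minimal sketch of the setting discussed in this reply. It assumes a Spark release from the time of this thread (0.8.x), where configuration was read from Java system properties; later releases moved this to SparkConf. The class name KryoSerializerConfig and the helper enableKryo are made up for illustration. The property key and the serializer class name are the ones given in the tuning guide.

```java
// Sketch only: KryoSerializerConfig and enableKryo are hypothetical names.
final class KryoSerializerConfig {
    // Property key and value as documented in the Spark tuning guide.
    static final String SERIALIZER_KEY = "spark.serializer";
    static final String KRYO_SERIALIZER = "org.apache.spark.serializer.KryoSerializer";

    /** Sets spark.serializer to Kryo and returns the previous value, or null if unset. */
    static String enableKryo() {
        return System.setProperty(SERIALIZER_KEY, KRYO_SERIALIZER);
    }

    public static void main(String[] args) {
        enableKryo();
        // A SparkContext would be created only after this point.
        System.out.println(SERIALIZER_KEY + "=" + System.getProperty(SERIALIZER_KEY));
    }
}
```

The call has to run before the SparkContext is constructed, which is the ordering the tuning guide describes.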



Re: Kryo serialization for shuffles

Posted by Andrew Ash <an...@andrewash.com>.
How do you know Spark doesn't also use Kryo for shuffled files?  Are there
metrics or logs somewhere that make you believe it's normal Java
serialization?



Re: Kryo serialization for shuffles

Posted by Mayuresh Kunjir <ma...@gmail.com>.
This shows how to serialize user classes. I wanted Spark to serialize all
shuffle files and object files using Kryo. How can I specify that? Or would
that be done by default if I just set spark.serializer to kryo?





Re: Kryo serialization for shuffles

Posted by Matei Zaharia <ma...@gmail.com>.
Did you look through http://spark.incubator.apache.org/docs/latest/tuning.html#data-serialization? It shows an example of how to register classes with Kryo. In particular, in your Registrator, you can use kryo.register(yourClass, new YourSerializer) to pass a custom serializer too.

Matei
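
As a rough sketch of how the Registrator mentioned in this reply gets wired in, assuming the system-property style of configuration used by Spark 0.8.x: the registrator class is named in the spark.kryo.registrator property. The names mypackage.MyRegistrator, MyClass, OtherClass and MySerializer are placeholders, and KryoRegistratorConfig is a made-up helper. The registrator body appears only as a comment because it needs Spark and Kryo on the classpath.

```java
// Sketch only; all class and helper names here are hypothetical.
//
// The registrator would look roughly like this (requires Spark and Kryo):
//
//   public class MyRegistrator implements org.apache.spark.serializer.KryoRegistrator {
//       public void registerClasses(com.esotericsoftware.kryo.Kryo kryo) {
//           kryo.register(MyClass.class);                         // default Kryo serializer
//           kryo.register(OtherClass.class, new MySerializer());  // custom serializer
//       }
//   }
final class KryoRegistratorConfig {
    /** Selects Kryo and names the registrator class Spark should instantiate. */
    static void useRegistrator(String registratorClassName) {
        System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        System.setProperty("spark.kryo.registrator", registratorClassName);
    }

    public static void main(String[] args) {
        // Must run before the SparkContext is created.
        useRegistrator("mypackage.MyRegistrator");
        System.out.println(System.getProperty("spark.kryo.registrator"));
    }
}
```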
