Posted to user@spark.apache.org by Rajkiran Rajkumar <ra...@gmail.com> on 2016/10/06 13:47:23 UTC

Kryo serializer slower than Java serializer for Spark Streaming

Hi,
I am running a Spark Streaming application that reads from a Kinesis
stream and processes the data. The application runs on EMR. Recently, we
tried moving from Java's built-in serializer to the Kryo serializer. To
quantify the performance improvement, I pumped 30000 input records into the
application over a period of 5 minutes. Based on the task deserialization
time, I have the following data:
Java serializer: median 3 ms, mean 8.21 ms
Kryo serializer: median 4 ms, mean 9.64 ms

Here, the Kryo serializer is slower than the Java serializer. I am looking
for advice on anything I might have failed to take into account. Please let
me know if more information is needed.

Thanks,
Rajkiran

Re: Kryo serializer slower than Java serializer for Spark Streaming

Posted by Rajkiran Rajkumar <ra...@gmail.com>.
Oops, realized that I didn't reply to all. Pasting snippet again.

Hi Sean,
Thanks for the reply. I have already forced registration of classes with
the Kryo serializer, and my observation was made in that scenario. To give
a sense of the data: the records are serialized using Thrift and read from
the Kinesis stream. The data itself is deserialized only inside
rdd.foreach(), so Spark transfers only Array[Byte], which is a common
Kryo-serializable type.

Thanks,
Rajkiran

On Thu, Oct 6, 2016 at 7:38 PM, Sean Owen <so...@cloudera.com> wrote:

> It depends a lot on your data. If it's mostly custom types, then Kryo
> doesn't have much of an advantage. That said, you want to make sure to
> register all your classes with Kryo (and consider setting the flag that
> requires Kryo registration to enforce it), because that lets Kryo avoid
> writing a bunch of class names, which Java serialization always would.

Re: Kryo serializer slower than Java serializer for Spark Streaming

Posted by Sean Owen <so...@cloudera.com>.
It depends a lot on your data. If it's mostly custom types, then Kryo
doesn't have much of an advantage. That said, you want to make sure to
register all your classes with Kryo (and consider setting the flag that
requires Kryo registration to enforce it), because that lets Kryo avoid
writing a bunch of class names, which Java serialization always would.
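
For reference, a minimal sketch of what registering classes and requiring
registration might look like in Scala. The object name, method name, and
application name are hypothetical, and the only class registered here is
Array[Byte], which is the type mentioned elsewhere in this thread; a real
job will probably need to register more classes than this.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper; none of these names come from the application
// discussed in this thread.
object KryoConfSketch {
  def buildConf(): SparkConf = {
    new SparkConf()
      .setAppName("kinesis-streaming-app") // placeholder application name
      // Use Kryo instead of the default Java serializer for data.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Fail with an exception when an unregistered class is serialized,
      // rather than silently falling back to writing full class names.
      .set("spark.kryo.registrationRequired", "true")
      // Register the types that are actually shipped between nodes.
      .registerKryoClasses(Array(classOf[Array[Byte]]))
  }
}
```

With registrationRequired set to true, any class that was missed shows up
as an exception naming that class, which makes it straightforward to
complete the list passed to registerKryoClasses.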