You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2016/10/31 17:55:58 UTC
[jira] [Comment Edited] (BEAM-626) AvroCoder not deserializing correctly in Kryo

    [ https://issues.apache.org/jira/browse/BEAM-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622842#comment-15622842 ] 

Amit Sela edited comment on BEAM-626 at 10/31/16 5:54 PM:
----------------------------------------------------------

Those objects rely on Java serialization which is outperformed by Kryo serialization.
This may not be an issue for other frameworks as they serialize tasks (The DoFn and it's content) to the worker because it happens every time a new worker is deployed/utilized but Spark streaming has to do this every batch interval (sometimes as frequent as 500 msec) so it is better to use Kryo. 

So far {{AvroCoder}} is the only issue, and the entire Spark community (which is one of the largest Apache communities) deals with Kryo just fine.

As discussed before, if this becomes a pain I could expose an API to register serializers (falling back to {{JavaSerialization}} is one of them).

What I really don't understand is the stand on this ticket - "AvroCoder not deserializing correctly in Kryo" - as in making the coder work with Kryo..
If the proposed solution will degrade the quality/performance of the current state of the implementation I'm for simply registering with Java but if this is not the case, what's the harm with the proposed Kryo fix ?

From my point of view, enabling the coder to work with Kryo while NOT making it worse in any way, is a good thing.


was (Author: amitsela):
Those objects rely on Java serialization which is outperformed by Kryo serialization.
This may not be an issue for other frameworks as they serialize tasks (The DoFn and it's content) to the worker because it happens every time a new worker is deployed/utilized but Spark streaming has to do this every batch interval (sometimes as frequent as 500 msec) so it is better to use Kryo. 

So far {{AvroCoder}} is the only issue, and the entire Spark community (which is one of the largest Apache communities) deals with Kryo just fine.

As discussed before, if this becomes a pain I could expose an API to register serializers (falling back to {{JavaSerialization}} is one of them).

What I really don't understand is the stand on this ticket - "AvroCoder not deserializing correctly in Kryo" - as in making the coder work with Kryo..
If the proposed solution was to degrade the quality/performance of the current state of the implementation I would be the first to suggest we simply register with Java but this is not the case. First proposal wasn't threadsafe, which is not different from the current state, and adding synchronization might heart performance (regardless of Kryo).

From my point of view, enabling the coder to work with Kryo while NOT making it worse in any way, is a good thing.

> AvroCoder not deserializing correctly in Kryo
> ---------------------------------------------
>
>                 Key: BEAM-626
>                 URL: https://issues.apache.org/jira/browse/BEAM-626
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Aviem Zur
>            Assignee: Aviem Zur
>            Priority: Minor
>
> Unlike with Java serialization, when deserializing AvroCoder using Kryo, the resulting AvroCoder is missing all of its transient fields.
> The reason it works with Java serialization is because of the usage of writeReplace and readResolve, which Kryo does not adhere to.
> In ProtoCoder for example there are also unserializable members, the way it is solved there is lazy initializing these members via their getters, so they are initialized in the deserialized object on first call to the member.
> It seems AvroCoder is the only class in Beam to use writeReplace convention.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)