Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2014/07/17 19:14:55 UTC

[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/incubator-flink/pull/74

    [FLINK-610] Added KryoSerializer

    I added the KryoSerializer and replaced the AvroSerializer as the standard serializer for generic types. Due to the way Flink's serialization works, we cannot exploit Kryo's `Input`s, which read ahead into a buffer during deserialization. Instead, we have to read byte by byte whenever we do not know the length of the underlying data. This affects variable-length encodings: Kryo falls back to a slow deserialization path for these types, since it can read only one byte at a time. I do not yet know how this affects Kryo's performance, so I have not removed Avro completely yet. Furthermore, Kryo should be tested on the cluster and, if possible, its performance should be compared to Avro's.
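As a minimal sketch of why variable-length encodings force byte-wise reads, the Java class below implements a generic unsigned LEB128-style varint (7 payload bits per byte, high bit set while more bytes follow). This layout is an assumption for illustration, not Kryo's actual code: the decoder only learns the encoded length when it sees a byte without the continuation bit, so without a read-ahead buffer it must call `read()` once per byte. `VarIntDemo` and its members are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration of an unsigned LEB128-style varint; not Kryo's actual code. */
public class VarIntDemo {

    /** Wraps a stream and counts single-byte read() calls. */
    static final class CountingInputStream extends FilterInputStream {
        int reads;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            reads++;
            return super.read();
        }
    }

    /** Encodes a non-negative int with 7 payload bits per byte; the high bit marks "more bytes follow". */
    public static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    /**
     * Decodes a varint without a read-ahead buffer. The encoded length is only
     * known once a byte without the continuation bit arrives, so every byte
     * costs a separate read() call.
     */
    public static int readVarInt(InputStream in) {
        int result = 0;
        int shift = 0;
        try {
            while (true) {
                int b = in.read();
                if (b < 0) {
                    throw new IllegalStateException("stream ended inside a varint");
                }
                result |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    return result;
                }
                shift += 7;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(writeVarInt(300)));
        int value = readVarInt(in);
        System.out.println(value + " decoded with " + in.reads + " read() calls");
    }
}
```

Running `main` prints `300 decoded with 2 read() calls`.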
    
    We could mitigate the problem by not using variable-length encodings for types such as `int`s and `long`s. This, however, comes at the price of larger serialized objects. For UTF-8 strings, the problem would persist.
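The size trade-off of a fixed-length encoding can be sketched by comparing encoded sizes. The hypothetical class below assumes the same generic LEB128-style varint layout (not necessarily Kryo's exact format) and uses `java.io.DataOutputStream.writeInt`, which always writes 4 bytes. With a fixed width, the reader knows up front how many bytes to request, so it never consumes bytes that belong to the next record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical comparison of fixed-width ints and LEB128-style varints. */
public class FixedVsVarInt {

    /** Number of bytes a LEB128-style varint needs for a non-negative int. */
    public static int varIntSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Serializes values as fixed-width, big-endian 4-byte ints. */
    public static byte[] writeFixed(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int v : values) {
                out.writeInt(v); // always exactly Integer.BYTES bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Reads the first fixed-width int and reports how many bytes are left.
     * Because the length is known up front, exactly 4 bytes are consumed and
     * the following record stays untouched.
     */
    public static int[] readFirstFixed(byte[] data) {
        ByteArrayInputStream source = new ByteArrayInputStream(data);
        try (DataInputStream in = new DataInputStream(source)) {
            int value = in.readInt();
            return new int[] {value, source.available()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int v : new int[] {1, 127, 128, 300, Integer.MAX_VALUE}) {
            System.out.println(v + ": varint " + varIntSize(v) + " bytes, fixed " + Integer.BYTES + " bytes");
        }
    }
}
```

For the sample values, `main` reports 1 byte for 1 and 127, 2 bytes for 128 and 300, and 5 bytes for `Integer.MAX_VALUE`, against a constant 4 bytes for the fixed encoding. Small values are where the varint saves space, which is the cost of switching to fixed-width integers.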

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/incubator-flink FLINK-610

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/74.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #74
    
----
commit 0c618c20bc7be64de82e0e7d3f7734e8c68593bb
Author: Till Rohrmann <ti...@gmail.com>
Date:   2014-07-17T15:21:59Z

    Added KryoSerializer

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-49416463
  
    The underlying problem is that Flink can use different serializers for the objects within a single stream. Thus, it is not possible to use the Kryo serializer all the time. If Kryo were the only serializer in the stream, it would not matter whether Kryo had already read data belonging to the next object into its buffer.
    
    If we had a memory abstraction instead of a stream, as proposed in [FLINK-987](https://issues.apache.org/jira/browse/FLINK-987), which allows seeking the current read position, we should be able to virtually push back data that was read ahead but belongs to a different object. However, that would require that we can seek backwards across memory segment borders.
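The push-back idea can be sketched with the JDK's `java.io.PushbackInputStream`, used here only as a stand-in for the seekable memory abstraction discussed in FLINK-987 (this is not Flink's API). A buffered deserializer reads 8 bytes ahead, decodes a 4-byte int, and unreads the rest, so a second, unbuffered serializer still sees the next object's first byte. `PushbackDemo`, `readIntWithReadAhead`, and `readRawByte` are hypothetical names, and the sketch covers a single in-memory buffer only; it does not address seeking backwards across memory segment borders.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration: a read-ahead deserializer that returns over-read
 * bytes to the stream, so a different serializer can read the next object.
 * java.io.PushbackInputStream stands in for a seekable memory abstraction.
 */
public class PushbackDemo {

    /** Read-ahead window size; an arbitrary choice for this sketch. */
    static final int READ_AHEAD = 8;

    /** Buffered "serializer": reads ahead, decodes a 4-byte int, unreads the rest. */
    public static int readIntWithReadAhead(PushbackInputStream in) {
        byte[] buffer = new byte[READ_AHEAD];
        try {
            int filled = 0;
            while (filled < READ_AHEAD) {
                int n = in.read(buffer, filled, READ_AHEAD - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            if (filled < Integer.BYTES) {
                throw new IllegalStateException("not enough data for an int");
            }
            int value = ByteBuffer.wrap(buffer, 0, Integer.BYTES).getInt();
            // Give back the bytes that belong to the next object.
            in.unread(buffer, Integer.BYTES, filled - Integer.BYTES);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A different, unbuffered "serializer" that reads a single raw byte. */
    public static int readRawByte(PushbackInputStream in) {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An int (42) followed by bytes that belong to the next object.
        byte[] data = {0, 0, 0, 42, 7, 8, 9, 10, 11};
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data), READ_AHEAD);
        int first = readIntWithReadAhead(in);
        int next = readRawByte(in);
        System.out.println(first + " then " + next);
    }
}
```

Running `main` prints `42 then 7`. Without the `unread` call, the second reader would get `11` instead, because bytes 7 to 10 would be stranded in the first reader's buffer.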


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-52780715
  
    No, you were right. It caught me by surprise because most of the Java/Scala code I have been working with uses spaces for indentation.


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-52753332
  
    We agreed to use single tabs for indentation. That is why the code does not look as compact on GitHub. Did I mix them up somewhere in the code?


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-52734136
  
    Do you use tabs in the patch source, or is it 4 spaces?


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-49411918
  
    Thanks for doing this so fast.
    
    Do you know whether the problems you mentioned will be resolved by [FLINK-987](https://issues.apache.org/jira/browse/FLINK-987)? In other words, are the problems only temporary until FLINK-987 is done?


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the pull request:

    https://github.com/apache/incubator-flink/pull/74#issuecomment-49417330
  
    I might be able to provide that as part of [FLINK-987](https://issues.apache.org/jira/browse/FLINK-987).


[GitHub] incubator-flink pull request: [FLINK-610] Added KryoSerializer

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-flink/pull/74

