You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by rmetzger <gi...@git.apache.org> on 2014/12/16 21:18:10 UTC

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

GitHub user rmetzger opened a pull request:

    https://github.com/apache/incubator-flink/pull/271

    [FLINK-610] Replace Avro with Kryo as the GenericType serializer

    This pull request is basically a collection of the work of others ;)
    @tillrohrmann contributed some of the tests (in particular for the serializer itself). He did the integration with [Chill](https://github.com/twitter/chill).
    @twalthr contributed one commit for fixing a TypeExtractor bug.
    
    I added two IT cases, checking it to work with the following types:
    
    ```java
    java.util.List<String>
    scala.math.BigInt
    java.io.File
    java.math.BigDecimal;
    java.math.BigInteger;
    java.util.HashMap;
    java.io.Serializable;
    ```
    
    I also tested it on a cluster with a Collection of Objects (containing Strings and Integers). It worked. I didn't test it for performance yet. The purpose of the pull request is to extend the type support for cases we don't cover by our own serialization framework.
    
    I'll now test the change with the code that brought the issue on top of my TODO list: https://issues.apache.org/jira/browse/FLINK-629?focusedCommentId=14241856&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241856

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/incubator-flink flink610-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #271
    
----
commit b9dac6bad47e41fc3bc1dafc3eaede1a3bf63496
Author: twalthr <in...@twalthr.com>
Date:   2014-12-10T21:02:07Z

    Fix invalid type hierarchy creation by Pojo logic

commit 64d2e52970f761ad117aec0080d0618f7f7ace95
Author: Robert Metzger <rm...@apache.org>
Date:   2014-12-16T10:30:52Z

    [FLINK-610] Replace Avro by Kryo as the GenericType serializer
    
    The performance of data-intensive jobs using Kryo is probably going to be slow.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67300275
  
    Very good idea. I think we could fix this still a bit and make it better:
    
      - Avro created a schema from the objects to serialize (somewhat a simple thing as our POJO analysis). That way, it does not need to write all classnames. Kryo writes all classnames by default, which is very painful, unless they are registered. I think that was one of the reasons why Avro was actually faster for us out of the box.
         - We should register the types at Kryo that we see as being (recursively contained in the type information)
         - We should also allow at the execution environment to manually register types or serializers.
    
      - Why is the `copy()` method not using Kryo's copy functionality? The current implementation seems painfully inefficient, and will still be used until @aljoscha 's change about reuse/non-reuse mode is in.
    
      - The tests that have a hardwired check for the KryoSerializer break design...
    
    
    It seems that the Java/Scala code is moving closer together still, with the common serializer being configured in the Java API for Scala classes. That means we might really collapse the projects at some point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67318414
  
    Okay, I'll add that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67318300
  
    How about we try to use Kryo's copy method and fall back to serialization
    copies upon failure?
    Am 17.12.2014 11:59 schrieb "Robert Metzger" <no...@github.com>:
    
    > Thank you for the feedback.
    >
    >    -
    >
    >    We are using the Twitter Chill library. It registers many commonly
    >    used classes with Kryo (Collections, Date, BigInt, ...). So the issue with
    >    class names being written does only apply to userdefined types.
    >    You are indeed right that we should provide facilities to register
    >    custom classes and that we should register all classes we see during type
    >    analysis.
    >    I didn't implement these features because Timo is actually working on
    >    this and I wanted to have at least basic Kryo support in the release. Timo
    >    will integrate Kryo more tightly with the type analysis.
    >    -
    >
    >    The copy() method of Kryo is not implemented for many types (in
    >    particular those by Chill). For example the java.sql.Date type (
    >    https://github.com/twitter/chill/blob/develop/chill-java/src/main/java/com/twitter/chill/java/SqlDateSerializer.java)
    >    doesn't have a copy() method.
    >    Kryo provides a default copy() method which fails on mutable types. (
    >    https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java#L102).
    >    I thought about contributing the missing copy() methods for chill.
    >    We could also whitelist some classes for kryo's copy and fall back to
    >    the slow variant.
    >    -
    >
    >    The hardwired check for the KryoSerializer is very ugly. Maybe I can
    >    come up with a different solution. Till suggested to add a method (
    >    canCreateInstance) to all serializers .. but thats a lot of code for
    >    one test case.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/incubator-flink/pull/271#issuecomment-67307001>
    > .
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67299689
  
    I rebased the pull request to the 0.8 branch and added a fix for https://issues.apache.org/jira/browse/FLINK-1333.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67470974
  
    Manually closing it. Has been merged


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67341195
  
    That makes sense.
    
    +1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/271#issuecomment-67307001
  
    Thank you for the feedback.
    * We are using the Twitter Chill library. It registers many commonly used classes with Kryo (Collections, Date, BigInt, ...). So the issue with class names being written does only apply to userdefined types.
    You are indeed right that we should provide facilities to register custom classes and that we should register all classes we see during type analysis.
    I didn't implement these features because Timo is actually working on this and I wanted to have at least basic Kryo support in the release. Timo will integrate Kryo more tightly with the type analysis.
    
    * The `copy()` method of Kryo is not implemented for many types (in particular those by Chill). For example the java.sql.Date type (https://github.com/twitter/chill/blob/develop/chill-java/src/main/java/com/twitter/chill/java/SqlDateSerializer.java) doesn't have a copy() method. 
    Kryo provides a default copy() method which fails on mutable types. (https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/Serializer.java#L102). I thought about contributing the missing copy() methods for chill.
    We could also whitelist some classes for kryo's copy and fall back to the slow variant.
    
    * The hardwired check for the KryoSerializer is very ugly. Maybe I can come up with a different solution. Till suggested to add a method (`canCreateInstance`) to all serializers .. but thats a lot of code for one test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: [FLINK-610] Replace Avro with Kryo a...

Posted by rmetzger <gi...@git.apache.org>.
Github user rmetzger closed the pull request at:

    https://github.com/apache/incubator-flink/pull/271


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---