Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/26 18:18:12 UTC

[jira] [Commented] (FLINK-2220) Join on Pojo without hashCode() silently fails

    [ https://issues.apache.org/jira/browse/FLINK-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258358#comment-15258358 ] 

ASF GitHub Bot commented on FLINK-2220:
---------------------------------------

GitHub user gallenvara opened a pull request:

    https://github.com/apache/flink/pull/1940

    [FLINK-2220] Join on Pojo without hashCode() silently fails

    Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration.
    If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html).
    In addition to going through the list, please provide a meaningful description of your changes.
    
    - [X] General
      - The pull request references the related JIRA issue
      - The pull request addresses only one issue
      - Each commit in the PR has a meaningful commit message
    
    Add a check that verifies a POJO overrides `hashCode()` and `equals()` when it is used as a key for operations such as join and coGroup.
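
A minimal sketch of what such a check could look like, assuming a reflection-based test on the key class. The names `KeyTypeCheck`, `overrides` and `checkUsableAsKey` are hypothetical and are not taken from the pull request; the actual change may be implemented differently.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical helper for illustration only; not the code from the pull request.
final class KeyTypeCheck {

    // True if the public method resolved on 'type' is declared somewhere
    // other than java.lang.Object, i.e. the default was overridden.
    static boolean overrides(Class<?> type, String name, Class<?>... params) {
        try {
            Method m = type.getMethod(name, params);
            return m.getDeclaringClass() != Object.class;
        } catch (NoSuchMethodException e) {
            // No such public method found: treat as "not overridden".
            return false;
        }
    }

    // Rejects key types that rely on Object's identity-based hashCode()/equals().
    static void checkUsableAsKey(Class<?> type) {
        if (!overrides(type, "hashCode") || !overrides(type, "equals", Object.class)) {
            throw new IllegalArgumentException(
                "Type " + type.getName() + " cannot be used as a key: "
                + "it does not override hashCode() and equals().");
        }
    }

    // Sample key type without the overrides.
    static final class WithoutHash {
        public byte[] data;
    }

    // Sample key type with content-based hashCode() and equals().
    static final class WithHash {
        public byte[] data;

        @Override
        public int hashCode() {
            return Arrays.hashCode(data);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof WithHash && Arrays.equals(data, ((WithHash) o).data);
        }
    }

    public static void main(String[] args) {
        checkUsableAsKey(WithHash.class); // passes silently
        try {
            checkUsableAsKey(WithoutHash.class);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing when the program is defined, rather than at runtime, turns the silent empty join result into an explicit error message.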
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gallenvara/flink flink-2220

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1940.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1940
    
----
commit 2f8bfe59540831f3e2e9b181f3a51f0565693cb2
Author: gallenvara <ga...@126.com>
Date:   2016-04-26T16:08:27Z

    Check hashcode and equal method overridden in which POJO used as key.

----


> Join on Pojo without hashCode() silently fails
> ----------------------------------------------
>
>                 Key: FLINK-2220
>                 URL: https://issues.apache.org/jira/browse/FLINK-2220
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 0.9, 0.8.1
>            Reporter: Marcus Leich
>
> I need to perform a join using a complete Pojo as join key.
> With a degree of parallelism (DOP) > 1 this only works if the Pojo comes with a meaningful hashCode() implementation, as otherwise equal objects will get hashed to different partitions based on their identity (the default Object.hashCode()) and not on their content.
> I guess it's fine if users are required to implement hashCode() themselves, but it would be nice if the documentation or, better yet, Flink itself could alert users that this is a requirement, similar to how Comparable is required for keys.
> Use the following code to reproduce the issue:
> public class Pojo implements Comparable<Pojo> {
>     public byte[] data;
>     public Pojo() {
>     }
>     public Pojo(byte[] data) {
>         this.data = data;
>     }
>     @Override
>     public int compareTo(Pojo o) {
>         return UnsignedBytes.lexicographicalComparator().compare(data, o.data);
>     }
>     // uncomment me for making the join work
>     /* @Override
>     public int hashCode() {
>         return Arrays.hashCode(data);
>     }*/
> }
> public void testJoin() throws Exception {
>     final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>     env.setParallelism(4);
>     DataSet<Tuple2<Pojo, String>> left = env.fromElements(
>             new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "black"),
>             new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "red"),
>             new Tuple2<>(new Pojo(new byte[] {1}), "Spark"),
>             new Tuple2<>(new Pojo(new byte[] {2}), "good"),
>             new Tuple2<>(new Pojo(new byte[] {5}), "bug"));
>     DataSet<Tuple2<Pojo, String>> right = env.fromElements(
>             new Tuple2<>(new Pojo(new byte[] {0, 24, 23, 1, 3}), "white"),
>             new Tuple2<>(new Pojo(new byte[] {0, 14, 13, 14, 13}), "green"),
>             new Tuple2<>(new Pojo(new byte[] {1}), "Flink"),
>             new Tuple2<>(new Pojo(new byte[] {2}), "evil"),
>             new Tuple2<>(new Pojo(new byte[] {5}), "fix"));
>     // will not print anything unless Pojo has a real hashCode() implementation
>     left.join(right).where(0).equalTo(0).projectFirst(1).projectSecond(1).print();
> }
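
The effect described in the report can be illustrated without Flink. The following is a minimal sketch, assuming a simplified modulo-based partitioner as a stand-in for hash partitioning; `PartitionDemo`, `IdentityKey`, `ContentKey` and `partition` are hypothetical names, and Flink's actual partitioning logic differs.

```java
import java.util.Arrays;

// Illustration only: a simplified stand-in for hash partitioning.
final class PartitionDemo {

    // Key that inherits Object's identity-based hashCode().
    static final class IdentityKey {
        final byte[] data;

        IdentityKey(byte[] data) {
            this.data = data;
        }
    }

    // Key with content-based hashCode() and equals().
    static final class ContentKey {
        final byte[] data;

        ContentKey(byte[] data) {
            this.data = data;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(data);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ContentKey && Arrays.equals(data, ((ContentKey) o).data);
        }
    }

    // Assumed partitioner: maps a key's hash code to one of 'parallelism' partitions.
    static int partition(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        byte[] left = {0, 24, 23, 1, 3};
        byte[] right = {0, 24, 23, 1, 3};

        // Content-based hash: both sides always land in the same partition.
        System.out.println("content keys co-located: "
            + (partition(new ContentKey(left), parallelism)
                == partition(new ContentKey(right), parallelism)));

        // Identity-based hash: the partitions are unrelated to the content,
        // so matching records may end up on different partitions.
        System.out.println("identity keys co-located: "
            + (partition(new IdentityKey(left), parallelism)
                == partition(new IdentityKey(right), parallelism)));
    }
}
```

With content-based hashing, records with equal keys from both inputs meet in the same partition; with the default hash they only do so by chance, which is consistent with the join producing no output.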



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)