Posted to issues@flink.apache.org by Xazax-hun <gi...@git.apache.org> on 2016/02/21 23:23:35 UTC

[GitHub] flink pull request: [WIP][FLINK-3422][streaming][api-breaking] Scr...

GitHub user Xazax-hun opened a pull request:

    https://github.com/apache/flink/pull/1685

    [WIP][FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.

    This pull request contains a fix for FLINK-3422. Some of the tests are failing at the moment because they relied on prior knowledge of the user hash function. Fixing those tests requires knowledge of Flink internals that I do not have yet, so Marton Balassi is helping me.
    
    The Jira ticket mentions both Murmur and Jenkins hash.
    Murmur hash is already used in the batch implementation: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/shipping/OutputEmitter.java#L187
    
    My approach was to move the Jenkins hash from CompactingHashTable to MathUtils and use it in HashPartitioner. If you think it is better to use Murmur hash here, or that there is value in being consistent with the batch implementation in this regard, please let me know.
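
For readers unfamiliar with the problem, here is a minimal sketch of what "scrambling" buys. It is not the code from this pull request: the class and method names are hypothetical, and the mixing function is the widely circulated Jenkins-style 32-bit integer mix, assumed here as a stand-in for whatever MathUtils ends up exposing. It compares picking a channel from the raw `hashCode()` with picking it from the scrambled value, for integer keys that are all multiples of the channel count.

```java
// Hypothetical sketch, not Flink code: raw vs. scrambled channel selection.
final class ScrambledPartitionerSketch {

    // Jenkins-style 32-bit integer mix (assumed stand-in for the real helper).
    static int scramble(int h) {
        h = (h + 0x7ed55d16) + (h << 12);
        h = (h ^ 0xc761c23c) ^ (h >>> 19);
        h = (h + 0x165667b1) + (h << 5);
        h = (h + 0xd3a2646c) ^ (h << 9);
        h = (h + 0xfd7046c5) + (h << 3);
        h = (h ^ 0xb55a4f09) ^ (h >>> 16);
        return h;
    }

    // Channel chosen directly from the user's hashCode().
    static int rawChannel(Object key, int numChannels) {
        return Math.floorMod(key.hashCode(), numChannels);
    }

    // Channel chosen after scrambling the user's hashCode().
    static int scrambledChannel(Object key, int numChannels) {
        return Math.floorMod(scramble(key.hashCode()), numChannels);
    }

    // Counts how many of 1000 keys (0, 4, 8, ...) land on each of 4 channels.
    static int[] histogram(boolean scrambled) {
        final int numChannels = 4;
        int[] counts = new int[numChannels];
        for (int i = 0; i < 1000; i++) {
            Integer key = i * numChannels; // Integer.hashCode() is the value itself
            int channel = scrambled
                    ? scrambledChannel(key, numChannels)
                    : rawChannel(key, numChannels);
            counts[channel]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("raw:       " + java.util.Arrays.toString(histogram(false)));
        System.out.println("scrambled: " + java.util.Arrays.toString(histogram(true)));
    }
}
```

With the raw hash every key lands on channel 0, because each key is a multiple of the channel count. The scrambled variant spreads the same keys across channels. This is also why tests that assumed a particular key-to-channel mapping break once the hash is scrambled.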

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xazax-hun/flink HashPartitioner

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1685
    
----
commit afaa069483423e0bbb448f773cdcb4992689745e
Author: Gabor Horvath <xa...@gmail.com>
Date:   2016-02-21T13:54:44Z

    [FLINK-3422][streaming][api-breaking] Scramble HashPartitioner hashes.

commit 102053618e11e0de784d4d02152dc439a1e274ca
Author: Márton Balassi <mb...@apache.org>
Date:   2016-02-21T22:01:00Z

    [WIP][FLINK-3422][streaming][api-breaking] Update tests reliant on hashing

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request: [FLINK-3422][streaming][api-breaking] Scramble...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/1685



[GitHub] flink pull request: [FLINK-3422][streaming][api-breaking] Scramble...

Posted by mbalassi <gi...@git.apache.org>.
Github user mbalassi commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-190584934
  
    If there are no objections, I will merge this tomorrow morning.



[GitHub] flink pull request: [FLINK-3422][streaming][api-breaking] Scramble...

Posted by Xazax-hun <gi...@git.apache.org>.
Github user Xazax-hun commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-189944863
  
    I think this change is done and ready to be considered for merging. I think it should be merged into both the master and the release-1.0 branches.




[GitHub] flink pull request: [WIP][FLINK-3422][streaming][api-breaking] Scr...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-186948795
  
    It is pretty crucial that different hash functions are used for the partitioning across machines and for the internal partitioning of data structures. If the same hash function is used for both, many internal data structure partitions will be empty.
    
    So far we have divided it the following way (admittedly not documented):
      - Murmur hash across machines
      - Jenkins hash internally in data structures
    
    How about we stick with that division and use Murmur Hash in the streaming partitioner as well?
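
To make the empty-partition effect concrete, here is a small self-contained sketch. It is an illustration only, not Flink code: the class name, the 4 machines / 16 buckets setup, and both mixing functions (the MurmurHash3 32-bit finalizer and a Jenkins-style integer mix) are assumptions chosen for the demo. It routes keys to "machine 0" with one hash, then counts how many local buckets those keys occupy when the bucket index comes from the same hash versus a different one.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical demo, not Flink code: same vs. different hash for the two levels.
final class HashDivisionDemo {

    // MurmurHash3 32-bit finalizer; stands in for the cross-machine hash.
    static int murmurMix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Jenkins-style 32-bit integer mix; stands in for the internal hash.
    static int jenkinsMix(int h) {
        h = (h + 0x7ed55d16) + (h << 12);
        h = (h ^ 0xc761c23c) ^ (h >>> 19);
        h = (h + 0x165667b1) + (h << 5);
        h = (h + 0xd3a2646c) ^ (h << 9);
        h = (h + 0xfd7046c5) + (h << 3);
        h = (h ^ 0xb55a4f09) ^ (h >>> 16);
        return h;
    }

    // Keeps only the keys routed to machine 0, then counts the non-empty
    // local buckets those keys fall into.
    static int nonEmptyBuckets(IntUnaryOperator partitionHash, IntUnaryOperator bucketHash) {
        final int machines = 4;
        final int buckets = 16;
        boolean[] used = new boolean[buckets];
        for (int key = 0; key < 10_000; key++) {
            if (Math.floorMod(partitionHash.applyAsInt(key), machines) != 0) {
                continue; // key belongs to another machine
            }
            used[Math.floorMod(bucketHash.applyAsInt(key), buckets)] = true;
        }
        int count = 0;
        for (boolean b : used) {
            if (b) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("same hash for both levels: "
                + nonEmptyBuckets(HashDivisionDemo::murmurMix, HashDivisionDemo::murmurMix)
                + " of 16 buckets used");
        System.out.println("different hashes:          "
                + nonEmptyBuckets(HashDivisionDemo::murmurMix, HashDivisionDemo::jenkinsMix)
                + " of 16 buckets used");
    }
}
```

When the same hash drives both levels, every key on machine 0 has a hash that is 0 modulo 4, so modulo 16 it can only hit buckets 0, 4, 8, or 12. At least three quarters of the buckets stay empty. An independent second hash removes that correlation.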
    




[GitHub] flink pull request: [WIP][FLINK-3422][streaming][api-breaking] Scr...

Posted by Xazax-hun <gi...@git.apache.org>.
Github user Xazax-hun commented on the pull request:

    https://github.com/apache/flink/pull/1685#issuecomment-187076733
  
    Thank you for your insight! I think you are right.
    I will move the Murmur hash to MathUtils as well and document which hash should be used for which purpose. I will also migrate the streaming API changes to use Murmur instead of Jenkins.

