Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2016/07/05 15:11:11 UTC

[jira] [Updated] (FLINK-4154) Correction of murmur hash breaks backwards compatibility

     [ https://issues.apache.org/jira/browse/FLINK-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-4154:
---------------------------------
    Priority: Blocker  (was: Critical)

> Correction of murmur hash breaks backwards compatibility
> --------------------------------------------------------
>
>                 Key: FLINK-4154
>                 URL: https://issues.apache.org/jira/browse/FLINK-4154
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.1.0
>            Reporter: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.1.0
>
>
> The correction of Flink's murmur hash in commit [1] breaks Flink's backwards compatibility with respect to savepoints. The changed murmur hash, which is used to partition elements in a {{KeyedStream}}, changes the mapping from keys to sub tasks and therefore the key space assigned to each sub task. Consequently, restoring an old savepoint (version 1.0) hands each sub task state for keys that it is no longer responsible for.
> I think that this must be fixed for the upcoming 1.1 release. I see two options to solve the problem:
> - Revert the changes, but then we don't know how the flawed murmur hash performs.
> - Develop tooling to repartition the state of old savepoints. This is probably not trivial, since a keyed stream can also contain non-partitioned state, which is not partitionable in all cases. And even if only partitioned state is used, we would need some kind of special operator which can repartition the state with respect to the key.
> I think that the latter option requires more thought and is thus unlikely to be done before the 1.1 release. Therefore, as a workaround, I think that we should revert the murmur hash changes.
> [1] https://github.com/apache/flink/commit/641a0d436c9b7a34ff33ceb370cf29962cac4dee
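
To illustrate the effect described above, here is a minimal sketch, not Flink's actual code. It assumes that a sub task is chosen as the hash of the key modulo the parallelism. The names oldHash, newHash, subtaskFor and movedKeys are hypothetical, and the two hash functions are only stand-ins for the flawed and the corrected murmur hash. It counts how many keys end up on a different sub task once the hash function is swapped.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyRemapSketch {

    // Hypothetical stand-in for the hash in use when the savepoint was taken.
    static int oldHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        return code;
    }

    // Hypothetical stand-in for the corrected hash (a different bit mixer).
    static int newHash(int code) {
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        return code;
    }

    // Assumption: the sub task index is the non-negative hash modulo parallelism.
    static int subtaskFor(int hash, int parallelism) {
        return Math.floorMod(hash, parallelism);
    }

    // Collects the keys whose sub task differs between the two hash functions.
    static List<Integer> movedKeys(int numKeys, int parallelism) {
        List<Integer> moved = new ArrayList<>();
        for (int key = 0; key < numKeys; key++) {
            int before = subtaskFor(oldHash(key), parallelism);
            int after = subtaskFor(newHash(key), parallelism);
            if (before != after) {
                moved.add(key);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int numKeys = 1000;
        List<Integer> moved = movedKeys(numKeys, parallelism);
        System.out.println(moved.size() + " of " + numKeys
                + " keys would be restored on a sub task that no longer owns them");
    }
}
```

Every key reported by movedKeys corresponds to keyed state that a savepoint written under the old hash would place on a sub task which, under the new hash, never receives elements for that key.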



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)