Posted to user@flink.apache.org by Gyula Fóra <gy...@gmail.com> on 2018/01/08 13:46:23 UTC

Flink 1.3 -> 1.4 Kafka state migration issue

Hi,

Is it possible that the Kafka partition assignment logic has changed
between Flink 1.3 and 1.4? I am trying to migrate some jobs using Kafka
0.8 sources, and about half of my jobs lost offset state for some (but
not all) partitions. Jobs with parallelism 1 don't seem to be
affected...

Any ideas?

Gyula

Re: Flink 1.3 -> 1.4 Kafka state migration issue

Posted by Gyula Fóra <gy...@gmail.com>.
Hi,
Thanks Gordon, I should have read the announcement :)

This might indeed be the case here, so I will just use the workaround. At
least this is a known issue; I almost had a heart attack today :D

Cheers,
Gyula

Tzu-Li (Gordon) Tai <tz...@apache.org> wrote (on Mon, Jan 8, 2018,
17:56):

> Hi Gyula,
>
> Is your 1.3 savepoint from Flink 1.3.1 or 1.3.0?
> In those versions, we had a critical bug that caused duplicate partition
> assignments in corner cases, so the assignment logic was altered from 1.3.1
> to 1.3.2 (and therefore also 1.4.0).
>
> If you were indeed using 1.3.1 or 1.3.0, and you are certain that the
> savepoint does not contain duplicate partition assignments caused by the
> bug, then yes, restoring with parallelism (DOP) 1 and then rescaling again
> is a good workaround.
>
> Please see the 1.3.2 release announcement [1] for details.
>
> Best,
> Gordon
>
> [1] http://flink.apache.org/news/2017/08/05/release-1.3.2.html

Re: Flink 1.3 -> 1.4 Kafka state migration issue

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Gyula,

Is your 1.3 savepoint from Flink 1.3.1 or 1.3.0?
In those versions, we had a critical bug that caused duplicate partition
assignments in corner cases, so the assignment logic was altered from 1.3.1
to 1.3.2 (and therefore also 1.4.0).

If you were indeed using 1.3.1 or 1.3.0, and you are certain that the
savepoint does not contain duplicate partition assignments caused by the
bug, then yes, restoring with parallelism (DOP) 1 and then rescaling again
is a good workaround.
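
To illustrate why a change in assignment logic can drop offsets for some
partitions at parallelism > 1 but not at parallelism 1, here is a minimal
sketch in Python. It is a simplified model, not Flink's actual code: both
assignment rules, the snapshot/restore functions, and the sample offsets
are hypothetical, and the model assumes that each subtask only keeps
restored offsets for partitions it currently owns.

```python
# Simplified model, NOT Flink's actual implementation. The two assignment
# rules and the restore behaviour are illustrative assumptions.

def assign_by_list_position(discovered, parallelism):
    """Hypothetical old rule: owner depends on position in the discovered list."""
    return {p: i % parallelism for i, p in enumerate(discovered)}

def assign_by_partition_id(discovered, parallelism):
    """Hypothetical new rule: owner depends only on the partition id."""
    return {p: p % parallelism for p in discovered}

def snapshot(offsets, owner):
    """Group partition offsets by the subtask that owned them at snapshot time."""
    state = {}
    for partition, offset in offsets.items():
        state.setdefault(owner[partition], {})[partition] = offset
    return state

def restore(state, owner, parallelism):
    """Hand each old subtask's state to one new subtask (assumed mapping:
    old index modulo new parallelism); a subtask keeps only the offsets of
    partitions it owns under the current assignment rule."""
    restored = {}
    for old_subtask, entries in state.items():
        new_subtask = old_subtask % parallelism
        for partition, offset in entries.items():
            if owner[partition] == new_subtask:
                restored[partition] = offset
    return restored

discovered = [3, 0, 2, 1]                    # partitions in discovery order
offsets = {0: 100, 1: 200, 2: 300, 3: 400}   # made-up committed offsets

# Savepoint written at parallelism 2 under the old rule.
state = snapshot(offsets, assign_by_list_position(discovered, 2))

# Restore under the new rule, first at parallelism 2, then at parallelism 1.
same_dop = restore(state, assign_by_partition_id(discovered, 2), 2)
dop_one = restore(state, assign_by_partition_id(discovered, 1), 1)

print("lost at parallelism 2:", sorted(set(offsets) - set(same_dop)))  # [0, 3]
print("lost at parallelism 1:", sorted(set(offsets) - set(dop_one)))   # []
```

In this model, parallelism 1 is safe because the single subtask owns every
partition under any assignment rule, so nothing is filtered out on restore.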

Please see the 1.3.2 release announcement [1] for details.

Best,
Gordon

[1] http://flink.apache.org/news/2017/08/05/release-1.3.2.html


On Jan 8, 2018 6:57 AM, "Gyula Fóra" <gy...@gmail.com> wrote:

Migrating the jobs by setting the sources to parallelism = 1 and then
scaling back up after the migration seems to be a good workaround, but I am
wondering whether something I did caused this or whether it is a bug.


Re: Flink 1.3 -> 1.4 Kafka state migration issue

Posted by Gyula Fóra <gy...@gmail.com>.
Migrating the jobs by setting the sources to parallelism = 1 and then
scaling back up after the migration seems to be a good workaround, but I am
wondering whether something I did caused this or whether it is a bug.
