Posted to user@flink.apache.org by Bart Kastermans <fl...@kasterma.net> on 2018/02/19 14:36:57 UTC

Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

In https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/cli.html it is shown that
for gracefully stopping a job you need to implement the StoppableFunction interface.  This
appears not (yet) implemented for Kafka consumers.  Am I missing something, or is there a
different way to gracefully stop a job using a kafka source so we can restart it later without
losing any (in flight) events?

- bart

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Till Rohrmann <tr...@apache.org>.
@Bart, I think there is no FLIP yet for the proper stop-with-savepoint
implementation. My gut feeling is that the community will address this
problem soon, since it's a heavily requested feature.

Cheers,
Till

On Wed, Feb 21, 2018 at 10:26 AM, Till Rohrmann <tr...@apache.org>
wrote:


Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Christophe Jolif <cj...@gmail.com>.
Ok, thanks a lot for the clarification. That was my initial understanding,
but then I was confused by the "losing in-flight events" wording.

On Wed, Feb 21, 2018 at 10:26 AM, Till Rohrmann <tr...@apache.org>
wrote:

> Hi Christophe,
>
> yes I think you misunderstood the thread. Cancel with savepoint will never
> cause any data loss. The only problem which might arise if you have an
> operator which writes data to an external system immediately, then you
> might see some data in the external system which originates from after the
> savepoint. By implementing the interaction with the external system, for
> example only flush on notify checkpoint complete, you can solve this
> problem. The bottom line is that if you don't do it like this, then you
> might see some duplicate data. The Kafka exactly once sink, for example, is
> implemented such that it takes care of this problem and gives you exactly
> once guarantees.
>
> Cheers,
> Till
>
> On Tue, Feb 20, 2018 at 11:51 PM, Christophe Jolif <cj...@gmail.com>
> wrote:
>
>> Hmm, I did not realize that.
>>
>> I was planning when upgrading a job (consuming from Kafka) to cancel it
>> with a savepoint and then start it back from the savedpoint. But this
>> savedpoint thing was giving me the apparently false feeling I would not
>> lose anything? My understanding was that maybe I would process some events
>> twice in this case but certainly not miss events entirely.
>>
>> Did I misunderstand this thread?
>>
>> If not this sounds like pretty annoying? Do people have some sort of
>> workaround for that?
>>
>> Thanks,
>> --
>> Christophe
>>
>>
>>
>> On Mon, Feb 19, 2018 at 5:50 PM, Till Rohrmann <tr...@apache.org>
>> wrote:
>>
>>> Hi Bart,
>>>
>>> you're right that Flink currently does not support a graceful stop
>>> mechanism for the Kafka source. The community has already a good idea how
>>> to solve it in the general case and will hopefully soon add it to Flink.
>>>
>>> Concerning the StoppableFunction: This interface was introduced quite
>>> some time ago and currently only works for some batch sources. In order to
>>> make it work with streaming, we need to add some more functionality to the
>>> engine in order to properly stop and take a savepoint.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Feb 19, 2018 at 3:36 PM, Bart Kastermans <fl...@kasterma.net>
>>> wrote:
>>>
>>>> In https://ci.apache.org/projects/flink/flink-docs-release-1.4/
>>>> ops/cli.html it is shown that
>>>> for gracefully stopping a job you need to implement the
>>>> StoppableFunction interface.  This
>>>> appears not (yet) implemented for Kafka consumers.  Am I missing
>>>> something, or is there a
>>>> different way to gracefully stop a job using a kafka source so we can
>>>> restart it later without
>>>> losing any (in flight) events?
>>>>
>>>> - bart
>>>>
>>>
>>>
>>
>>
>


-- 
Christophe

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Till Rohrmann <tr...@apache.org>.
Hi Christophe,

yes I think you misunderstood the thread. Cancel with savepoint will never
cause any data loss. The only problem which might arise is if you have an
operator which writes data to an external system immediately: then you
might see some data in the external system which originates from after the
savepoint. By making the interaction with the external system
checkpoint-aware, for example by flushing only once the checkpoint is
confirmed complete, you can solve this problem. The bottom line is that if
you don't do it like this, then you might see some duplicate data, but
never lost data. The Kafka exactly-once sink, for example, is implemented
such that it takes care of this problem and gives you exactly-once
guarantees.

Cheers,
Till

On Tue, Feb 20, 2018 at 11:51 PM, Christophe Jolif <cj...@gmail.com> wrote:


Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Christophe Jolif <cj...@gmail.com>.
Hmm, I did not realize that.

I was planning, when upgrading a job (consuming from Kafka), to cancel it
with a savepoint and then start it back from the savepoint. But this
savepoint thing was giving me the apparently false feeling that I would not
lose anything? My understanding was that maybe I would process some events
twice in this case, but certainly not miss events entirely.

Did I misunderstand this thread?

If not, this sounds pretty annoying. Do people have some sort of
workaround for that?

Thanks,
--
Christophe



On Mon, Feb 19, 2018 at 5:50 PM, Till Rohrmann <tr...@apache.org> wrote:


Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Bart Kastermans <fl...@kasterma.net>.
Thanks for the reply; is there a FLIP for this?

- bart

On Mon, Feb 19, 2018, at 5:50 PM, Till Rohrmann wrote:


Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

Posted by Till Rohrmann <tr...@apache.org>.
Hi Bart,

you're right that Flink currently does not support a graceful stop
mechanism for the Kafka source. The community has already a good idea how
to solve it in the general case and will hopefully soon add it to Flink.

Concerning the StoppableFunction: this interface was introduced quite some
time ago and currently only works for some batch sources. To make it work
with streaming, we need to add more functionality to the engine so that it
can properly stop and take a savepoint.
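For context, the contract behind a graceful stop is small: stop() asks the
source to leave its run loop cleanly, as opposed to cancel(), which may
abort mid-record. A minimal plain-Java sketch of such a loop follows; the
class name and the list standing in for the downstream collector are
illustrative, not Flink's actual Kafka consumer code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of a stoppable source loop (not Flink's actual
// FlinkKafkaConsumer): stop() flips a flag, and the loop finishes the
// record it is currently handling before exiting, so nothing in flight
// is dropped.
public class StoppableSource {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final List<String> emitted = new ArrayList<>();

    public void run(Iterator<String> records) {
        while (running.get() && records.hasNext()) {
            emitted.add(records.next());   // "emit" the record downstream
        }
        // Graceful exit: at this point the state covering every emitted
        // record could still be snapshotted before the task shuts down.
    }

    public void stop() {                   // mirrors StoppableFunction#stop()
        running.set(false);
    }

    public List<String> emitted() {
        return emitted;
    }
}
```

The missing piece the thread discusses is the engine-side coordination:
flipping the flag on every source *and* taking a consistent savepoint
before the job shuts down, which is what a plain cancel does not guarantee.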

Cheers,
Till

On Mon, Feb 19, 2018 at 3:36 PM, Bart Kastermans <fl...@kasterma.net> wrote:
