Posted to user@flink.apache.org by Александр Сергеенко <al...@gmail.com> on 2020/07/23 10:07:56 UTC

Flink state reconciliation

Hi,

We use the so-called "control stream" pattern to deliver settings to the Flink
job via Apache Kafka topics. The settings are in fact an unbounded stream of
events originating from the master DBMS, which acts as the single source of
truth for the rules list.
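
For context, the wiring is roughly the following (simplified; Rule, Event,
Alert, RuleEvaluator, the topic name and the Kafka settings below are
illustrative placeholders, not our actual code):

// Rule updates arrive on a Kafka "control" topic and are broadcast to all tasks.
MapStateDescriptor<String, Rule> ruleDescriptor =
        new MapStateDescriptor<>("rules", Types.STRING, TypeInformation.of(Rule.class));

BroadcastStream<Rule> ruleBroadcast = env
        .addSource(new FlinkKafkaConsumer<>(
                "rules-topic", new RuleDeserializationSchema(), kafkaProps))
        .broadcast(ruleDescriptor);

// Business events are connected to the broadcast rules and evaluated against them.
DataStream<Alert> alerts = events
        .connect(ruleBroadcast)
        .process(new RuleEvaluator()); // extends BroadcastProcessFunction<Event, Rule, Alert>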

It may seem odd to need this: Flink guarantees "exactly once" delivery
semantics, the service that publishes the rules to Kafka is written with
Akka Streams and guarantees "at least once" semantics, and the rule handling
inside the Flink job is implemented in an idempotent manner. Nevertheless,
there are cases where we have to reconcile the current rules stored in the
master DBMS with the existing Flink state.

We've looked at Flink's tooling and found that the State Processor API could
solve our problem: we would implement a periodic process that unloads the
state to an external file (e.g. a .csv) and then compares that set with the
information held in the master system.
This essentially looks like the lambda architecture approach, whereas Flink
is supposed to implement the kappa architecture, so in that light our
reconciliation problem looks a bit far-fetched.
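
A rough sketch of the export step we have in mind, based on our reading of
the State Processor API docs (the savepoint path, operator uid and state
descriptor below are just examples and would have to match our job):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ExportRulesState {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Load an existing savepoint; the state backend must match the running job.
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs:///savepoints/savepoint-xxxx", new MemoryStateBackend());

        // Read the keyed rule state of the operator with uid "rule-operator".
        DataSet<Tuple2<String, String>> rules =
                savepoint.readKeyedState("rule-operator", new RuleStateReader());

        // Dump it as CSV so it can be diffed against the master DBMS.
        rules.writeAsCsv("file:///tmp/rules-export.csv");
        env.execute("export-rules-state");
    }

    /** Reads the per-key rule state; the descriptor must match the one used in the job. */
    static class RuleStateReader extends KeyedStateReaderFunction<String, Tuple2<String, String>> {

        private transient ValueState<String> ruleState;

        @Override
        public void open(Configuration parameters) {
            ruleState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("rule", String.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<Tuple2<String, String>> out)
                throws Exception {
            out.collect(Tuple2.of(key, ruleState.value()));
        }
    }
}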

Are there any best practices or patterns addressing such scenarios in Flink?

Many thanks for any assistance and ideas.

-----
Alex Sergeenko

Re: Flink state reconciliation

Posted by Seth Wiesman <sj...@gmail.com>.
That is doable via the State Processor API, though Arvid's idea does sound
simpler :)

You could read the state of the operator that holds the rules, change the
data as necessary, and then write it back out as a new savepoint from which
to restart the job.
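
Roughly something like this (uids, paths and state names are placeholders;
RuleStateReader would be the same kind of KeyedStateReaderFunction as in
your export sketch):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class RewriteRulesSavepoint {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        ExistingSavepoint existing =
                Savepoint.load(env, "hdfs:///savepoints/savepoint-old", new MemoryStateBackend());

        // 1) Read the current rules out of the savepoint.
        DataSet<Tuple2<String, String>> currentRules =
                existing.readKeyedState("rule-operator", new RuleStateReader());

        // 2) ... correct currentRules against the master DBMS here ...

        // 3) Bootstrap the corrected rules and write everything out as a new savepoint.
        BootstrapTransformation<Tuple2<String, String>> corrected = OperatorTransformation
                .bootstrapWith(currentRules)
                .keyBy(new KeySelector<Tuple2<String, String>, String>() {
                    @Override
                    public String getKey(Tuple2<String, String> rule) {
                        return rule.f0;
                    }
                })
                .transform(new RuleStateWriter());

        existing
                .removeOperator("rule-operator")
                .withOperator("rule-operator", corrected)
                .write("hdfs:///savepoints/savepoint-new");

        env.execute("rewrite-rules-savepoint");
    }

    /** Writes each corrected rule back into keyed state under the same descriptor. */
    static class RuleStateWriter extends KeyedStateBootstrapFunction<String, Tuple2<String, String>> {

        private transient ValueState<String> ruleState;

        @Override
        public void open(Configuration parameters) {
            ruleState = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("rule", String.class));
        }

        @Override
        public void processElement(Tuple2<String, String> rule, Context ctx) throws Exception {
            ruleState.update(rule.f1);
        }
    }
}

The job is then restarted from the new savepoint with the usual
flink run -s <new-savepoint-path> ...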



Re: Flink state reconciliation

Posted by Arvid Heise <ar...@ververica.com>.
Another idea: since your handling in Flink is idempotent, would it make
sense to periodically send the whole rule set anew?

Going further, depending on the number of rules, their size, and the update
frequency: would it be possible to always transfer the complete rule set and
simply discard the old state on each update (or do the reconciliation in
Flink)?
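
As a sketch of what I mean (all names made up): if the control topic carried
complete rule-set snapshots instead of single-rule deltas, the broadcast side
could simply drop its old state and repopulate it from each snapshot:

import java.util.List;
import java.util.Map;

import org.apache.flink.api.common.state.BroadcastState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

/** Assumes the control topic delivers the full rule set (here a List<String>) on every update. */
public class RuleSnapshotFunction extends BroadcastProcessFunction<String, List<String>, String> {

    // Must be the same descriptor that was used to create the broadcast stream.
    static final MapStateDescriptor<String, String> RULES =
            new MapStateDescriptor<>("rules", Types.STRING, Types.STRING);

    @Override
    public void processBroadcastElement(List<String> snapshot, Context ctx, Collector<String> out)
            throws Exception {
        BroadcastState<String, String> rules = ctx.getBroadcastState(RULES);
        // Discard the old rules entirely and replace them with the latest snapshot.
        rules.clear();
        for (String rule : snapshot) {
            rules.put(ruleId(rule), rule);
        }
    }

    @Override
    public void processElement(String event, ReadOnlyContext ctx, Collector<String> out)
            throws Exception {
        // Evaluate the event against whatever rules are currently broadcast (details omitted).
        for (Map.Entry<String, String> rule : ctx.getBroadcastState(RULES).immutableEntries()) {
            out.collect(event + " matched rule " + rule.getKey());
        }
    }

    private static String ruleId(String rule) {
        return rule; // placeholder: derive a stable id from the rule payload
    }
}

Whether that is feasible mostly depends on how big the full rule set is and
how often it changes.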


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Re: Flink state reconciliation

Posted by Александр Сергеенко <al...@gmail.com>.
Hi Kostas,

Thanks for your help!


Re: Flink state reconciliation

Posted by Kostas Kloudas <kk...@gmail.com>.
Hi Alex,

Maybe Seth (cc'ed) has an opinion on this.

Cheers,
Kostas
