Posted to dev@flink.apache.org by Yun Tang <my...@live.com> on 2021/06/02 02:36:31 UTC

Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Hi Yuan, thanks for launching this discussion.

I prefer option-3 as this is the easiest to understand for users.


Best
Yun Tang
________________________________
From: Roman Khachatryan <ro...@apache.org>
Sent: Monday, May 31, 2021 16:53
To: dev <de...@flink.apache.org>
Subject: Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Hey Yuan, thanks for the proposal

I think Option 3 is the simplest to use and exposes fewer details than any
other option.
It's also consistent with the current way of configuring state
backends, as long as we treat change logging as a common feature
applicable to any state backend, much like
state.backend.local-recovery.

Option 6 seems slightly less preferable as it exposes more details, but
I think it is the most viable alternative.

Regards,
Roman


On Mon, May 31, 2021 at 8:39 AM Yuan Mei <yu...@gmail.com> wrote:
>
> Hey all,
>
> We would like to start a discussion on how to enable and configure the
> Changelog Statebackend.
>
> As part of FLIP-158 [1], the Changelog state backend wraps an existing
> state backend (HashMapStateBackend, EmbeddedRocksDBStateBackend, and possibly
> more in the future) and delegates state changes to the underlying state backend.
> This thread is to discuss how the Changelog StateBackend should
> be enabled and configured.
>
> The proposed options to enable and configure the state changelog are listed below:
>
> Option 1: Enable Changelog Statebackend through a Boolean Flag
>
> Option 2: Enable Changelog Statebackend through a Boolean Flag + a Special
> Case
>
> Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O
> ChangelogStateBackend Exposed
>
> Option 4: Explicit Nested Configuration + “changelog.inner” prefix for
> inner backend
>
> Option 5: Explicit Nested Configuration + inner state backend configuration
> unchanged
>
> Option 6: Config Changelog and Inner Statebackend All-Together
>
> Details of each option can be found here:
> https://docs.google.com/document/d/13AaCf5fczYTDHZ4G1mgYL685FqbnoEhgo0cdwuJlZmw/edit?usp=sharing
>
> When considering these options, please consider these four dimensions:
> 1. Consistency
> API/config should follow a consistent model and should not have
> contradictory logic beneath
> 2. Simplicity
> API should be easy to use and not place too much burden on users
> 3. Explicitness
> API/config should not contain implicit assumptions and should be intuitive
> to users
> 4. Extensibility
> Whether the current setting can be easily extended in the foreseeable future
>
> Please let us know what you think and please keep the discussion in this
> mailing thread.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints
>
> Best
> Yuan

Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Posted by Yuan Mei <yu...@gmail.com>.
Thank you everyone for replying!

Option 3 wins with a dominating number of votes, plus mine.

This option works as a refined version of the original proposal in
FLIP-158: Generalized incremental checkpoints [1]:
  - Define a consistent override and combination policy (flag + state
backend) across the different config levels
  - Define explicitly the meaning of "enable flag" = true/false/unset
  - Hide ChangelogStateBackend from users
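
As a rough illustration of the three-valued flag in the bullets above, here is a minimal Python sketch. It assumes two config levels (job and cluster), that an explicitly set job-level value overrides the cluster-level one, and that an unset flag falls through to the next level. The actual policy is defined in the design document and FLIP-158, not in this thread, and the function and parameter names below are hypothetical rather than part of Flink's API.

```python
from typing import Optional


def resolve_changelog_enabled(
    job_level: Optional[bool],
    cluster_level: Optional[bool],
    default: bool = False,
) -> bool:
    """Return the effective "enable changelog" flag.

    None models "unset". The first level that sets the flag explicitly
    wins (assumed order: job, then cluster). If no level sets it, the
    default applies. This ordering is an assumption for illustration.
    """
    for value in (job_level, cluster_level):
        if value is not None:
            return value
    return default


# Job level is unset, so the cluster-level value is used.
print(resolve_changelog_enabled(None, True))
# Job level explicitly disables the changelog, overriding the cluster.
print(resolve_changelog_enabled(False, True))
# Nothing is set anywhere, so the default applies.
print(resolve_changelog_enabled(None, None))
```

Treating "unset" as distinct from "false" is what lets a job explicitly opt out of a cluster-wide default instead of silently inheriting it.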

According to the discussion in this thread, we will go with
Option 3: Enable Changelog Statebackend through a Boolean Flag + W/O
ChangelogStateBackend Exposed

 [1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Best
Yuan

On Tue, Jun 8, 2021 at 6:40 PM Yu Li <ca...@gmail.com> wrote:

> +1 for option 3.
>
> IMHO persisting (operator's) state data through a changelog is an
> independent mechanism that can work with all kinds of local state
> stores (heap and RocksDB). This mechanism is similar to the WAL
> (write-ahead log) mechanism in database systems. Although implementation-wise
> we're using the wrapper (decorator) pattern and naming it
> `ChangelogStateBackend`, it's not really another type of state backend. For
> the same reason, ChangelogStateBackend should be an internal class and not
> exposed to the end user. Users only need to know / control whether to
> enable the changelog or not, just like whether to enable the WAL in a
> traditional database system.
>
> Thanks.
>
> Best Regards,
> Yu
>
>
> On Thu, 3 Jun 2021 at 22:50, Piotr Nowojski <pn...@apache.org> wrote:
>
> > Hi,
> >
> > I would actually prefer option 6 (or 5/4), for the sake of the configuration
> > being explicit and self-explanatory. But at the same time I don't have a very
> > strong preference, and of the remaining options, option 3 seems the most
> > reasonable.
> >
> > The question would be, do we want to expose to the users that
> > ChangelogStateBackend is wrapping an inner state backend or not? If not,
> > option 3 is the best. If we do, that is, if we want to teach the users and
> > help them build an understanding of how things work underneath, option 5 or 6
> > is better.
> >
> > Best,
> > Piotrek

Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Posted by Yu Li <ca...@gmail.com>.
+1 for option 3.

IMHO persisting (operator's) state data through a changelog is an
independent mechanism that can work with all kinds of local state
stores (heap and RocksDB). This mechanism is similar to the WAL
(write-ahead log) mechanism in database systems. Although implementation-wise
we're using the wrapper (decorator) pattern and naming it
`ChangelogStateBackend`, it's not really another type of state backend. For
the same reason, ChangelogStateBackend should be an internal class and not
exposed to the end user. Users only need to know / control whether to
enable the changelog or not, just like whether to enable the WAL in a
traditional database system.
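
The wrapper idea and the WAL analogy above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Flink's implementation: `HashMapBackend`, `ChangelogWrapper`, `create_backend`, and `replay` are hypothetical names, the "log" is an in-memory list rather than durable storage, and state is a flat key/value map.

```python
from typing import Any, Dict, List, Optional, Tuple, Union


class HashMapBackend:
    """Stand-in for a local state store (hypothetical, not Flink's class)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)


class ChangelogWrapper:
    """Decorator: same put/get interface, but records every change."""

    def __init__(self, inner: HashMapBackend) -> None:
        self._inner = inner
        self.log: List[Tuple[str, Any]] = []

    def put(self, key: str, value: Any) -> None:
        self.log.append((key, value))  # record the change first, WAL-style
        self._inner.put(key, value)    # then delegate to the wrapped store

    def get(self, key: str) -> Optional[Any]:
        return self._inner.get(key)    # reads go straight to the inner store


def create_backend(changelog_enabled: bool) -> Union[HashMapBackend, ChangelogWrapper]:
    """The user only supplies a boolean; the wrapping stays internal."""
    inner = HashMapBackend()
    return ChangelogWrapper(inner) if changelog_enabled else inner


def replay(log: List[Tuple[str, Any]]) -> HashMapBackend:
    """Rebuild state by applying the logged changes in order."""
    restored = HashMapBackend()
    for key, value in log:
        restored.put(key, value)
    return restored


backend = create_backend(changelog_enabled=True)
backend.put("count", 1)
backend.put("count", 2)
print(backend.get("count"))
print(replay(backend.log).get("count"))
```

Because the caller only sees `put` and `get`, the wrapper never needs to appear as a separate backend choice, which is the point made above about keeping it internal.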

Thanks.

Best Regards,
Yu



Re: [DISCUSS][Statebackend][Runtime] Changelog Statebackend Configuration Proposal

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

I would actually prefer option 6 (or 5/4), for the sake of the configuration
being explicit and self-explanatory. But at the same time I don't have a very
strong preference, and of the remaining options, option 3 seems the most
reasonable.

The question would be, do we want to expose to the users that
ChangelogStateBackend is wrapping an inner state backend or not? If not,
option 3 is the best. If we do, that is, if we want to teach the users and
help them build an understanding of how things work underneath, option 5 or 6
is better.

Best,
Piotrek
