Posted to dev@flink.apache.org by Matthias Pohl <ma...@aiven.io.INVALID> on 2022/12/07 08:28:03 UTC

[DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Hi everyone,

The Flink community changed how leader election works in Flink
1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all
components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
single (per-JM-process) leader election instance. The change was meant to fix
some issues with deregistering Flink applications in multi-JM setups [1] and
to reduce load on the HA backend. Users were able to opt out and switch back
to the old implementation [2].

The new approach was fairly complicated to implement while still
maintaining support for the old implementation through the existing
interfaces. With FLINK-25806 [3], the old implementation was removed in
Flink 1.16. This enables us to clean things up in the
HighAvailabilityServices.

The proposed change would mean touching the HighAvailabilityServices
interface. Currently, the interface provides factory methods for
LeaderElectionServices of the aforementioned components. All of these
LeaderElectionServices are internally based on the same LeaderElection
instance handled in DefaultMultipleComponentLeaderElectionService.
Therefore, we can replace all these factory methods with a single one that
returns a LeaderElectionService instance that’s going to be used by all
components. Of course, we could also stick to the old
HighAvailabilityServices and return the same LeaderElectionService instance
through each of the four factory methods (similar to what’s done now with
the MultipleComponentLeaderElectionService).
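
The two options above can be sketched roughly as follows. This is only an illustration: every type and method name in it (ElectionService, PerComponentHaServices, PerProcessHaServices, SharedInstanceHaServices) is a hypothetical simplification and not Flink's actual signature.

```java
// Hypothetical, simplified stand-ins; not Flink's actual types or signatures.
interface ElectionService {
    String scope();
}

// Current shape: one factory method per component.
interface PerComponentHaServices {
    ElectionService resourceManagerElection();
    ElectionService dispatcherElection();
    ElectionService restEndpointElection();
    ElectionService jobMasterElection(String jobId);
}

// Proposed shape: a single factory method used by all components.
interface PerProcessHaServices {
    ElectionService leaderElection();
}

// Alternative: keep the per-component interface, but back every factory
// method with the same per-JM-process instance.
final class SharedInstanceHaServices implements PerComponentHaServices, PerProcessHaServices {
    private final ElectionService shared = () -> "per-JM-process";

    @Override public ElectionService leaderElection() { return shared; }
    @Override public ElectionService resourceManagerElection() { return shared; }
    @Override public ElectionService dispatcherElection() { return shared; }
    @Override public ElectionService restEndpointElection() { return shared; }
    @Override public ElectionService jobMasterElection(String jobId) { return shared; }
}
```

With the second option, callers cannot tell from the interface that all four methods hand out the same instance; with the first, the per-process requirement is visible in the signature itself.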

A similar question appears for the corresponding LeaderRetrievalService: We
could create a single listener instead of having individual per-component
listeners to reflect the current requirement of having a per-JM-process
leader election and align it with the LeaderElectionService approach (if we
decide on modifying the HA interface).
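
One possible reading of the single-listener idea is sketched below. It assumes that a single leader event per JM process is enough for every interested component; the names ProcessLeaderListener and ProcessLeaderRetrieval are made up for illustration, and Flink's real retrieval listener has a different shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical listener type; Flink's real retrieval listener differs.
interface ProcessLeaderListener {
    void onLeaderChanged(String leaderAddress, UUID sessionId);
}

// One retrieval service for the whole JM process instead of one per component.
final class ProcessLeaderRetrieval {
    private final List<ProcessLeaderListener> listeners = new ArrayList<>();
    private String leaderAddress; // null until a leader is known
    private UUID sessionId;

    synchronized void addListener(ProcessLeaderListener listener) {
        listeners.add(listener);
        if (leaderAddress != null) {
            // Late subscribers immediately learn about the current leader.
            listener.onLeaderChanged(leaderAddress, sessionId);
        }
    }

    // Would be driven by the HA backend (e.g. a ZooKeeper or ConfigMap watch).
    synchronized void publish(String newLeaderAddress, UUID newSessionId) {
        leaderAddress = newLeaderAddress;
        sessionId = newSessionId;
        for (ProcessLeaderListener listener : listeners) {
            listener.onLeaderChanged(newLeaderAddress, newSessionId);
        }
    }
}
```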

I didn’t create a dedicated FLIP because HighAvailabilityServices is not
considered a public interface. Still, I am aware that the change might affect
users (e.g. if they implemented their own HA services or if their project
monitors HA information in the HA backend outside of Flink). That’s why I
wanted to start a discussion here. I’m happy to create a FLIP if someone
thinks it’s worth it. The work is going to be covered by FLINK-26522 [4].

Pros (for changing the interface methods):

- It reflects the requirements stated in FLINK-24038 [1] about having a
  per-JM-process LeaderElection
- It helps reduce the complexity of the JobManager

Cons:

- We lose some flexibility in terms of per-component LeaderElection
- Interface change might affect other projects that customize HA services

I’m in favor of reducing the number of factory methods in
HighAvailabilityServices, considering that it’s not a public interface. I’m
looking forward to your opinions.

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038

[2]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services

[3] https://issues.apache.org/jira/browse/FLINK-25806

[4] https://issues.apache.org/jira/browse/FLINK-26522

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Matthias Pohl <ma...@aiven.io.INVALID>.

Thanks for the discussion. I will go ahead then and refactor the code
without touching the HighAvailabilityServices interface. We will keep the
per-component factory methods and return a single leader election instance
for all of them. Other HighAvailabilityServices implementations can then
come up with a more fine-grained per-component leader election if they
want. I will document the problems that arise with such a per-component
implementation according to FLINK-24038 [1].

@Chesnay: I will move the discussion/documentation on how the refactoring
should be done into FLINK-26522 [2]. In the end, it's about modifying the
DefaultLeaderElectionService to support multiple contenders, i.e. merging
the MultipleComponentLeaderElectionService interface into the
LeaderElectionService interface. Ownership and lifecycle of the leader
election clients are explicitly moved from the LeaderContenders into the
HighAvailabilityServices. That's already the case with the current
per-process leader election. It's rather about documenting this properly
now. But as said, see FLINK-26522 [2] for further details.
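
A minimal sketch of the multi-contender idea, under the assumption that one election service per JM process fans leadership changes out to all registered components. The names Contender and MultiContenderElectionService and their methods are hypothetical and do not reflect the actual design in FLINK-26522.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical contender callback, simplified from the idea of a LeaderContender.
interface Contender {
    void grantLeadership(UUID sessionId);
    void revokeLeadership();
}

// One election service per JM process that fans leadership out to all
// registered components. The owner (the HA services in this sketch) creates
// and closes it; contenders only register and deregister.
final class MultiContenderElectionService implements AutoCloseable {
    private final Map<String, Contender> contenders = new LinkedHashMap<>();
    private UUID currentSession; // null while this process is not the leader

    synchronized void register(String componentId, Contender contender) {
        if (contenders.putIfAbsent(componentId, contender) != null) {
            throw new IllegalStateException("Already registered: " + componentId);
        }
        if (currentSession != null) {
            // A late registrant joins the leadership the process already holds.
            contender.grantLeadership(currentSession);
        }
    }

    synchronized void deregister(String componentId) {
        Contender removed = contenders.remove(componentId);
        if (removed != null && currentSession != null) {
            removed.revokeLeadership();
        }
    }

    // The two callbacks below would be invoked by the HA backend client.
    synchronized void onLeadershipGranted(UUID sessionId) {
        currentSession = sessionId;
        contenders.values().forEach(c -> c.grantLeadership(sessionId));
    }

    synchronized void onLeadershipRevoked() {
        currentSession = null;
        contenders.values().forEach(Contender::revokeLeadership);
    }

    @Override
    public synchronized void close() {
        if (currentSession != null) {
            onLeadershipRevoked();
        }
        contenders.clear();
    }
}
```

Because the service outlives any single contender, closing it is the owner's job, which matches the ownership shift described above.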

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-26522

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Zheng Yu Chen <ja...@gmail.com>.

Thanks, Matthias. I read the previous emails in this thread and would like to
share my views on some of the issues.

@Matthias

My opinion is that the per-component high-availability scheme should be
retained. As David mentioned, when we need to split out each component, each
one needs its own LeaderElectionService. Of course, I have no objection to
merging them into one for the single-JM case. But considering that the JM may
be able to support horizontal scaling [1] in the future, I suggest keeping it.

@David:

I agree with your opinion: we should rethink how to split the heavy
components in the JM and support the corresponding high availability, instead
of simply merging everything into a single LeaderElectionService.
If you have more ideas and suggestions for FLIP-257 [1], we can move to the
FLIP-257 thread for discussion [2].

@Dong:

I think the per-component high-availability scheme should be retained, as
David commented.
I have been researching related approaches and waiting for more positive
feedback from the community, because this part of the work is more
complicated than I imagined, and I am afraid that I cannot complete such a
large project by myself. What exists so far is just a preliminary solution.
In fact, I have considered splitting each service and using a separate HA for
each, but as the FLIP-257 [1] discussion thread [2] said, this will increase
the complexity of component communication. If you have more ideas and
suggestions for FLIP-257 [1], we can move to the FLIP-257 thread for
discussion [2].


[1] FLIP-257: Flink JobManager Process Split:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257%3A+Flink+JobManager+Process+Split
[2] FLIP-257 discussion thread:
https://lists.apache.org/thread/r3fnw13j5h04z87lb34l42nvob4pq2xj

-- 
Best

ConradJam

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Dong Lin <li...@gmail.com>.

Hi Chesnay,

I like the use-cases mentioned (e.g. running multiple UIs for load-balancing
purposes). On the other hand, these are probably not high-priority features,
and we don't know when the community will get to implement them. It seems a
bit like over-design to add implementation complexity for something that we
won't need?

Regarding the effort to add back the per-component election capability:
given that the implementation already follows per-process election, and
given that a lot of extra design/implementation/test effort will likely be
needed to achieve the use-cases described above, maybe the change proposed
in this thread won't affect the overall effort much?

I am hoping that by making the Flink codebase simpler and more readable, we
can increase developer velocity and reduce the time we need to tackle bugs
such as FLINK-24038. Then we would have more time to actually implement the
fancy use-cases described above :) What do you think?


On Sat, Dec 10, 2022 at 1:31 AM Chesnay Schepler <ch...@apache.org> wrote:

> I generally agree that the internals of the HA services are currently
> too complex, but I'm wondering if the proposal doesn't go a bit too far
> to resolve those.
> Is there maybe some way we can refactor things internally to reduce
> complexity while keeping the per-component semantics?
>
> Ultimately, the per-component leader election gives us the theoretical
> ability to split components into separate processes, which is also
> something we strive to maintain in other layers like the RPC system.
>
> That's a powerful property to have, which is also quite difficult to
> patch back in once you get rid of it.
>
> Of note, whenever a discussion came up about scalability of the JM
> process the first answer has _always_ been "well we could split it up at
> one point if it's necessary.".
>
> > I am curious that there are so many such extreme requirements that we
> > have to rely on the per-component pattern to achieve them?
>
> This doesn't necessarily go into the _extreme_ direction. It could be
> something as simple as running the UI in an environment that is more
> accessible than the other processes, running multiple UIs for
> load-balancing purposes without paying the additional memory tax of a
> full JM, or the Dispatcher process not running any user-code (== some
> isolation between jobs).
> The original FLIP-6 design had ideas to that end, and they aren't really
> bad ideas. We just never executed them.
>
>  > users may inadvertently recreate problems similar to FLINK-24038
>
> That's certainly a risk, but the per-process leader election was just
> one possible solution, that just also had other benefits at the time.
>
>
>
> Right now I unfortunately can't provide specific ideas on how we could
> clean things up internally; that'd take some time that I won't have
> until next year.
>
> On 09/12/2022 16:41, weijie guo wrote:
> > Hi Matthias,
> >
> > Thanks for the proposal! I am in favor of cleaning up this interface, as
> > it seems a bit cumbersome now, especially since the implementation of
> > per-component leader election has been removed from our current code path.
> >
> > To be honest, I don't like the per-component approach. I'm even often
> > asked why Flink does it this way. Of course, I admit that it makes our HA
> > service more flexible. But personally, I think the per-process solution is
> > better, at least from the perspective of reducing potential problems
> > like FLINK-24038, and it can definitely reduce the complexity of the
> > JobManager.
> >
> > Regarding "We lose some flexibility in terms of per-component
> > LeaderElection", I am curious that there are so many such extreme
> > requirements that we have to rely on the per-component pattern to achieve
> > them? If there are, are these requirements really reasonable? Users may
> > inadvertently recreate problems similar to FLINK-24038.
> >
> > Best regards,
> >
> > Weijie
> >
> >
> > Matthias Pohl <ma...@aiven.io.invalid> wrote on Fri, Dec 9, 2022 at 17:47:
> >
> >> Hi Dong,
> >> see my answers below.
> >>
> >>> Regarding "Interface change might affect other projects that customize HA
> >>> services", are you referring to those projects which hack into Flink's
> >>> source code (as opposed to using Flink's public API) to customize HA
> >>> services?
> >>
> >> Yes, the proposed change might affect projects that need to have their own
> >> HA implementation for whatever reason (interface change) or if a project
> >> accesses the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
> >> (change about how the data is stored in the HA backend). The latter one was
> >> actually already the case with the change introduced by FLINK-24038 [1].
> >>
> >>> By the way, since Flink already supports zookeeper and kubernetes as the
> >>> high availability services, are you aware of many projects that still need
> >>> to hack into Flink's code to customize high availability services?
> >>
> >> I am aware of projects that use customized HA. But based on our experience
> >> in FLINK-24038 [1] no one complained. So, making people aware through the
> >> mailing list might be good enough.
> >>
> >>> And regarding "We lose some flexibility in terms of per-component
> >>> LeaderElection", could you explain what flexibility we need so that we can
> >>> gauge the associated downside of losing the flexibility?
> >>
> >> Just to recap: The current interface allows having per-component
> >> LeaderElection (e.g. the ResourceManager leader can run on a different
> >> JobManager than the Dispatcher). This implementation was replaced by
> >> FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
> >> LeaderElection per process (e.g. ResourceManager and Dispatcher always run
> >> on the same JobManager). The changed interface would require us to touch
> >> the interface again if (for whatever reason) we want to reintroduce
> >> per-component leader election in some form.
> >> The interface change is, strictly speaking, not necessary to provide the
> >> new functionality. But I like the idea of certain requirements (currently,
> >> we need per-process leader election to fix what was reported in FLINK-24038
> >> [1]) being reflected in the interface. This makes sure that we don't
> >> introduce a per-component leader election again accidentally in the future
> >> because we thought it's a good idea but forgot about FLINK-24038.
> >>
> >> Matthias
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-24038
> >> [2] https://issues.apache.org/jira/browse/FLINK-25806
> >>
> >> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <li...@gmail.com> wrote:
> >>
> >>> Hi Matthias,
> >>>
> >>> Thanks for the proposal! Overall I am in favor of making this interface
> >>> change to make Flink's codebase more maintainable.
> >>>
> >>> Regarding "Interface change might affect other projects that customize HA
> >>> services", are you referring to those projects which hack into Flink's
> >>> source code (as opposed to using Flink's public API) to customize HA
> >>> services? If yes, it seems OK to break those projects since we don't have
> >>> any backward compatibility guarantee for those projects.
> >>>
> >>> By the way, since Flink already supports zookeeper and kubernetes as the
> >>> high availability services, are you aware of many projects that still need
> >>> to hack into Flink's code to customize high availability services?
> >>>
> >>> And regarding "We lose some flexibility in terms of per-component
> >>> LeaderElection", could you explain what flexibility we need so that we can
> >>> gauge the associated downside of losing the flexibility?
> >>>
> >>> Thanks!
> >>> Dong
> >>>
> >>>
> >>>

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Chesnay Schepler <ch...@apache.org>.

I generally agree that the internals of the HA services are currently 
too complex, but I'm wondering if the proposal doesn't go a bit too far 
to resolve those.
Is there maybe some way we can refactor things internally to reduce 
complexity while keeping the per-component semantics?

Ultimately, the per-component leader election gives us the theoretical 
ability to split components into separate processes, which is also 
something we strive to maintain in other layers like the RPC system.

That's a powerful property to have, which is also quite difficult to 
patch back in once you get rid of it.

Of note, whenever a discussion came up about scalability of the JM 
process the first answer has _always_ been "well we could split it up at 
one point if it's necessary.".

 > I am curious that there are so many such extreme requirements that we 
have to rely on the per-component pattern to achieve them?

This doesn't necessarily go into the _extreme_ direction. It could be 
something as simple as running the UI in an environment that is more 
accessible than the other processes, running multiple UIs for 
load-balancing purposes without paying the additional memory tax of a 
full JM, or the Dispatcher process not running any user-code (== some 
isolation between jobs).
The original FLIP-6 design had ideas to that end, and they aren't really 
bad ideas. We just never executed them.

 > users may inadvertently recreate problems similar to FLINK-24038

That's certainly a risk, but the per-process leader election was just 
one possible solution, that just also had other benefits at the time.



Right now I unfortunately can't provide specific ideas on how we could 
clean things up internally; that'd take some time that I won't have 
until next year.

On 09/12/2022 16:41, weijie guo wrote:
> Hi Matthias,
>
> Thanks for the proposal! I am in favor of cleaning up this interface, as
> it seems a bit cumbersome now, especially since the implementation of
> per-component leader election has been removed from our current code path.
>
> To be honest, I don't like the per-component approach. I'm even often asked
> why Flink does it this way. Of course, I admit that it makes our HA
> service more flexible. But personally, I think the per-process solution is
> better, at least from the perspective of reducing potential problems
> like FLINK-24038, and it can definitely reduce the complexity of the
> JobManager.
>
> Regarding "We lose some flexibility in terms of per-component
> LeaderElection", I am curious that there are so many such extreme
> requirements that we have to rely on the per-component pattern to achieve
> them? If there are, are these requirements really reasonable? Users may
> inadvertently recreate problems similar to FLINK-24038.
>
> Best regards,
>
> Weijie
>
>
> Matthias Pohl <ma...@aiven.io.invalid> wrote on Fri, Dec 9, 2022 at 17:47:
>
>> Hi Dong,
>> see my answers below.
>>
>>> Regarding "Interface change might affect other projects that customize HA
>>> services", are you referring to those projects which hack into Flink's
>>> source code (as opposed to using Flink's public API) to customize HA
>>> services?
>>
>> Yes, the proposed change might affect projects that need to have their own
>> HA implementation for whatever reason (interface change) or projects that
>> access the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
>> (a change in how the data is stored in the HA backend). The latter was
>> actually already the case with the change introduced by FLINK-24038 [1].
>>
>>> By the way, since Flink already supports zookeeper and kubernetes as the
>>> high availability services, are you aware of many projects that still need
>>> to hack into Flink's code to customize high availability services?
>>
>> I am aware of projects that use customized HA. But in our experience with
>> FLINK-24038 [1], no one complained. So, making people aware through the
>> mailing list might be good enough.
>>
>>> And regarding "We lose some flexibility in terms of per-component
>>> LeaderElection", could you explain what flexibility we need so that we can
>>> gauge the associated downside of losing the flexibility?
>>
>> Just to recap: The current interface allows having per-component
>> LeaderElection (e.g. the ResourceManager leader can run on a different
>> JobManager than the Dispatcher). This implementation was replaced by
>> FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
>> LeaderElection per process (e.g. ResourceManager and Dispatcher always run
>> on the same JobManager). The changed interface would require us to touch
>> the interface again if (for whatever reason) we want to reintroduce
>> per-component leader election in some form.
>> The interface change is, strictly speaking, not necessary to provide the
>> new functionality. But I like the idea of certain requirements (currently,
>> we need per-process leader election to fix what was reported in FLINK-24038
>> [1]) being reflected in the interface. This makes sure that we don't
>> introduce a per-component leader election again accidentally in the future
>> because we thought it's a good idea but forgot about FLINK-24038.
>>
>> Matthias
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-24038
>> [2] https://issues.apache.org/jira/browse/FLINK-25806
>>
>> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <li...@gmail.com> wrote:
>>
>>> Hi Matthias,
>>>
>>> Thanks for the proposal! Overall I am in favor of making this interface
>>> change to make Flink's codebase more maintainable.
>>>
>>> Regarding "Interface change might affect other projects that customize HA
>>> services", are you referring to those projects which hack into Flink's
>>> source code (as opposed to using Flink's public API) to customize HA
>>> services? If yes, it seems OK to break those projects since we don't have
>>> any backward compatibility guarantee for those projects.
>>>
>>> By the way, since Flink already supports zookeeper and kubernetes as the
>>> high availability services, are you aware of many projects that still need
>>> to hack into Flink's code to customize high availability services?
>>>
>>> And regarding "We lose some flexibility in terms of per-component
>>> LeaderElection", could you explain what flexibility we need so that we can
>>> gauge the associated downside of losing the flexibility?
>>>
>>> Thanks!
>>> Dong

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by weijie guo <gu...@gmail.com>.

Hi Matthias,

Thanks for the proposal! I am in favor of cleaning up this interface; it
seems a bit cumbersome now, especially since the implementation of
per-component leader election has been removed from our current code path.

To be honest, I don't like the per-component approach. I'm even often asked
why Flink does it this way. Of course, I admit that it makes our HA
service more flexible. But personally, I think the per-process solution is
better, at least from the perspective of reducing potential problems
like FLINK-24038, and it can definitely reduce the complexity of the JobManager.

Regarding "We lose some flexibility in terms of per-component
LeaderElection", I am curious whether there really are so many extreme
requirements that we have to rely on the per-component pattern to achieve
them? If there are, are these requirements really reasonable, given that users may
inadvertently recreate problems similar to FLINK-24038?

Best regards,

Weijie



Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Dong Lin <li...@gmail.com>.

Hi David,

Thanks for the explanation. Yes, I understand we are figuring out whether
it is OK to remove the existing interfaces. I definitely agree that we
should preserve the capability to eventually split components out to
separate processes.

I guess my question is: if we remove the interfaces now, does that
prevent us from doing what we want in the future?

What I am expecting/hoping is that it will just be a matter of adding the
interface back, and that the extra effort of adding it back might be small
compared to all the other work needed to achieve the final goal (e.g. splitting
out the resource manager and job master). If this is the case, maybe it is OK
to unblock other developers so they can simplify Flink?
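
As a rough illustration of what "adding the interface back" could look like, here is a minimal sketch using a Java default method. This is not Flink's API: `Election`, `SimplifiedHaServices`, and both implementations are hypothetical, and whether the real change would be this cheap is exactly the open question.

```java
/** Hypothetical stand-in for a leader election service. */
final class Election {
    final String lockName;

    Election(String lockName) {
        this.lockName = lockName;
    }
}

/** Simplified interface after the clean-up: a single factory method. */
interface SimplifiedHaServices {
    Election leaderElection();

    /**
     * Per-component accessor added back later. The default delegates to the
     * shared election, so existing implementations keep compiling unchanged.
     */
    default Election leaderElection(String component) {
        return leaderElection();
    }
}

/** Per-process implementation; it does not need to know about components. */
final class PerProcessHaServices implements SimplifiedHaServices {
    private final Election shared = new Election("jobmanager-process");

    @Override
    public Election leaderElection() {
        return shared;
    }
}

/** Implementation that gives the resource manager its own election. */
final class SplitResourceManagerHaServices implements SimplifiedHaServices {
    private final Election shared = new Election("jobmanager-process");
    private final Election resourceManager = new Election("resource-manager");

    @Override
    public Election leaderElection() {
        return shared;
    }

    @Override
    public Election leaderElection(String component) {
        return "resource-manager".equals(component) ? resourceManager : shared;
    }
}

final class InterfaceEvolutionDemo {
    public static void main(String[] args) {
        SimplifiedHaServices perProcess = new PerProcessHaServices();
        SimplifiedHaServices split = new SplitResourceManagerHaServices();
        System.out.println(perProcess.leaderElection("resource-manager").lockName);
        System.out.println(split.leaderElection("resource-manager").lockName);
    }
}
```

The sketch covers only the interface signature; the design, implementation, and test effort for actually running components in separate processes is not represented here.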

Of course, if there is any active ongoing discussion that touches the same
interface, it makes sense to wait before we change the interface. It
appears that FLIP-257 has been inactive for 3 months. I am not sure how
long we need to wait before being able to remove unnecessary code (based on
the existing supported use-cases) from Flink.

Best,
Dong

On Sat, Dec 10, 2022 at 6:43 PM David Morávek <da...@gmail.com>
wrote:

> Hi Dong,
>
> > Adding regarding the effort to add back the per-component election
> > capability: given that the implementation already follows per-process
> > election, and given that there will likely be a lot of extra
> > design/implementation/test effort needed to achieve the use-cases described
> > above, maybe the change proposed in this thread won't affect the overall
> > effort much?
>
> This might be a misunderstanding; what Chesnay is proposing is _not
> removing the existing interfaces_, which allow us to split components out into
> separate processes eventually.
>
> Maybe let's be more explicit about what the current state is:
>
> 1) _HighAvailabilityServices_ interface contains methods to create
> _LeaderElectionService_ and _LeaderRetrievalService_ for each component
> separately
> 2) In 1.15, we've introduced an alternative implementation ->
> _MultipleComponentLeaderElectionService_ that can multicast the leader
> election to multiple components.
> 3) In 1.16, we've removed the old HA services because they didn't provide
> any extra capability beyond what _MultipleComponentLeaderElectionService_
> offers. They did indeed perform per-component leader election, but were still
> effectively tied to a single JM process, so adding them back would only
> help a little with the component split efforts.
>
> The biggest motivation for re-factoring from my side would be that it was
> tough to fit the _MultipleComponentLeaderElectionService_ into the existing
> interfaces, so the implementation is unnecessarily complex.
>
> I think what we should do instead is rethink these interfaces, so they
> can still provide the flexibility of letting the user split out some
> components into a separate process. There is also a pending discussion
> (FLIP-257 [1]) that hints that some people are already thinking in this
> direction, and it might be required for their use case.
>
> I've also recently come to think that splitting out the
> ResourceManager might be crucial for building a large-scale managed
> service. There are a lot of companies emerging in this area right now, so I
> don't feel like we should be closing these doors just yet.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-257%3A+Flink+JobManager+Process+Split
>
> Best,
> D.
>
> On Sat, Dec 10, 2022 at 7:01 AM Dong Lin <li...@gmail.com> wrote:
>
> > Hi Matthias,
> >
> > Thanks for the explanation. I was trying to understand the concrete
> > user-facing benefits of preserving the flexibility of per-component
> leader
> > election. Now I get that maybe they want to scale those components
> > independently, and maybe run the UI in an environment that is more
> > accessible
> > than the other processes.
> >
> > I replied to Chesnay's email regarding whether it is worthwhile to keep
> the
> > existing interface for those potential but not-yet-realized benefits.
> >
> > Thanks,
> > Dong

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by David Morávek <da...@gmail.com>.

Hi Dong,

> Adding regarding the effort to add back the per-component election
> capability: given that the implementation already follows per-process
> election, and given that there will likely be a lot of extra
> design/implementation/test effort needed to achieve the use-cases described
> above, maybe the change proposed in this thread won't affect the overall
> effort much?

This might be a misunderstanding; what Chesnay is proposing is _not
removing the existing interfaces_, which allow us to split components out into
separate processes eventually.

Maybe let's be more explicit about what the current state is:

1) The _HighAvailabilityServices_ interface contains methods to create a
_LeaderElectionService_ and a _LeaderRetrievalService_ for each component
separately.
2) In 1.15, we introduced an alternative implementation,
_MultipleComponentLeaderElectionService_, that can multicast the leader
election to multiple components.
3) In 1.16, we removed the old HA services because they didn't provide
any extra capability beyond what _MultipleComponentLeaderElectionService_
offers. They did indeed do per-component leader election, but they were still
effectively tied to a single JM process, so adding them back would only
help a little with the component split efforts.
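
To make the "multicast" idea in point 2 concrete, here is a minimal,
self-contained sketch. It is not Flink's actual implementation: the names
Contender, PerProcessElection and SimpleComponent are hypothetical stand-ins,
and the calls that a real HA backend (ZooKeeper, Kubernetes) would make are
simulated by calling onLeadershipAcquired/onLeadershipLost directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class MulticastLeaderElectionSketch {

    /** Hypothetical stand-in for a component that takes part in leader election. */
    interface Contender {
        void grantLeadership(UUID sessionId);
        void revokeLeadership();
    }

    /** One election per JM process; its outcome is fanned out to all registered components. */
    static final class PerProcessElection {
        private final List<Contender> contenders = new ArrayList<>();
        private UUID currentSession; // null while this process is not the leader

        void register(Contender contender) {
            contenders.add(contender);
            // A component that registers late still learns about existing leadership.
            if (currentSession != null) {
                contender.grantLeadership(currentSession);
            }
        }

        /** Assumed to be triggered by the HA backend when this process wins the election. */
        void onLeadershipAcquired() {
            currentSession = UUID.randomUUID();
            for (Contender contender : contenders) {
                contender.grantLeadership(currentSession);
            }
        }

        /** Assumed to be triggered by the HA backend when this process loses leadership. */
        void onLeadershipLost() {
            currentSession = null;
            for (Contender contender : contenders) {
                contender.revokeLeadership();
            }
        }
    }

    /** Minimal component that only remembers its current leader session. */
    static final class SimpleComponent implements Contender {
        final String name;
        UUID session;

        SimpleComponent(String name) {
            this.name = name;
        }

        @Override
        public void grantLeadership(UUID sessionId) {
            session = sessionId;
        }

        @Override
        public void revokeLeadership() {
            session = null;
        }

        boolean isLeader() {
            return session != null;
        }
    }

    public static void main(String[] args) {
        PerProcessElection election = new PerProcessElection();
        SimpleComponent resourceManager = new SimpleComponent("ResourceManager");
        SimpleComponent dispatcher = new SimpleComponent("Dispatcher");
        election.register(resourceManager);
        election.register(dispatcher);

        election.onLeadershipAcquired();
        // Both components become leader together and share one session ID.
        System.out.println(resourceManager.isLeader() && dispatcher.isLeader());
        System.out.println(resourceManager.session.equals(dispatcher.session));
    }
}
```

In this sketch all components gain and lose leadership together, which is
the per-process behaviour described above. Splitting a component out into
its own process would need a separate election for that component.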

The biggest motivation for re-factoring from my side would be that it was
tough to fit the _MultipleComponentLeaderElectionService_ into the existing
interfaces, so the implementation is unnecessarily complex.

I think what we should do instead is rethink these interfaces so that they
can still provide the flexibility of letting the user split out some
components into a separate process. There is also a pending discussion
(FLIP-257 [1]) that hints that some people are already thinking in this
direction, and it might be required for their use case.

I've also recently started leaning toward the view that splitting out the
ResourceManager might be crucial for building a large-scale managed
service. There are a lot of companies emerging in this area right now, so I
don't feel like we should be closing these doors just yet.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-257%3A+Flink+JobManager+Process+Split

Best,
D.

On Sat, Dec 10, 2022 at 7:01 AM Dong Lin <li...@gmail.com> wrote:

> Hi Matthias,
>
> Thanks for the explanation. I was trying to understand the concrete
> user-facing benefits of preserving the flexibility of per-component leader
> election. Now I get that maybe they want to scale those components
> independently, and maybe run the UI in an environment that is more
> accessible
> than the other processes.
>
> I replied to Chesnay's email regarding whether it is worthwhile to keep the
> existing interface for those potential but not-yet-realized benefits.
>
> Thanks,
> Dong
>
> On Fri, Dec 9, 2022 at 5:47 PM Matthias Pohl <matthias.pohl@aiven.io
> .invalid>
> wrote:
>
> > Hi Dong,
> > see my answers below.
> >
> > Regarding "Interface change might affect other projects that customize HA
> > > services", are you referring to those projects which hack into Flink's
> > > source code (as opposed to using Flink's public API) to customize HA
> > > services?
> >
> >
> > Yes, the proposed change might affect projects that need to have their
> own
> > HA implementation for whatever reason (interface change) or if a project
> > accesses the HA backend to retrieve metadata from the ZK node/k8s
> ConfigMap
> > (change about how the data is stored in the HA backend). The latter one
> was
> > actually already the case with the change introduced by FLINK-24038 [1].
> >
> > By the way, since Flink already supports zookeeper and kubernetes as the
> > > high availability services, are you aware of many projects that still
> > need
> > > to hack into Flink's code to customize high availability services?
> >
> >
> > I am aware of projects that use customized HA. But based on our
> experience
> > in FLINK-24038 [1] no one complained. So, making people aware through the
> > mailing list might be good enough.
> >
> > And regarding "We lose some flexibility in terms of per-component
> > > LeaderElection", could you explain what flexibility we need so that we
> > can
> > > gauge the associated downside of losing the flexibility?
> >
> >
> > Just to recap: The current interface allows having per-component
> > LeaderElection (e.g. the ResourceManager leader can run on a different
> > JobManager than the Dispatcher). This implementation was replaced by
> > FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation
> does
> > LeaderElection per process (e.g. ResourceManager and Dispatcher always
> run
> > on the same JobManager). The changed interface would require us to touch
> > the interface again if (for whatever reason) we want to reintroduce
> > per-component leader election in some form.
> > The interface change is, strictly speaking, not necessary to provide the
> > new functionality. But I like the idea of certain requirements
> (currently,
> > we need per-process leader election to fix what was reported in
> FLINK-24038
> > [1]) being reflected in the interface. This makes sure that we don't
> > introduce a per-component leader election again accidentally in the
> future
> > because we thought it's a good idea but forgot about FLINK-24038.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-24038
> > [2] https://issues.apache.org/jira/browse/FLINK-25806
> >
> > On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <li...@gmail.com> wrote:
> >
> > > Hi Matthias,
> > >
> > > Thanks for the proposal! Overall I am in favor of making this interface
> > > change to make Flink's codebase more maintainable.
> > >
> > > Regarding "Interface change might affect other projects that customize
> HA
> > > services", are you referring to those projects which hack into Flink's
> > > source code (as opposed to using Flink's public API) to customize HA
> > > services? If yes, it seems OK to break those projects since we don't
> have
> > > any backward compatibility guarantee for those projects.
> > >
> > > By the way, since Flink already supports zookeeper and kubernetes as
> the
> > > high availability services, are you aware of many projects that still
> > need
> > > to hack into Flink's code to customize high availability services?
> > >
> > > And regarding "We lose some flexibility in terms of per-component
> > > LeaderElection", could you explain what flexibility we need so that we
> > can
> > > gauge the associated downside of losing the flexibility?
> > >
> > > Thanks!
> > > Dong
> > >
> > >
> > >
> > > On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.pohl@aiven.io
> > > .invalid>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > The Flink community introduced a new way how leader election works in
> > > Flink
> > > > 1.15 with FLINK-24038 [1]. Instead of a per-component leader
> election,
> > > all
> > > > components (i.e. ResourceManager, Dispatcher, REST server, JobMaster)
> > > use a
> > > > single (per-JM-process) leader election instance. It was meant to fix
> > > some
> > > > issues with deregistering Flink applications in multi-JM setups [1]
> and
> > > > reduce load on the HA backend. Users were able to opt-out and switch
> > back
> > > > to the old implementation [2].
> > > >
> > > > The new approach was kind of complicated to implement while still
> > > > maintaining support for the old implementation through the existing
> > > > interfaces. With FLINK-25806 [3], the old implementation was removed
> in
> > > > Flink 1.16. This enables us to clean things up in the
> > > > HighAvailabilityServices.
> > > >
> > > > The proposed change would mean touching the HighAvailabilityServices
> > > > interface. Currently, the interface provides factory methods for
> > > > LeaderElectionServices of the aforementioned components. All of these
> > > > LeaderElectionServices are internally based on the same
> LeaderElection
> > > > instance handled in DefaultMultipleComponentLeaderElectionService.
> > > > Therefore, we can replace all these factory methods by a single one
> > which
> > > > returns a LeaderElectionService instance that’s going to be used by
> all
> > > > components. Of course, we could also stick to the old
> > > > HighAvailabilityServices and return the same LeaderElectionService
> > > instance
> > > > through each of the four factory methods (similar to what’s done now
> > with
> > > > the MultipleComponentLeaderElectionService).
> > > >
> > > > A similar question appears for the corresponding
> > LeaderRetrievalService:
> > > We
> > > > could create a single listener instead of having individual
> > per-component
> > > > listeners to reflect the current requirement of having a
> per-JM-process
> > > > leader election and align it with the LeaderElectionService approach
> > (if
> > > we
> > > > decide on modifying the HA interface).
> > > >
> > > > I didn’t come up with a dedicated FLIP: HighAvailabilityServices are
> > not
> > > > considered a public interface. Still, I am aware it might affect
> users
> > > > (e.g. if they implemented their own HA services or if the project
> > > monitors
> > > > HA information in the HA backend outside of Flink). That’s why I
> wanted
> > > to
> > > > start a discussion here. I’m happy to create a FLIP, if someone
> thinks
> > > it’s
> > > > worth it. The work is going to be covered by FLINK-26522 [4]
> > > >
> > > > Pro’s (for changing the interface methods):
> > > >
> > > >    -
> > > >
> > > >    It reflects the requirements stated in FLINK-24038 [1] about
> having
> > a
> > > >    per-JM-process LeaderElection
> > > >    -
> > > >
> > > >    It helps reducing the complexity of the JobManager
> > > >
> > > > Con’s:
> > > >
> > > >    -
> > > >
> > > >    We lose some flexibility in terms of per-component LeaderElection
> > > >    -
> > > >
> > > >    Interface change might affect other projects that customize HA
> > > services
> > > >
> > > >
> > > > I’m in favor of reducing the amount of factory methods in
> > > > HighAvailabilityServices considering that it’s not a public
> interface.
> > > I’m
> > > > looking forward to your opinions.
> > > >
> > > > Matthias
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-24038
> > > >
> > > > [2]
> > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> > > >
> > > > [3] https://issues.apache.org/jira/browse/FLINK-25806
> > > >
> > > > [4] https://issues.apache.org/jira/browse/FLINK-26522
> > > >
> > > >
> > > > --
> > > >
> > > > [image: Aiven]
> > > >
> > > > Matthias Pohl
> > > >
> > > > Software Engineer, Aiven
> > > >
> > > > matthias.pohl@aiven.io <in...@aiven.io>
> > > >
> > > > aiven.io <https://www.aiven.io>   |
> > > > <https://www.facebook.com/aivencloud> <
> > > > https://www.facebook.com/aivencloud/>
> > > >     <https://www.linkedin.com/company/aiven/>
> > > > <https://www.linkedin.com/company/aiven>    <
> > > https://twitter.com/aiven_io>
> > > > <https://twitter.com/aiven_io>
> > > >
> > > > Aiven Deutschland GmbH
> > > >
> > > > Immanuelkirchstraße 26, 10405 Berlin
> > > >
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > >
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > >
> >
>

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Dong Lin <li...@gmail.com>.

Hi Matthias,

Thanks for the explanation. I was trying to understand the concrete
user-facing benefits of preserving the flexibility of per-component leader
election. Now I get that maybe they want to scale those components
independently, and maybe run the UI in an environment that is more accessible
than the other processes.

I replied to Chesnay's email regarding whether it is worthwhile to keep the
existing interface for those potential but not-yet-realized benefits.

Thanks,
Dong

On Fri, Dec 9, 2022 at 5:47 PM Matthias Pohl <ma...@aiven.io.invalid>
wrote:

> Hi Dong,
> see my answers below.
>
> Regarding "Interface change might affect other projects that customize HA
> > services", are you referring to those projects which hack into Flink's
> > source code (as opposed to using Flink's public API) to customize HA
> > services?
>
>
> Yes, the proposed change might affect projects that need to have their own
> HA implementation for whatever reason (interface change) or if a project
> accesses the HA backend to retrieve metadata from the ZK node/k8s ConfigMap
> (change about how the data is stored in the HA backend). The latter one was
> actually already the case with the change introduced by FLINK-24038 [1].
>
> By the way, since Flink already supports zookeeper and kubernetes as the
> > high availability services, are you aware of many projects that still
> need
> > to hack into Flink's code to customize high availability services?
>
>
> I am aware of projects that use customized HA. But based on our experience
> in FLINK-24038 [1] no one complained. So, making people aware through the
> mailing list might be good enough.
>
> And regarding "We lose some flexibility in terms of per-component
> > LeaderElection", could you explain what flexibility we need so that we
> can
> > gauge the associated downside of losing the flexibility?
>
>
> Just to recap: The current interface allows having per-component
> LeaderElection (e.g. the ResourceManager leader can run on a different
> JobManager than the Dispatcher). This implementation was replaced by
> FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
> LeaderElection per process (e.g. ResourceManager and Dispatcher always run
> on the same JobManager). The changed interface would require us to touch
> the interface again if (for whatever reason) we want to reintroduce
> per-component leader election in some form.
> The interface change is, strictly speaking, not necessary to provide the
> new functionality. But I like the idea of certain requirements (currently,
> we need per-process leader election to fix what was reported in FLINK-24038
> [1]) being reflected in the interface. This makes sure that we don't
> introduce a per-component leader election again accidentally in the future
> because we thought it's a good idea but forgot about FLINK-24038.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-24038
> [2] https://issues.apache.org/jira/browse/FLINK-25806
>
> On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <li...@gmail.com> wrote:
>
> > Hi Matthias,
> >
> > Thanks for the proposal! Overall I am in favor of making this interface
> > change to make Flink's codebase more maintainable.
> >
> > Regarding "Interface change might affect other projects that customize HA
> > services", are you referring to those projects which hack into Flink's
> > source code (as opposed to using Flink's public API) to customize HA
> > services? If yes, it seems OK to break those projects since we don't have
> > any backward compatibility guarantee for those projects.
> >
> > By the way, since Flink already supports zookeeper and kubernetes as the
> > high availability services, are you aware of many projects that still
> need
> > to hack into Flink's code to customize high availability services?
> >
> > And regarding "We lose some flexibility in terms of per-component
> > LeaderElection", could you explain what flexibility we need so that we
> can
> > gauge the associated downside of losing the flexibility?
> >
> > Thanks!
> > Dong
> >
> >
> >
> > On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.pohl@aiven.io
> > .invalid>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > The Flink community introduced a new way how leader election works in
> > Flink
> > > 1.15 with FLINK-24038 [1]. Instead of a per-component leader election,
> > all
> > > components (i.e. ResourceManager, Dispatcher, REST server, JobMaster)
> > use a
> > > single (per-JM-process) leader election instance. It was meant to fix
> > some
> > > issues with deregistering Flink applications in multi-JM setups [1] and
> > > reduce load on the HA backend. Users were able to opt-out and switch
> back
> > > to the old implementation [2].
> > >
> > > The new approach was kind of complicated to implement while still
> > > maintaining support for the old implementation through the existing
> > > interfaces. With FLINK-25806 [3], the old implementation was removed in
> > > Flink 1.16. This enables us to clean things up in the
> > > HighAvailabilityServices.
> > >
> > > The proposed change would mean touching the HighAvailabilityServices
> > > interface. Currently, the interface provides factory methods for
> > > LeaderElectionServices of the aforementioned components. All of these
> > > LeaderElectionServices are internally based on the same LeaderElection
> > > instance handled in DefaultMultipleComponentLeaderElectionService.
> > > Therefore, we can replace all these factory methods by a single one
> which
> > > returns a LeaderElectionService instance that’s going to be used by all
> > > components. Of course, we could also stick to the old
> > > HighAvailabilityServices and return the same LeaderElectionService
> > instance
> > > through each of the four factory methods (similar to what’s done now
> with
> > > the MultipleComponentLeaderElectionService).
> > >
> > > A similar question appears for the corresponding
> LeaderRetrievalService:
> > We
> > > could create a single listener instead of having individual
> per-component
> > > listeners to reflect the current requirement of having a per-JM-process
> > > leader election and align it with the LeaderElectionService approach
> (if
> > we
> > > decide on modifying the HA interface).
> > >
> > > I didn’t come up with a dedicated FLIP: HighAvailabilityServices are
> not
> > > considered a public interface. Still, I am aware it might affect users
> > > (e.g. if they implemented their own HA services or if the project
> > monitors
> > > HA information in the HA backend outside of Flink). That’s why I wanted
> > to
> > > start a discussion here. I’m happy to create a FLIP, if someone thinks
> > it’s
> > > worth it. The work is going to be covered by FLINK-26522 [4]
> > >
> > > Pro’s (for changing the interface methods):
> > >
> > >    -
> > >
> > >    It reflects the requirements stated in FLINK-24038 [1] about having
> a
> > >    per-JM-process LeaderElection
> > >    -
> > >
> > >    It helps reducing the complexity of the JobManager
> > >
> > > Con’s:
> > >
> > >    -
> > >
> > >    We lose some flexibility in terms of per-component LeaderElection
> > >    -
> > >
> > >    Interface change might affect other projects that customize HA
> > services
> > >
> > >
> > > I’m in favor of reducing the amount of factory methods in
> > > HighAvailabilityServices considering that it’s not a public interface.
> > I’m
> > > looking forward to your opinions.
> > >
> > > Matthias
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-24038
> > >
> > > [2]
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> > >
> > > [3] https://issues.apache.org/jira/browse/FLINK-25806
> > >
> > > [4] https://issues.apache.org/jira/browse/FLINK-26522
> > >
> > >
> > > --
> > >
> > > [image: Aiven]
> > >
> > > Matthias Pohl
> > >
> > > Software Engineer, Aiven
> > >
> > > matthias.pohl@aiven.io <in...@aiven.io>
> > >
> > > aiven.io <https://www.aiven.io>   |
> > > <https://www.facebook.com/aivencloud> <
> > > https://www.facebook.com/aivencloud/>
> > >     <https://www.linkedin.com/company/aiven/>
> > > <https://www.linkedin.com/company/aiven>    <
> > https://twitter.com/aiven_io>
> > > <https://twitter.com/aiven_io>
> > >
> > > Aiven Deutschland GmbH
> > >
> > > Immanuelkirchstraße 26, 10405 Berlin
> > >
> > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > >
> > > Amtsgericht Charlottenburg, HRB 209739 B
> > >
> >
>

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Matthias Pohl <ma...@aiven.io.INVALID>.

Hi Dong,
see my answers below.

> Regarding "Interface change might affect other projects that customize HA
> services", are you referring to those projects which hack into Flink's
> source code (as opposed to using Flink's public API) to customize HA
> services?


Yes, the proposed change might affect projects that need their own HA
implementation for whatever reason (interface change) or that access the
HA backend to retrieve metadata from the ZK node/k8s ConfigMap (a change in
how the data is stored in the HA backend). The latter was actually already
the case with the change introduced by FLINK-24038 [1].

> By the way, since Flink already supports zookeeper and kubernetes as the
> high availability services, are you aware of many projects that still need
> to hack into Flink's code to customize high availability services?


I am aware of projects that use customized HA. But in our experience with
FLINK-24038 [1], no one complained. So, making people aware through the
mailing list might be good enough.

> And regarding "We lose some flexibility in terms of per-component
> LeaderElection", could you explain what flexibility we need so that we can
> gauge the associated downside of losing the flexibility?


Just to recap: The current interface allows having per-component
LeaderElection (e.g. the ResourceManager leader can run on a different
JobManager than the Dispatcher). This implementation was replaced by
FLINK-24038 [1] and removed in FLINK-25806 [2]. The new implementation does
LeaderElection per process (e.g. ResourceManager and Dispatcher always run
on the same JobManager). The changed interface would require us to touch
the interface again if (for whatever reason) we want to reintroduce
per-component leader election in some form.
The interface change is, strictly speaking, not necessary to provide the
new functionality. But I like the idea of certain requirements (currently,
we need per-process leader election to fix what was reported in FLINK-24038
[1]) being reflected in the interface. This makes sure that we don't
introduce a per-component leader election again accidentally in the future
because we thought it's a good idea but forgot about FLINK-24038.
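
As a rough illustration of the two options in the recap, the sketch below
contrasts the interface shapes. It is a simplified model, not Flink's real
API: ElectionService, PerComponentFactories, SingleFactory and
SharedInstanceServices are hypothetical names, and the method names are
shortened for readability.

```java
public class HaInterfaceShapesSketch {

    /** Stand-in type; not Flink's actual LeaderElectionService. */
    interface ElectionService {}

    /** Current shape: one factory method per component (method names simplified). */
    interface PerComponentFactories {
        ElectionService resourceManagerElection();
        ElectionService dispatcherElection();
        ElectionService restEndpointElection();
        ElectionService jobMasterElection(String jobId);
    }

    /** Proposed shape: a single factory method makes the per-process election explicit. */
    interface SingleFactory {
        ElectionService leaderElection();
    }

    /** Keeps the old shape but hands out the same per-process instance everywhere. */
    static final class SharedInstanceServices implements PerComponentFactories, SingleFactory {
        private final ElectionService shared = new ElectionService() {};

        @Override
        public ElectionService leaderElection() {
            return shared;
        }

        @Override
        public ElectionService resourceManagerElection() {
            return shared;
        }

        @Override
        public ElectionService dispatcherElection() {
            return shared;
        }

        @Override
        public ElectionService restEndpointElection() {
            return shared;
        }

        @Override
        public ElectionService jobMasterElection(String jobId) {
            // The job ID is ignored here: every job shares the per-process election.
            return shared;
        }
    }

    public static void main(String[] args) {
        SharedInstanceServices services = new SharedInstanceServices();
        // With the old shape, nothing in the signatures reveals that these are the same object.
        System.out.println(services.resourceManagerElection() == services.dispatcherElection());
    }
}
```

With PerComponentFactories, the per-process requirement is only a property of
the implementation. With SingleFactory it is visible in the interface itself,
at the cost of touching the interface again if per-component election returns.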

Matthias

[1] https://issues.apache.org/jira/browse/FLINK-24038
[2] https://issues.apache.org/jira/browse/FLINK-25806

On Fri, Dec 9, 2022 at 2:09 AM Dong Lin <li...@gmail.com> wrote:

> Hi Matthias,
>
> Thanks for the proposal! Overall I am in favor of making this interface
> change to make Flink's codebase more maintainable.
>
> Regarding "Interface change might affect other projects that customize HA
> services", are you referring to those projects which hack into Flink's
> source code (as opposed to using Flink's public API) to customize HA
> services? If yes, it seems OK to break those projects since we don't have
> any backward compatibility guarantee for those projects.
>
> By the way, since Flink already supports zookeeper and kubernetes as the
> high availability services, are you aware of many projects that still need
> to hack into Flink's code to customize high availability services?
>
> And regarding "We lose some flexibility in terms of per-component
> LeaderElection", could you explain what flexibility we need so that we can
> gauge the associated downside of losing the flexibility?
>
> Thanks!
> Dong
>
>
>
> On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <matthias.pohl@aiven.io
> .invalid>
> wrote:
>
> > Hi everyone,
> >
> > The Flink community introduced a new way how leader election works in
> Flink
> > 1.15 with FLINK-24038 [1]. Instead of a per-component leader election,
> all
> > components (i.e. ResourceManager, Dispatcher, REST server, JobMaster)
> use a
> > single (per-JM-process) leader election instance. It was meant to fix
> some
> > issues with deregistering Flink applications in multi-JM setups [1] and
> > reduce load on the HA backend. Users were able to opt-out and switch back
> > to the old implementation [2].
> >
> > The new approach was kind of complicated to implement while still
> > maintaining support for the old implementation through the existing
> > interfaces. With FLINK-25806 [3], the old implementation was removed in
> > Flink 1.16. This enables us to clean things up in the
> > HighAvailabilityServices.
> >
> > The proposed change would mean touching the HighAvailabilityServices
> > interface. Currently, the interface provides factory methods for
> > LeaderElectionServices of the aforementioned components. All of these
> > LeaderElectionServices are internally based on the same LeaderElection
> > instance handled in DefaultMultipleComponentLeaderElectionService.
> > Therefore, we can replace all these factory methods by a single one which
> > returns a LeaderElectionService instance that’s going to be used by all
> > components. Of course, we could also stick to the old
> > HighAvailabilityServices and return the same LeaderElectionService
> instance
> > through each of the four factory methods (similar to what’s done now with
> > the MultipleComponentLeaderElectionService).
> >
> > A similar question appears for the corresponding LeaderRetrievalService:
> We
> > could create a single listener instead of having individual per-component
> > listeners to reflect the current requirement of having a per-JM-process
> > leader election and align it with the LeaderElectionService approach (if
> we
> > decide on modifying the HA interface).
> >
> > I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
> > considered a public interface. Still, I am aware it might affect users
> > (e.g. if they implemented their own HA services or if the project
> monitors
> > HA information in the HA backend outside of Flink). That’s why I wanted
> to
> > start a discussion here. I’m happy to create a FLIP, if someone thinks
> it’s
> > worth it. The work is going to be covered by FLINK-26522 [4]
> >
> > Pro’s (for changing the interface methods):
> >
> >    -
> >
> >    It reflects the requirements stated in FLINK-24038 [1] about having a
> >    per-JM-process LeaderElection
> >    -
> >
> >    It helps reducing the complexity of the JobManager
> >
> > Con’s:
> >
> >    -
> >
> >    We lose some flexibility in terms of per-component LeaderElection
> >    -
> >
> >    Interface change might affect other projects that customize HA
> services
> >
> >
> > I’m in favor of reducing the amount of factory methods in
> > HighAvailabilityServices considering that it’s not a public interface.
> I’m
> > looking forward to your opinions.
> >
> > Matthias
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-24038
> >
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
> >
> > [3] https://issues.apache.org/jira/browse/FLINK-25806
> >
> > [4] https://issues.apache.org/jira/browse/FLINK-26522
> >
> >
> > --
> >
> > [image: Aiven]
> >
> > Matthias Pohl
> >
> > Software Engineer, Aiven
> >
> > matthias.pohl@aiven.io <in...@aiven.io>
> >
> > aiven.io <https://www.aiven.io>   |
> > <https://www.facebook.com/aivencloud> <
> > https://www.facebook.com/aivencloud/>
> >     <https://www.linkedin.com/company/aiven/>
> > <https://www.linkedin.com/company/aiven>    <
> https://twitter.com/aiven_io>
> > <https://twitter.com/aiven_io>
> >
> > Aiven Deutschland GmbH
> >
> > Immanuelkirchstraße 26, 10405 Berlin
> >
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
>

Re: [DISCUSS] Cleaning up HighAvailabilityServices interface to reflect the per-JM-process LeaderElection

Posted by Dong Lin <li...@gmail.com>.

Hi Matthias,

Thanks for the proposal! Overall I am in favor of making this interface
change to make Flink's codebase more maintainable.

Regarding "Interface change might affect other projects that customize HA
services", are you referring to those projects which hack into Flink's
source code (as opposed to using Flink's public API) to customize HA
services? If yes, it seems OK to break those projects since we don't have
any backward compatibility guarantee for those projects.

By the way, since Flink already supports ZooKeeper and Kubernetes as
high availability services, are you aware of many projects that still need
to hack into Flink's code to customize high availability services?

And regarding "We lose some flexibility in terms of per-component
LeaderElection", could you explain what flexibility we need so that we can
gauge the associated downside of losing the flexibility?

Thanks!
Dong



On Wed, Dec 7, 2022 at 4:28 PM Matthias Pohl <ma...@aiven.io.invalid>
wrote:

> Hi everyone,
>
> The Flink community introduced a new way how leader election works in Flink
> 1.15 with FLINK-24038 [1]. Instead of a per-component leader election, all
> components (i.e. ResourceManager, Dispatcher, REST server, JobMaster) use a
> single (per-JM-process) leader election instance. It was meant to fix some
> issues with deregistering Flink applications in multi-JM setups [1] and
> reduce load on the HA backend. Users were able to opt-out and switch back
> to the old implementation [2].
>
> The new approach was kind of complicated to implement while still
> maintaining support for the old implementation through the existing
> interfaces. With FLINK-25806 [3], the old implementation was removed in
> Flink 1.16. This enables us to clean things up in the
> HighAvailabilityServices.
>
> The proposed change would mean touching the HighAvailabilityServices
> interface. Currently, the interface provides factory methods for
> LeaderElectionServices of the aforementioned components. All of these
> LeaderElectionServices are internally based on the same LeaderElection
> instance handled in DefaultMultipleComponentLeaderElectionService.
> Therefore, we can replace all these factory methods by a single one which
> returns a LeaderElectionService instance that’s going to be used by all
> components. Of course, we could also stick to the old
> HighAvailabilityServices and return the same LeaderElectionService instance
> through each of the four factory methods (similar to what’s done now with
> the MultipleComponentLeaderElectionService).
>
> A similar question appears for the corresponding LeaderRetrievalService: We
> could create a single listener instead of having individual per-component
> listeners to reflect the current requirement of having a per-JM-process
> leader election and align it with the LeaderElectionService approach (if we
> decide on modifying the HA interface).
>
> I didn’t come up with a dedicated FLIP: HighAvailabilityServices are not
> considered a public interface. Still, I am aware it might affect users
> (e.g. if they implemented their own HA services or if the project monitors
> HA information in the HA backend outside of Flink). That’s why I wanted to
> start a discussion here. I’m happy to create a FLIP, if someone thinks it’s
> worth it. The work is going to be covered by FLINK-26522 [4]
>
> Pro’s (for changing the interface methods):
>
>    -
>
>    It reflects the requirements stated in FLINK-24038 [1] about having a
>    per-JM-process LeaderElection
>    -
>
>    It helps reducing the complexity of the JobManager
>
> Con’s:
>
>    -
>
>    We lose some flexibility in terms of per-component LeaderElection
>    -
>
>    Interface change might affect other projects that customize HA services
>
>
> I’m in favor of reducing the amount of factory methods in
> HighAvailabilityServices considering that it’s not a public interface. I’m
> looking forward to your opinions.
>
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-24038
>
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#high-availability-use-old-ha-services
>
> [3] https://issues.apache.org/jira/browse/FLINK-25806
>
> [4] https://issues.apache.org/jira/browse/FLINK-26522
>
>
>