You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@flink.apache.org by Thomas Weise <th...@apache.org> on 2022/03/01 01:16:05 UTC

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Thanks for bringing the discussion. It's a good time to revisit this
area as the operator implementation becomes more complex.

I think the most important part of the design are well defined states
and transitions. Unless that can be reasoned about, there will be
confusion. For example:

1) For job upgrade, it wasn't clear in which state reconciliation
should happen and as a result the implementation ended up incomplete.
2) "JobStatusObserver" would attempt to list jobs before the REST API
is ready. And it would be used in session mode, even though the
session mode needs a different state machine.

The implementation currently contains a lot of if-then-else logic,
which is hard to follow and difficult to maintain. It will make it
harder to introduce new states that would be necessary to implement
more advanced upgrade strategies, amongst others.

Did you consider a state machine centric approach? See [1] for example.

As Biao mentioned, observe -> reconcile may repeat and different
states will require different checks. Once a job (in job mode) is
running, there is no need to check the JM deployment etc. A monolithic
observer may not work so well for that. Rather, I think different
states have different monitoring needs that inform transitions,
including the actual state and changes made to the CR.

It would also be good if the current state is directly reflected in
the CR status so that it is easier to check where the deployment is
at.

Cheers,
Thomas


[1] https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md

On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com> wrote:
>
> Hi All!
>
> Thank you for the feedback, I agree with what has been proposed to include
> as much as possible in the actual resource status and make the reconciler
> completely independent from the observer.
>
> @Biao Geng:
> Validation could depend on the current status (depending on how we
> implement the validation logic) so it might always be necessary (and it is
> also cheap).
> What you are saying with multiple observe -> reconcile cycles ties back to
> what Matyas said, that we should probably have an Observe loop until we
> have a stable state ready for reconciliation, then reconcile once.
>
> So probably validate -> observe until stable -> reconcile -> observe until
> stable
>
> Cheers,
> Gyula
>
> On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com> wrote:
>
> > Hi Gyula,
> > Thanks for the discussion. It also makes senses to me on the separation of
> > 3 components and Yang's proposal.
> > Just 1 follow-up thought after checking your PR: in the reconcile()
> > method of controller, IIUC, the real flow could be
> > `validate->observe->reconcile->validate->observe->reconcile...". The
> > validation phase seems to be required only when the creation of the job
> > cluster and the upgrade of config. For phases like waiting the JM from
> > deploying to ready, it is not mandatory and thus the flow can look like
> > `validate->observe->reconcile->optional validate due to current
> > state->observe->reconcile...`
> >
> > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
> >
> > > It is worth looking at the controller code in the spotify operator too:
> > >
> > >
> > https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
> > >
> > > It is looping in the 'observer phase' until it reaches a stable state,
> > then
> > > it performs the necessary changes.
> > >
> > > Based on this I also suggest keeping the logic in separate
> > > modules(Validate->Observe->Reconcile). The three components might not
> > > even be enough as we add more and more complexity to the code.
> > >
> > > Cheers,
> > > Matyas
> > >
> > >
> > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com> wrote:
> > >
> > > > Hi, Gyula
> > > >       Thanks for driving this discussion. I second Yang Wang's idea
> > that
> > > > it's better to make the `validator`, `observer` and `reconciler`
> > > > self-contained. I also prefer to define the `Observer` as an interface
> > > and
> > > > we could define the statuses that `Observer` will expose. It acts like
> > > the
> > > > observer protocol between the `Observer` and `Reconciler`.
> > > >
> > > > Best,
> > > > Aitozi.
> > > >
> > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
> > > >
> > > > > Thanks for posting the discussion here.
> > > > >
> > > > >
> > > > > Having the components `validator` `observer` `reconciler` makes lots
> > of
> > > > > sense. And the "Validate -> Observe -> Reconcile"
> > > > > flow seems natural to me.
> > > > >
> > > > > Regarding the implementation in the PR, instead of directly using the
> > > > > observer in the reconciler, I lean to let the observer
> > > > > exports the results to the status(e.g. jobmanager deployment status,
> > > rest
> > > > > port readiness, flink jobs status, etc.) and
> > > > > the reconciler reads it from the status. Then each component is more
> > > > > self-contained and the boundary will be clearer.
> > > > >
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
> > > > >
> > > > > > Hi All!
> > > > > >
> > > > > > I would like to start a discussion thread regarding the structure
> > of
> > > > > > the Kubernetes
> > > > > > Operator
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> > https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> > > > > > >
> > > > > > controller
> > > > > > flow. Based on some recent PR discussions we have no clear
> > consensus
> > > on
> > > > > the
> > > > > > structure and the expectations which can potentially lead to back
> > and
> > > > > forth
> > > > > > changes and unnecessary complexity.
> > > > > >
> > > > > > *Background*
> > > > > > In the initial prototype we had a very basic flow:
> > > > > >  1. Observe flink job status
> > > > > >  2. (if observation successful) reconcile changes
> > > > > >  3. Reschedule reconcile with success/error
> > > > > >
> > > > > > This basic prototype flow could not cover all requirements and did
> > > not
> > > > > > allow for things like waiting until Jobmanager deployment is ready
> > > etc.
> > > > > >
> > > > > > To solve these shortcomings, some changes were introduced recently
> > > here
> > > > > > <https://github.com/apache/flink-kubernetes-operator/pull/21>.
> > While
> > > > > this
> > > > > > change introduced many improvements and safeguards it also
> > completely
> > > > > > changed the original controller flow. Now the reconciler is
> > > responsible
> > > > > for
> > > > > > ensuring that it can actually reconcile by checking the deployment
> > > and
> > > > > > ports. The job status observation logic has also been moved into
> > the
> > > > > actual
> > > > > > reconcile logic.
> > > > > >
> > > > > >
> > > > > > *Discussion Question*What controller flow would we like to have? Do
> > > we
> > > > > want
> > > > > > to separate the observer from the reconciler or keep them together?
> > > > > >
> > > > > > In my personal view, we should try to adopt a very simple flow to
> > > make
> > > > > the
> > > > > > operator clean and modular. If possible I would like to restore the
> > > > > > original flow with some modifications:
> > > > > >
> > > > > >  1. Validate deployment object
> > > > > >  2. Observe deployment and flink job status -> Return comprehensive
> > > > > status
> > > > > > info
> > > > > >  3. Reconcile deployment based on observed status and resource
> > > changes
> > > > > >  (Both 2/3 should be able to reschedule immediately if necessary)
> > > > > >
> > > > > > I think the Observer component should be able to describe the
> > current
> > > > > > status of the deployment objects and the flink job to the extent
> > that
> > > > the
> > > > > > reconciler can work with that information alone. If we do it this
> > > way,
> > > > we
> > > > > > can also use the status information that the observer provides to
> > > > produce
> > > > > > other events and aid operations like shutdown which depend on the
> > > > current
> > > > > > deployment status.
> > > > > >
> > > > > > I think this would satisfy our needs, but I might be missing
> > > something
> > > > > that
> > > > > > cannot be done if we structure the code this way.
> > > > > >
> > > > > > I have a PR open
> > > > > > <
> > https://github.com/apache/flink-kubernetes-operator/pull/26/commits
> > > >
> > > > > > which
> > > > > > includes some of these proposed changes (as the optional second
> > > commit)
> > > > > so
> > > > > > that you can easily compare with the current state of the operator.
> > > > > >
> > > > > > Please let us know what we think!
> > > > > >
> > > > > > Cheers,
> > > > > > Gyula
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Posted by Gyula Fóra <gy...@gmail.com>.

Let's keep this discussion for another day so everyone can still chip in.

In the meantime I have opened a JIRA ticket
<https://issues.apache.org/jira/browse/FLINK-26432> for this.
Also I invite you to review this PR
<https://github.com/apache/flink-kubernetes-operator/pull/26> , it is ready
with the base changes according to the discussion.

Once we have this merged it will be easier to parallelize work on the state
machine/observation/reconcile logic.

Gyula

On Tue, Mar 1, 2022 at 11:24 AM Gyula Fóra <gy...@gmail.com> wrote:

> Hi All!
>
> Based on your ideas and suggestions I have made some improvements to the
> refactor proposal PR:
> https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>
> I have now completely decoupled the observer from the reconciler and the
> JobManagerDeploymentStatus is recorded in the FlinkDeployment resource
> status. This greatly simplifies the flow and eliminates the need for any
> caching.
>
> I haven't touched the reconciliation logic here, and did not introduce a
> state machine to avoid making this change even bigger. The state machine
> changes could be built on top of this as a second stage of the refactoring
> work after this.
>
> I will work on improving the tests to capture and validate the flow better.
>
> Let me know what you think and if you have any ideas to further improve
> the high level flow!
>
> Thanks
> Gyula
>
> On Tue, Mar 1, 2022 at 9:07 AM Yang Wang <da...@gmail.com> wrote:
>
>> The state machine is going to replace the annoying if-else in the
>> reconciler.
>> It seems to have no conflicts with modular mechanism(aka observer,
>> reconciler, etc.).
>> And we could make them happen step by step.
>>
>>
>> Best,
>> Yang
>>
>>
>> Gyula Fóra <gy...@gmail.com> 于2022年3月1日周二 13:53写道：
>>
>> > And just one more thing I forgot to add:
>> >
>> > The observer does not have to be a dummy monolithic logic. Different job
>> > types will have different state machines with different observe
>> > requirements and different observer implementations. I think this is
>> > complexly normal. The observer can be aware of the expected state and
>> what
>> > it should observe on the happy path.
>> >
>> > Gyula
>> >
>> > On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gy...@gmail.com> wrote:
>> >
>> > > @Thomas:
>> > >
>> > > Thanks for the input! I completely agree with a well principled state
>> > > machine based approach in the operator would be the best.
>> > >
>> > > You are right that the code contains mostly if then else based logic
>> at
>> > > the moment as it evolved after initial prototype to cover more
>> scenarios.
>> > >
>> > > However I think the self contained observer - reconciler approach is
>> very
>> > > capable of implementing the suggested state machine based view in a
>> very
>> > > elegant way.
>> > >
>> > > Simply put, the observer is responsible for determining and recording
>> the
>> > > current state and any extra information that comes with that state.
>> > >
>> > > The reconciler then look at the current vs desired state and execute a
>> > > state transition.
>> > >
>> > > Gyula
>> > >
>> > > On Tue, 1 Mar 2022 at 02:16, Thomas Weise <th...@apache.org> wrote:
>> > >
>> > >> Thanks for bringing the discussion. It's a good time to revisit this
>> > >> area as the operator implementation becomes more complex.
>> > >>
>> > >> I think the most important part of the design are well defined states
>> > >> and transitions. Unless that can be reasoned about, there will be
>> > >> confusion. For example:
>> > >>
>> > >> 1) For job upgrade, it wasn't clear in which state reconciliation
>> > >> should happen and as a result the implementation ended up incomplete.
>> > >> 2) "JobStatusObserver" would attempt to list jobs before the REST API
>> > >> is ready. And it would be used in session mode, even though the
>> > >> session mode needs a different state machine.
>> > >>
>> > >> The implementation currently contains a lot of if-then-else logic,
>> > >> which is hard to follow and difficult to maintain. It will make it
>> > >> harder to introduce new states that would be necessary to implement
>> > >> more advanced upgrade strategies, amongst others.
>> > >>
>> > >> Did you consider a state machine centric approach? See [1] for
>> example.
>> > >>
>> > >> As Biao mentioned, observe -> reconcile may repeat and different
>> > >> states will require different checks. Once a job (in job mode) is
>> > >> running, there is no need to check the JM deployment etc. A
>> monolithic
>> > >> observer may not work so well for that. Rather, I think different
>> > >> states have different monitoring needs that inform transitions,
>> > >> including the actual state and changes made to the CR.
>> > >>
>> > >> It would also be good if the current state is directly reflected in
>> > >> the CR status so that it is easier to check where the deployment is
>> > >> at.
>> > >>
>> > >> Cheers,
>> > >> Thomas
>> > >>
>> > >>
>> > >> [1]
>> > >>
>> >
>> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>> > >>
>> > >> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com>
>> > wrote:
>> > >> >
>> > >> > Hi All!
>> > >> >
>> > >> > Thank you for the feedback, I agree with what has been proposed to
>> > >> include
>> > >> > as much as possible in the actual resource status and make the
>> > >> reconciler
>> > >> > completely independent from the observer.
>> > >> >
>> > >> > @Biao Geng:
>> > >> > Validation could depend on the current status (depending on how we
>> > >> > implement the validation logic) so it might always be necessary
>> (and
>> > it
>> > >> is
>> > >> > also cheap).
>> > >> > What you are saying with multiple observe -> reconcile cycles ties
>> > back
>> > >> to
>> > >> > what Matyas said, that we should probably have an Observe loop
>> until
>> > we
>> > >> > have a stable state ready for reconciliation, then reconcile once.
>> > >> >
>> > >> > So probably validate -> observe until stable -> reconcile ->
>> observe
>> > >> until
>> > >> > stable
>> > >> >
>> > >> > Cheers,
>> > >> > Gyula
>> > >> >
>> > >> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com>
>> > wrote:
>> > >> >
>> > >> > > Hi Gyula,
>> > >> > > Thanks for the discussion. It also makes senses to me on the
>> > >> separation of
>> > >> > > 3 components and Yang's proposal.
>> > >> > > Just 1 follow-up thought after checking your PR: in the
>> reconcile()
>> > >> > > method of controller, IIUC, the real flow could be
>> > >> > > `validate->observe->reconcile->validate->observe->reconcile...".
>> The
>> > >> > > validation phase seems to be required only when the creation of
>> the
>> > >> job
>> > >> > > cluster and the upgrade of config. For phases like waiting the JM
>> > from
>> > >> > > deploying to ready, it is not mandatory and thus the flow can
>> look
>> > >> like
>> > >> > > `validate->observe->reconcile->optional validate due to current
>> > >> > > state->observe->reconcile...`
>> > >> > >
>> > >> > > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
>> > >> > >
>> > >> > > > It is worth looking at the controller code in the spotify
>> operator
>> > >> too:
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >>
>> >
>> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
>> > >> > > >
>> > >> > > > It is looping in the 'observer phase' until it reaches a stable
>> > >> state,
>> > >> > > then
>> > >> > > > it performs the necessary changes.
>> > >> > > >
>> > >> > > > Based on this I also suggest keeping the logic in separate
>> > >> > > > modules(Validate->Observe->Reconcile). The three components
>> might
>> > >> not
>> > >> > > > even be enough as we add more and more complexity to the code.
>> > >> > > >
>> > >> > > > Cheers,
>> > >> > > > Matyas
>> > >> > > >
>> > >> > > >
>> > >> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com>
>> > >> wrote:
>> > >> > > >
>> > >> > > > > Hi, Gyula
>> > >> > > > >       Thanks for driving this discussion. I second Yang
>> Wang's
>> > >> idea
>> > >> > > that
>> > >> > > > > it's better to make the `validator`, `observer` and
>> `reconciler`
>> > >> > > > > self-contained. I also prefer to define the `Observer` as an
>> > >> interface
>> > >> > > > and
>> > >> > > > > we could define the statuses that `Observer` will expose. It
>> > acts
>> > >> like
>> > >> > > > the
>> > >> > > > > observer protocol between the `Observer` and `Reconciler`.
>> > >> > > > >
>> > >> > > > > Best,
>> > >> > > > > Aitozi.
>> > >> > > > >
>> > >> > > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
>> > >> > > > >
>> > >> > > > > > Thanks for posting the discussion here.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Having the components `validator` `observer` `reconciler`
>> > makes
>> > >> lots
>> > >> > > of
>> > >> > > > > > sense. And the "Validate -> Observe -> Reconcile"
>> > >> > > > > > flow seems natural to me.
>> > >> > > > > >
>> > >> > > > > > Regarding the implementation in the PR, instead of directly
>> > >> using the
>> > >> > > > > > observer in the reconciler, I lean to let the observer
>> > >> > > > > > exports the results to the status(e.g. jobmanager
>> deployment
>> > >> status,
>> > >> > > > rest
>> > >> > > > > > port readiness, flink jobs status, etc.) and
>> > >> > > > > > the reconciler reads it from the status. Then each
>> component
>> > is
>> > >> more
>> > >> > > > > > self-contained and the boundary will be clearer.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Best,
>> > >> > > > > > Yang
>> > >> > > > > >
>> > >> > > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
>> > >> > > > > >
>> > >> > > > > > > Hi All!
>> > >> > > > > > >
>> > >> > > > > > > I would like to start a discussion thread regarding the
>> > >> structure
>> > >> > > of
>> > >> > > > > > > the Kubernetes
>> > >> > > > > > > Operator
>> > >> > > > > > > <
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >>
>> >
>> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
>> > >> > > > > > > >
>> > >> > > > > > > controller
>> > >> > > > > > > flow. Based on some recent PR discussions we have no
>> clear
>> > >> > > consensus
>> > >> > > > on
>> > >> > > > > > the
>> > >> > > > > > > structure and the expectations which can potentially
>> lead to
>> > >> back
>> > >> > > and
>> > >> > > > > > forth
>> > >> > > > > > > changes and unnecessary complexity.
>> > >> > > > > > >
>> > >> > > > > > > *Background*
>> > >> > > > > > > In the initial prototype we had a very basic flow:
>> > >> > > > > > >  1. Observe flink job status
>> > >> > > > > > >  2. (if observation successful) reconcile changes
>> > >> > > > > > >  3. Reschedule reconcile with success/error
>> > >> > > > > > >
>> > >> > > > > > > This basic prototype flow could not cover all
>> requirements
>> > >> and did
>> > >> > > > not
>> > >> > > > > > > allow for things like waiting until Jobmanager
>> deployment is
>> > >> ready
>> > >> > > > etc.
>> > >> > > > > > >
>> > >> > > > > > > To solve these shortcomings, some changes were introduced
>> > >> recently
>> > >> > > > here
>> > >> > > > > > > <
>> > https://github.com/apache/flink-kubernetes-operator/pull/21
>> > >> >.
>> > >> > > While
>> > >> > > > > > this
>> > >> > > > > > > change introduced many improvements and safeguards it
>> also
>> > >> > > completely
>> > >> > > > > > > changed the original controller flow. Now the reconciler
>> is
>> > >> > > > responsible
>> > >> > > > > > for
>> > >> > > > > > > ensuring that it can actually reconcile by checking the
>> > >> deployment
>> > >> > > > and
>> > >> > > > > > > ports. The job status observation logic has also been
>> moved
>> > >> into
>> > >> > > the
>> > >> > > > > > actual
>> > >> > > > > > > reconcile logic.
>> > >> > > > > > >
>> > >> > > > > > >
>> > >> > > > > > > *Discussion Question*What controller flow would we like
>> to
>> > >> have? Do
>> > >> > > > we
>> > >> > > > > > want
>> > >> > > > > > > to separate the observer from the reconciler or keep them
>> > >> together?
>> > >> > > > > > >
>> > >> > > > > > > In my personal view, we should try to adopt a very simple
>> > >> flow to
>> > >> > > > make
>> > >> > > > > > the
>> > >> > > > > > > operator clean and modular. If possible I would like to
>> > >> restore the
>> > >> > > > > > > original flow with some modifications:
>> > >> > > > > > >
>> > >> > > > > > >  1. Validate deployment object
>> > >> > > > > > >  2. Observe deployment and flink job status -> Return
>> > >> comprehensive
>> > >> > > > > > status
>> > >> > > > > > > info
>> > >> > > > > > >  3. Reconcile deployment based on observed status and
>> > resource
>> > >> > > > changes
>> > >> > > > > > >  (Both 2/3 should be able to reschedule immediately if
>> > >> necessary)
>> > >> > > > > > >
>> > >> > > > > > > I think the Observer component should be able to describe
>> > the
>> > >> > > current
>> > >> > > > > > > status of the deployment objects and the flink job to the
>> > >> extent
>> > >> > > that
>> > >> > > > > the
>> > >> > > > > > > reconciler can work with that information alone. If we
>> do it
>> > >> this
>> > >> > > > way,
>> > >> > > > > we
>> > >> > > > > > > can also use the status information that the observer
>> > >> provides to
>> > >> > > > > produce
>> > >> > > > > > > other events and aid operations like shutdown which
>> depend
>> > on
>> > >> the
>> > >> > > > > current
>> > >> > > > > > > deployment status.
>> > >> > > > > > >
>> > >> > > > > > > I think this would satisfy our needs, but I might be
>> missing
>> > >> > > > something
>> > >> > > > > > that
>> > >> > > > > > > cannot be done if we structure the code this way.
>> > >> > > > > > >
>> > >> > > > > > > I have a PR open
>> > >> > > > > > > <
>> > >> > >
>> https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>> > >> > > > >
>> > >> > > > > > > which
>> > >> > > > > > > includes some of these proposed changes (as the optional
>> > >> second
>> > >> > > > commit)
>> > >> > > > > > so
>> > >> > > > > > > that you can easily compare with the current state of the
>> > >> operator.
>> > >> > > > > > >
>> > >> > > > > > > Please let us know what we think!
>> > >> > > > > > >
>> > >> > > > > > > Cheers,
>> > >> > > > > > > Gyula
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >>
>> > >
>> >
>>
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Posted by Gyula Fóra <gy...@gmail.com>.

Hi All!

Based on your ideas and suggestions I have made some improvements to the
refactor proposal PR:
https://github.com/apache/flink-kubernetes-operator/pull/26/commits

I have now completely decoupled the observer from the reconciler and the
JobManagerDeploymentStatus is recorded in the FlinkDeployment resource
status. This greatly simplifies the flow and eliminates the need for any
caching.

I haven't touched the reconciliation logic here, and did not introduce a
state machine to avoid making this change even bigger. The state machine
changes could be built on top of this as a second stage of the refactoring
work after this.

I will work on improving the tests to capture and validate the flow better.

Let me know what you think and if you have any ideas to further improve the
high level flow!

Thanks
Gyula

On Tue, Mar 1, 2022 at 9:07 AM Yang Wang <da...@gmail.com> wrote:

> The state machine is going to replace the annoying if-else in the
> reconciler.
> It seems to have no conflicts with modular mechanism(aka observer,
> reconciler, etc.).
> And we could make them happen step by step.
>
>
> Best,
> Yang
>
>
> Gyula Fóra <gy...@gmail.com> 于2022年3月1日周二 13:53写道：
>
> > And just one more thing I forgot to add:
> >
> > The observer does not have to be a dummy monolithic logic. Different job
> > types will have different state machines with different observe
> > requirements and different observer implementations. I think this is
> > complexly normal. The observer can be aware of the expected state and
> what
> > it should observe on the happy path.
> >
> > Gyula
> >
> > On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gy...@gmail.com> wrote:
> >
> > > @Thomas:
> > >
> > > Thanks for the input! I completely agree with a well principled state
> > > machine based approach in the operator would be the best.
> > >
> > > You are right that the code contains mostly if then else based logic at
> > > the moment as it evolved after initial prototype to cover more
> scenarios.
> > >
> > > However I think the self contained observer - reconciler approach is
> very
> > > capable of implementing the suggested state machine based view in a
> very
> > > elegant way.
> > >
> > > Simply put, the observer is responsible for determining and recording
> the
> > > current state and any extra information that comes with that state.
> > >
> > > The reconciler then look at the current vs desired state and execute a
> > > state transition.
> > >
> > > Gyula
> > >
> > > On Tue, 1 Mar 2022 at 02:16, Thomas Weise <th...@apache.org> wrote:
> > >
> > >> Thanks for bringing the discussion. It's a good time to revisit this
> > >> area as the operator implementation becomes more complex.
> > >>
> > >> I think the most important part of the design are well defined states
> > >> and transitions. Unless that can be reasoned about, there will be
> > >> confusion. For example:
> > >>
> > >> 1) For job upgrade, it wasn't clear in which state reconciliation
> > >> should happen and as a result the implementation ended up incomplete.
> > >> 2) "JobStatusObserver" would attempt to list jobs before the REST API
> > >> is ready. And it would be used in session mode, even though the
> > >> session mode needs a different state machine.
> > >>
> > >> The implementation currently contains a lot of if-then-else logic,
> > >> which is hard to follow and difficult to maintain. It will make it
> > >> harder to introduce new states that would be necessary to implement
> > >> more advanced upgrade strategies, amongst others.
> > >>
> > >> Did you consider a state machine centric approach? See [1] for
> example.
> > >>
> > >> As Biao mentioned, observe -> reconcile may repeat and different
> > >> states will require different checks. Once a job (in job mode) is
> > >> running, there is no need to check the JM deployment etc. A monolithic
> > >> observer may not work so well for that. Rather, I think different
> > >> states have different monitoring needs that inform transitions,
> > >> including the actual state and changes made to the CR.
> > >>
> > >> It would also be good if the current state is directly reflected in
> > >> the CR status so that it is easier to check where the deployment is
> > >> at.
> > >>
> > >> Cheers,
> > >> Thomas
> > >>
> > >>
> > >> [1]
> > >>
> >
> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
> > >>
> > >> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com>
> > wrote:
> > >> >
> > >> > Hi All!
> > >> >
> > >> > Thank you for the feedback, I agree with what has been proposed to
> > >> include
> > >> > as much as possible in the actual resource status and make the
> > >> reconciler
> > >> > completely independent from the observer.
> > >> >
> > >> > @Biao Geng:
> > >> > Validation could depend on the current status (depending on how we
> > >> > implement the validation logic) so it might always be necessary (and
> > it
> > >> is
> > >> > also cheap).
> > >> > What you are saying with multiple observe -> reconcile cycles ties
> > back
> > >> to
> > >> > what Matyas said, that we should probably have an Observe loop until
> > we
> > >> > have a stable state ready for reconciliation, then reconcile once.
> > >> >
> > >> > So probably validate -> observe until stable -> reconcile -> observe
> > >> until
> > >> > stable
> > >> >
> > >> > Cheers,
> > >> > Gyula
> > >> >
> > >> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com>
> > wrote:
> > >> >
> > >> > > Hi Gyula,
> > >> > > Thanks for the discussion. It also makes senses to me on the
> > >> separation of
> > >> > > 3 components and Yang's proposal.
> > >> > > Just 1 follow-up thought after checking your PR: in the
> reconcile()
> > >> > > method of controller, IIUC, the real flow could be
> > >> > > `validate->observe->reconcile->validate->observe->reconcile...".
> The
> > >> > > validation phase seems to be required only when the creation of
> the
> > >> job
> > >> > > cluster and the upgrade of config. For phases like waiting the JM
> > from
> > >> > > deploying to ready, it is not mandatory and thus the flow can look
> > >> like
> > >> > > `validate->observe->reconcile->optional validate due to current
> > >> > > state->observe->reconcile...`
> > >> > >
> > >> > > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
> > >> > >
> > >> > > > It is worth looking at the controller code in the spotify
> operator
> > >> too:
> > >> > > >
> > >> > > >
> > >> > >
> > >>
> >
> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
> > >> > > >
> > >> > > > It is looping in the 'observer phase' until it reaches a stable
> > >> state,
> > >> > > then
> > >> > > > it performs the necessary changes.
> > >> > > >
> > >> > > > Based on this I also suggest keeping the logic in separate
> > >> > > > modules(Validate->Observe->Reconcile). The three components
> might
> > >> not
> > >> > > > even be enough as we add more and more complexity to the code.
> > >> > > >
> > >> > > > Cheers,
> > >> > > > Matyas
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi, Gyula
> > >> > > > >       Thanks for driving this discussion. I second Yang Wang's
> > >> idea
> > >> > > that
> > >> > > > > it's better to make the `validator`, `observer` and
> `reconciler`
> > >> > > > > self-contained. I also prefer to define the `Observer` as an
> > >> interface
> > >> > > > and
> > >> > > > > we could define the statuses that `Observer` will expose. It
> > acts
> > >> like
> > >> > > > the
> > >> > > > > observer protocol between the `Observer` and `Reconciler`.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Aitozi.
> > >> > > > >
> > >> > > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
> > >> > > > >
> > >> > > > > > Thanks for posting the discussion here.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Having the components `validator` `observer` `reconciler`
> > makes
> > >> lots
> > >> > > of
> > >> > > > > > sense. And the "Validate -> Observe -> Reconcile"
> > >> > > > > > flow seems natural to me.
> > >> > > > > >
> > >> > > > > > Regarding the implementation in the PR, instead of directly
> > >> using the
> > >> > > > > > observer in the reconciler, I lean to let the observer
> > >> > > > > > exports the results to the status(e.g. jobmanager deployment
> > >> status,
> > >> > > > rest
> > >> > > > > > port readiness, flink jobs status, etc.) and
> > >> > > > > > the reconciler reads it from the status. Then each component
> > is
> > >> more
> > >> > > > > > self-contained and the boundary will be clearer.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > Best,
> > >> > > > > > Yang
> > >> > > > > >
> > >> > > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
> > >> > > > > >
> > >> > > > > > > Hi All!
> > >> > > > > > >
> > >> > > > > > > I would like to start a discussion thread regarding the
> > >> structure
> > >> > > of
> > >> > > > > > > the Kubernetes
> > >> > > > > > > Operator
> > >> > > > > > > <
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> >
> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> > >> > > > > > > >
> > >> > > > > > > controller
> > >> > > > > > > flow. Based on some recent PR discussions we have no clear
> > >> > > consensus
> > >> > > > on
> > >> > > > > > the
> > >> > > > > > > structure and the expectations which can potentially lead
> to
> > >> back
> > >> > > and
> > >> > > > > > forth
> > >> > > > > > > changes and unnecessary complexity.
> > >> > > > > > >
> > >> > > > > > > *Background*
> > >> > > > > > > In the initial prototype we had a very basic flow:
> > >> > > > > > >  1. Observe flink job status
> > >> > > > > > >  2. (if observation successful) reconcile changes
> > >> > > > > > >  3. Reschedule reconcile with success/error
> > >> > > > > > >
> > >> > > > > > > This basic prototype flow could not cover all requirements
> > >> and did
> > >> > > > not
> > >> > > > > > > allow for things like waiting until Jobmanager deployment
> is
> > >> ready
> > >> > > > etc.
> > >> > > > > > >
> > >> > > > > > > To solve these shortcomings, some changes were introduced
> > >> recently
> > >> > > > here
> > >> > > > > > > <
> > https://github.com/apache/flink-kubernetes-operator/pull/21
> > >> >.
> > >> > > While
> > >> > > > > > this
> > >> > > > > > > change introduced many improvements and safeguards it also
> > >> > > completely
> > >> > > > > > > changed the original controller flow. Now the reconciler
> is
> > >> > > > responsible
> > >> > > > > > for
> > >> > > > > > > ensuring that it can actually reconcile by checking the
> > >> deployment
> > >> > > > and
> > >> > > > > > > ports. The job status observation logic has also been
> moved
> > >> into
> > >> > > the
> > >> > > > > > actual
> > >> > > > > > > reconcile logic.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > *Discussion Question*What controller flow would we like to
> > >> have? Do
> > >> > > > we
> > >> > > > > > want
> > >> > > > > > > to separate the observer from the reconciler or keep them
> > >> together?
> > >> > > > > > >
> > >> > > > > > > In my personal view, we should try to adopt a very simple
> > >> flow to
> > >> > > > make
> > >> > > > > > the
> > >> > > > > > > operator clean and modular. If possible I would like to
> > >> restore the
> > >> > > > > > > original flow with some modifications:
> > >> > > > > > >
> > >> > > > > > >  1. Validate deployment object
> > >> > > > > > >  2. Observe deployment and flink job status -> Return
> > >> comprehensive
> > >> > > > > > status
> > >> > > > > > > info
> > >> > > > > > >  3. Reconcile deployment based on observed status and
> > resource
> > >> > > > changes
> > >> > > > > > >  (Both 2/3 should be able to reschedule immediately if
> > >> necessary)
> > >> > > > > > >
> > >> > > > > > > I think the Observer component should be able to describe
> > the
> > >> > > current
> > >> > > > > > > status of the deployment objects and the flink job to the
> > >> extent
> > >> > > that
> > >> > > > > the
> > >> > > > > > > reconciler can work with that information alone. If we do
> it
> > >> this
> > >> > > > way,
> > >> > > > > we
> > >> > > > > > > can also use the status information that the observer
> > >> provides to
> > >> > > > > produce
> > >> > > > > > > other events and aid operations like shutdown which depend
> > on
> > >> the
> > >> > > > > current
> > >> > > > > > > deployment status.
> > >> > > > > > >
> > >> > > > > > > I think this would satisfy our needs, but I might be
> missing
> > >> > > > something
> > >> > > > > > that
> > >> > > > > > > cannot be done if we structure the code this way.
> > >> > > > > > >
> > >> > > > > > > I have a PR open
> > >> > > > > > > <
> > >> > >
> https://github.com/apache/flink-kubernetes-operator/pull/26/commits
> > >> > > > >
> > >> > > > > > > which
> > >> > > > > > > includes some of these proposed changes (as the optional
> > >> second
> > >> > > > commit)
> > >> > > > > > so
> > >> > > > > > > that you can easily compare with the current state of the
> > >> operator.
> > >> > > > > > >
> > >> > > > > > > Please let us know what we think!
> > >> > > > > > >
> > >> > > > > > > Cheers,
> > >> > > > > > > Gyula
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> > >
> >
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Posted by Yang Wang <da...@gmail.com>.

The state machine is going to replace the annoying if-else in the
reconciler.
It seems to have no conflicts with modular mechanism(aka observer,
reconciler, etc.).
And we could make them happen step by step.


Best,
Yang


Gyula Fóra <gy...@gmail.com> 于2022年3月1日周二 13:53写道：

> And just one more thing I forgot to add:
>
> The observer does not have to be a dummy monolithic logic. Different job
> types will have different state machines with different observe
> requirements and different observer implementations. I think this is
> complexly normal. The observer can be aware of the expected state and what
> it should observe on the happy path.
>
> Gyula
>
> On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gy...@gmail.com> wrote:
>
> > @Thomas:
> >
> > Thanks for the input! I completely agree with a well principled state
> > machine based approach in the operator would be the best.
> >
> > You are right that the code contains mostly if then else based logic at
> > the moment as it evolved after initial prototype to cover more scenarios.
> >
> > However I think the self contained observer - reconciler approach is very
> > capable of implementing the suggested state machine based view in a very
> > elegant way.
> >
> > Simply put, the observer is responsible for determining and recording the
> > current state and any extra information that comes with that state.
> >
> > The reconciler then look at the current vs desired state and execute a
> > state transition.
> >
> > Gyula
> >
> > On Tue, 1 Mar 2022 at 02:16, Thomas Weise <th...@apache.org> wrote:
> >
> >> Thanks for bringing the discussion. It's a good time to revisit this
> >> area as the operator implementation becomes more complex.
> >>
> >> I think the most important part of the design are well defined states
> >> and transitions. Unless that can be reasoned about, there will be
> >> confusion. For example:
> >>
> >> 1) For job upgrade, it wasn't clear in which state reconciliation
> >> should happen and as a result the implementation ended up incomplete.
> >> 2) "JobStatusObserver" would attempt to list jobs before the REST API
> >> is ready. And it would be used in session mode, even though the
> >> session mode needs a different state machine.
> >>
> >> The implementation currently contains a lot of if-then-else logic,
> >> which is hard to follow and difficult to maintain. It will make it
> >> harder to introduce new states that would be necessary to implement
> >> more advanced upgrade strategies, amongst others.
> >>
> >> Did you consider a state machine centric approach? See [1] for example.
> >>
> >> As Biao mentioned, observe -> reconcile may repeat and different
> >> states will require different checks. Once a job (in job mode) is
> >> running, there is no need to check the JM deployment etc. A monolithic
> >> observer may not work so well for that. Rather, I think different
> >> states have different monitoring needs that inform transitions,
> >> including the actual state and changes made to the CR.
> >>
> >> It would also be good if the current state is directly reflected in
> >> the CR status so that it is easier to check where the deployment is
> >> at.
> >>
> >> Cheers,
> >> Thomas
> >>
> >>
> >> [1]
> >>
> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
> >>
> >> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com>
> wrote:
> >> >
> >> > Hi All!
> >> >
> >> > Thank you for the feedback, I agree with what has been proposed to
> >> include
> >> > as much as possible in the actual resource status and make the
> >> reconciler
> >> > completely independent from the observer.
> >> >
> >> > @Biao Geng:
> >> > Validation could depend on the current status (depending on how we
> >> > implement the validation logic) so it might always be necessary (and
> it
> >> is
> >> > also cheap).
> >> > What you are saying with multiple observe -> reconcile cycles ties
> back
> >> to
> >> > what Matyas said, that we should probably have an Observe loop until
> we
> >> > have a stable state ready for reconciliation, then reconcile once.
> >> >
> >> > So probably validate -> observe until stable -> reconcile -> observe
> >> until
> >> > stable
> >> >
> >> > Cheers,
> >> > Gyula
> >> >
> >> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com>
> wrote:
> >> >
> >> > > Hi Gyula,
> >> > > Thanks for the discussion. It also makes senses to me on the
> >> separation of
> >> > > 3 components and Yang's proposal.
> >> > > Just 1 follow-up thought after checking your PR: in the reconcile()
> >> > > method of controller, IIUC, the real flow could be
> >> > > `validate->observe->reconcile->validate->observe->reconcile...". The
> >> > > validation phase seems to be required only when the creation of the
> >> job
> >> > > cluster and the upgrade of config. For phases like waiting the JM
> from
> >> > > deploying to ready, it is not mandatory and thus the flow can look
> >> like
> >> > > `validate->observe->reconcile->optional validate due to current
> >> > > state->observe->reconcile...`
> >> > >
> >> > > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
> >> > >
> >> > > > It is worth looking at the controller code in the spotify operator
> >> too:
> >> > > >
> >> > > >
> >> > >
> >>
> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
> >> > > >
> >> > > > It is looping in the 'observer phase' until it reaches a stable
> >> state,
> >> > > then
> >> > > > it performs the necessary changes.
> >> > > >
> >> > > > Based on this I also suggest keeping the logic in separate
> >> > > > modules(Validate->Observe->Reconcile). The three components might
> >> not
> >> > > > even be enough as we add more and more complexity to the code.
> >> > > >
> >> > > > Cheers,
> >> > > > Matyas
> >> > > >
> >> > > >
> >> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Hi, Gyula
> >> > > > >       Thanks for driving this discussion. I second Yang Wang's
> >> idea
> >> > > that
> >> > > > > it's better to make the `validator`, `observer` and `reconciler`
> >> > > > > self-contained. I also prefer to define the `Observer` as an
> >> interface
> >> > > > and
> >> > > > > we could define the statuses that `Observer` will expose. It
> acts
> >> like
> >> > > > the
> >> > > > > observer protocol between the `Observer` and `Reconciler`.
> >> > > > >
> >> > > > > Best,
> >> > > > > Aitozi.
> >> > > > >
> >> > > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
> >> > > > >
> >> > > > > > Thanks for posting the discussion here.
> >> > > > > >
> >> > > > > >
> >> > > > > > Having the components `validator` `observer` `reconciler`
> makes
> >> lots
> >> > > of
> >> > > > > > sense. And the "Validate -> Observe -> Reconcile"
> >> > > > > > flow seems natural to me.
> >> > > > > >
> >> > > > > > Regarding the implementation in the PR, instead of directly
> >> using the
> >> > > > > > observer in the reconciler, I lean to let the observer
> >> > > > > > exports the results to the status(e.g. jobmanager deployment
> >> status,
> >> > > > rest
> >> > > > > > port readiness, flink jobs status, etc.) and
> >> > > > > > the reconciler reads it from the status. Then each component
> is
> >> more
> >> > > > > > self-contained and the boundary will be clearer.
> >> > > > > >
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Yang
> >> > > > > >
> >> > > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
> >> > > > > >
> >> > > > > > > Hi All!
> >> > > > > > >
> >> > > > > > > I would like to start a discussion thread regarding the
> >> structure
> >> > > of
> >> > > > > > > the Kubernetes
> >> > > > > > > Operator
> >> > > > > > > <
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >>
> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> >> > > > > > > >
> >> > > > > > > controller
> >> > > > > > > flow. Based on some recent PR discussions we have no clear
> >> > > consensus
> >> > > > on
> >> > > > > > the
> >> > > > > > > structure and the expectations which can potentially lead to
> >> back
> >> > > and
> >> > > > > > forth
> >> > > > > > > changes and unnecessary complexity.
> >> > > > > > >
> >> > > > > > > *Background*
> >> > > > > > > In the initial prototype we had a very basic flow:
> >> > > > > > >  1. Observe flink job status
> >> > > > > > >  2. (if observation successful) reconcile changes
> >> > > > > > >  3. Reschedule reconcile with success/error
> >> > > > > > >
> >> > > > > > > This basic prototype flow could not cover all requirements
> >> and did
> >> > > > not
> >> > > > > > > allow for things like waiting until Jobmanager deployment is
> >> ready
> >> > > > etc.
> >> > > > > > >
> >> > > > > > > To solve these shortcomings, some changes were introduced
> >> recently
> >> > > > here
> >> > > > > > > <
> https://github.com/apache/flink-kubernetes-operator/pull/21
> >> >.
> >> > > While
> >> > > > > > this
> >> > > > > > > change introduced many improvements and safeguards it also
> >> > > completely
> >> > > > > > > changed the original controller flow. Now the reconciler is
> >> > > > responsible
> >> > > > > > for
> >> > > > > > > ensuring that it can actually reconcile by checking the
> >> deployment
> >> > > > and
> >> > > > > > > ports. The job status observation logic has also been moved
> >> into
> >> > > the
> >> > > > > > actual
> >> > > > > > > reconcile logic.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > *Discussion Question*What controller flow would we like to
> >> have? Do
> >> > > > we
> >> > > > > > want
> >> > > > > > > to separate the observer from the reconciler or keep them
> >> together?
> >> > > > > > >
> >> > > > > > > In my personal view, we should try to adopt a very simple
> >> flow to
> >> > > > make
> >> > > > > > the
> >> > > > > > > operator clean and modular. If possible I would like to
> >> restore the
> >> > > > > > > original flow with some modifications:
> >> > > > > > >
> >> > > > > > >  1. Validate deployment object
> >> > > > > > >  2. Observe deployment and flink job status -> Return
> >> comprehensive
> >> > > > > > status
> >> > > > > > > info
> >> > > > > > >  3. Reconcile deployment based on observed status and
> resource
> >> > > > changes
> >> > > > > > >  (Both 2/3 should be able to reschedule immediately if
> >> necessary)
> >> > > > > > >
> >> > > > > > > I think the Observer component should be able to describe
> the
> >> > > current
> >> > > > > > > status of the deployment objects and the flink job to the
> >> extent
> >> > > that
> >> > > > > the
> >> > > > > > > reconciler can work with that information alone. If we do it
> >> this
> >> > > > way,
> >> > > > > we
> >> > > > > > > can also use the status information that the observer
> >> provides to
> >> > > > > produce
> >> > > > > > > other events and aid operations like shutdown which depend
> on
> >> the
> >> > > > > current
> >> > > > > > > deployment status.
> >> > > > > > >
> >> > > > > > > I think this would satisfy our needs, but I might be missing
> >> > > > something
> >> > > > > > that
> >> > > > > > > cannot be done if we structure the code this way.
> >> > > > > > >
> >> > > > > > > I have a PR open
> >> > > > > > > <
> >> > > https://github.com/apache/flink-kubernetes-operator/pull/26/commits
> >> > > > >
> >> > > > > > > which
> >> > > > > > > includes some of these proposed changes (as the optional
> >> second
> >> > > > commit)
> >> > > > > > so
> >> > > > > > > that you can easily compare with the current state of the
> >> operator.
> >> > > > > > >
> >> > > > > > > Please let us know what we think!
> >> > > > > > >
> >> > > > > > > Cheers,
> >> > > > > > > Gyula
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >>
> >
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Posted by Gyula Fóra <gy...@gmail.com>.

And just one more thing I forgot to add:

The observer does not have to be a dummy monolithic logic. Different job
types will have different state machines with different observe
requirements and different observer implementations. I think this is
complexly normal. The observer can be aware of the expected state and what
it should observe on the happy path.

Gyula

On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gy...@gmail.com> wrote:

> @Thomas:
>
> Thanks for the input! I completely agree with a well principled state
> machine based approach in the operator would be the best.
>
> You are right that the code contains mostly if then else based logic at
> the moment as it evolved after initial prototype to cover more scenarios.
>
> However I think the self contained observer - reconciler approach is very
> capable of implementing the suggested state machine based view in a very
> elegant way.
>
> Simply put, the observer is responsible for determining and recording the
> current state and any extra information that comes with that state.
>
> The reconciler then look at the current vs desired state and execute a
> state transition.
>
> Gyula
>
> On Tue, 1 Mar 2022 at 02:16, Thomas Weise <th...@apache.org> wrote:
>
>> Thanks for bringing the discussion. It's a good time to revisit this
>> area as the operator implementation becomes more complex.
>>
>> I think the most important part of the design are well defined states
>> and transitions. Unless that can be reasoned about, there will be
>> confusion. For example:
>>
>> 1) For job upgrade, it wasn't clear in which state reconciliation
>> should happen and as a result the implementation ended up incomplete.
>> 2) "JobStatusObserver" would attempt to list jobs before the REST API
>> is ready. And it would be used in session mode, even though the
>> session mode needs a different state machine.
>>
>> The implementation currently contains a lot of if-then-else logic,
>> which is hard to follow and difficult to maintain. It will make it
>> harder to introduce new states that would be necessary to implement
>> more advanced upgrade strategies, amongst others.
>>
>> Did you consider a state machine centric approach? See [1] for example.
>>
>> As Biao mentioned, observe -> reconcile may repeat and different
>> states will require different checks. Once a job (in job mode) is
>> running, there is no need to check the JM deployment etc. A monolithic
>> observer may not work so well for that. Rather, I think different
>> states have different monitoring needs that inform transitions,
>> including the actual state and changes made to the CR.
>>
>> It would also be good if the current state is directly reflected in
>> the CR status so that it is easier to check where the deployment is
>> at.
>>
>> Cheers,
>> Thomas
>>
>>
>> [1]
>> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>>
>> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com> wrote:
>> >
>> > Hi All!
>> >
>> > Thank you for the feedback, I agree with what has been proposed to
>> include
>> > as much as possible in the actual resource status and make the
>> reconciler
>> > completely independent from the observer.
>> >
>> > @Biao Geng:
>> > Validation could depend on the current status (depending on how we
>> > implement the validation logic) so it might always be necessary (and it
>> is
>> > also cheap).
>> > What you are saying with multiple observe -> reconcile cycles ties back
>> to
>> > what Matyas said, that we should probably have an Observe loop until we
>> > have a stable state ready for reconciliation, then reconcile once.
>> >
>> > So probably validate -> observe until stable -> reconcile -> observe
>> until
>> > stable
>> >
>> > Cheers,
>> > Gyula
>> >
>> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com> wrote:
>> >
>> > > Hi Gyula,
>> > > Thanks for the discussion. It also makes senses to me on the
>> separation of
>> > > 3 components and Yang's proposal.
>> > > Just 1 follow-up thought after checking your PR: in the reconcile()
>> > > method of controller, IIUC, the real flow could be
>> > > `validate->observe->reconcile->validate->observe->reconcile...". The
>> > > validation phase seems to be required only when the creation of the
>> job
>> > > cluster and the upgrade of config. For phases like waiting the JM from
>> > > deploying to ready, it is not mandatory and thus the flow can look
>> like
>> > > `validate->observe->reconcile->optional validate due to current
>> > > state->observe->reconcile...`
>> > >
>> > > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
>> > >
>> > > > It is worth looking at the controller code in the spotify operator
>> too:
>> > > >
>> > > >
>> > >
>> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
>> > > >
>> > > > It is looping in the 'observer phase' until it reaches a stable
>> state,
>> > > then
>> > > > it performs the necessary changes.
>> > > >
>> > > > Based on this I also suggest keeping the logic in separate
>> > > > modules(Validate->Observe->Reconcile). The three components might
>> not
>> > > > even be enough as we add more and more complexity to the code.
>> > > >
>> > > > Cheers,
>> > > > Matyas
>> > > >
>> > > >
>> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi, Gyula
>> > > > >       Thanks for driving this discussion. I second Yang Wang's
>> idea
>> > > that
>> > > > > it's better to make the `validator`, `observer` and `reconciler`
>> > > > > self-contained. I also prefer to define the `Observer` as an
>> interface
>> > > > and
>> > > > > we could define the statuses that `Observer` will expose. It acts
>> like
>> > > > the
>> > > > > observer protocol between the `Observer` and `Reconciler`.
>> > > > >
>> > > > > Best,
>> > > > > Aitozi.
>> > > > >
>> > > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
>> > > > >
>> > > > > > Thanks for posting the discussion here.
>> > > > > >
>> > > > > >
>> > > > > > Having the components `validator` `observer` `reconciler` makes
>> lots
>> > > of
>> > > > > > sense. And the "Validate -> Observe -> Reconcile"
>> > > > > > flow seems natural to me.
>> > > > > >
>> > > > > > Regarding the implementation in the PR, instead of directly
>> using the
>> > > > > > observer in the reconciler, I lean to let the observer
>> > > > > > exports the results to the status(e.g. jobmanager deployment
>> status,
>> > > > rest
>> > > > > > port readiness, flink jobs status, etc.) and
>> > > > > > the reconciler reads it from the status. Then each component is
>> more
>> > > > > > self-contained and the boundary will be clearer.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Yang
>> > > > > >
>> > > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
>> > > > > >
>> > > > > > > Hi All!
>> > > > > > >
>> > > > > > > I would like to start a discussion thread regarding the
>> structure
>> > > of
>> > > > > > > the Kubernetes
>> > > > > > > Operator
>> > > > > > > <
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
>> > > > > > > >
>> > > > > > > controller
>> > > > > > > flow. Based on some recent PR discussions we have no clear
>> > > consensus
>> > > > on
>> > > > > > the
>> > > > > > > structure and the expectations which can potentially lead to
>> back
>> > > and
>> > > > > > forth
>> > > > > > > changes and unnecessary complexity.
>> > > > > > >
>> > > > > > > *Background*
>> > > > > > > In the initial prototype we had a very basic flow:
>> > > > > > >  1. Observe flink job status
>> > > > > > >  2. (if observation successful) reconcile changes
>> > > > > > >  3. Reschedule reconcile with success/error
>> > > > > > >
>> > > > > > > This basic prototype flow could not cover all requirements
>> and did
>> > > > not
>> > > > > > > allow for things like waiting until Jobmanager deployment is
>> ready
>> > > > etc.
>> > > > > > >
>> > > > > > > To solve these shortcomings, some changes were introduced
>> recently
>> > > > here
>> > > > > > > <https://github.com/apache/flink-kubernetes-operator/pull/21
>> >.
>> > > While
>> > > > > > this
>> > > > > > > change introduced many improvements and safeguards it also
>> > > completely
>> > > > > > > changed the original controller flow. Now the reconciler is
>> > > > responsible
>> > > > > > for
>> > > > > > > ensuring that it can actually reconcile by checking the
>> deployment
>> > > > and
>> > > > > > > ports. The job status observation logic has also been moved
>> into
>> > > the
>> > > > > > actual
>> > > > > > > reconcile logic.
>> > > > > > >
>> > > > > > >
>> > > > > > > *Discussion Question*What controller flow would we like to
>> have? Do
>> > > > we
>> > > > > > want
>> > > > > > > to separate the observer from the reconciler or keep them
>> together?
>> > > > > > >
>> > > > > > > In my personal view, we should try to adopt a very simple
>> flow to
>> > > > make
>> > > > > > the
>> > > > > > > operator clean and modular. If possible I would like to
>> restore the
>> > > > > > > original flow with some modifications:
>> > > > > > >
>> > > > > > >  1. Validate deployment object
>> > > > > > >  2. Observe deployment and flink job status -> Return
>> comprehensive
>> > > > > > status
>> > > > > > > info
>> > > > > > >  3. Reconcile deployment based on observed status and resource
>> > > > changes
>> > > > > > >  (Both 2/3 should be able to reschedule immediately if
>> necessary)
>> > > > > > >
>> > > > > > > I think the Observer component should be able to describe the
>> > > current
>> > > > > > > status of the deployment objects and the flink job to the
>> extent
>> > > that
>> > > > > the
>> > > > > > > reconciler can work with that information alone. If we do it
>> this
>> > > > way,
>> > > > > we
>> > > > > > > can also use the status information that the observer
>> provides to
>> > > > > produce
>> > > > > > > other events and aid operations like shutdown which depend on
>> the
>> > > > > current
>> > > > > > > deployment status.
>> > > > > > >
>> > > > > > > I think this would satisfy our needs, but I might be missing
>> > > > something
>> > > > > > that
>> > > > > > > cannot be done if we structure the code this way.
>> > > > > > >
>> > > > > > > I have a PR open
>> > > > > > > <
>> > > https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>> > > > >
>> > > > > > > which
>> > > > > > > includes some of these proposed changes (as the optional
>> second
>> > > > commit)
>> > > > > > so
>> > > > > > > that you can easily compare with the current state of the
>> operator.
>> > > > > > >
>> > > > > > > Please let us know what we think!
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Gyula
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Posted by Gyula Fóra <gy...@gmail.com>.

@Thomas:

Thanks for the input! I completely agree with a well principled state
machine based approach in the operator would be the best.

You are right that the code contains mostly if then else based logic at the
moment as it evolved after initial prototype to cover more scenarios.

However I think the self contained observer - reconciler approach is very
capable of implementing the suggested state machine based view in a very
elegant way.

Simply put, the observer is responsible for determining and recording the
current state and any extra information that comes with that state.

The reconciler then look at the current vs desired state and execute a
state transition.

Gyula

On Tue, 1 Mar 2022 at 02:16, Thomas Weise <th...@apache.org> wrote:

> Thanks for bringing the discussion. It's a good time to revisit this
> area as the operator implementation becomes more complex.
>
> I think the most important part of the design are well defined states
> and transitions. Unless that can be reasoned about, there will be
> confusion. For example:
>
> 1) For job upgrade, it wasn't clear in which state reconciliation
> should happen and as a result the implementation ended up incomplete.
> 2) "JobStatusObserver" would attempt to list jobs before the REST API
> is ready. And it would be used in session mode, even though the
> session mode needs a different state machine.
>
> The implementation currently contains a lot of if-then-else logic,
> which is hard to follow and difficult to maintain. It will make it
> harder to introduce new states that would be necessary to implement
> more advanced upgrade strategies, amongst others.
>
> Did you consider a state machine centric approach? See [1] for example.
>
> As Biao mentioned, observe -> reconcile may repeat and different
> states will require different checks. Once a job (in job mode) is
> running, there is no need to check the JM deployment etc. A monolithic
> observer may not work so well for that. Rather, I think different
> states have different monitoring needs that inform transitions,
> including the actual state and changes made to the CR.
>
> It would also be good if the current state is directly reflected in
> the CR status so that it is easier to check where the deployment is
> at.
>
> Cheers,
> Thomas
>
>
> [1]
> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>
> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gy...@gmail.com> wrote:
> >
> > Hi All!
> >
> > Thank you for the feedback, I agree with what has been proposed to
> include
> > as much as possible in the actual resource status and make the reconciler
> > completely independent from the observer.
> >
> > @Biao Geng:
> > Validation could depend on the current status (depending on how we
> > implement the validation logic) so it might always be necessary (and it
> is
> > also cheap).
> > What you are saying with multiple observe -> reconcile cycles ties back
> to
> > what Matyas said, that we should probably have an Observe loop until we
> > have a stable state ready for reconciliation, then reconcile once.
> >
> > So probably validate -> observe until stable -> reconcile -> observe
> until
> > stable
> >
> > Cheers,
> > Gyula
> >
> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <bi...@gmail.com> wrote:
> >
> > > Hi Gyula,
> > > Thanks for the discussion. It also makes senses to me on the
> separation of
> > > 3 components and Yang's proposal.
> > > Just 1 follow-up thought after checking your PR: in the reconcile()
> > > method of controller, IIUC, the real flow could be
> > > `validate->observe->reconcile->validate->observe->reconcile...". The
> > > validation phase seems to be required only when the creation of the job
> > > cluster and the upgrade of config. For phases like waiting the JM from
> > > deploying to ready, it is not mandatory and thus the flow can look like
> > > `validate->observe->reconcile->optional validate due to current
> > > state->observe->reconcile...`
> > >
> > > Őrhidi Mátyás <ma...@gmail.com> 于2022年2月28日周一 21:26写道：
> > >
> > > > It is worth looking at the controller code in the spotify operator
> too:
> > > >
> > > >
> > >
> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
> > > >
> > > > It is looping in the 'observer phase' until it reaches a stable
> state,
> > > then
> > > > it performs the necessary changes.
> > > >
> > > > Based on this I also suggest keeping the logic in separate
> > > > modules(Validate->Observe->Reconcile). The three components might not
> > > > even be enough as we add more and more complexity to the code.
> > > >
> > > > Cheers,
> > > > Matyas
> > > >
> > > >
> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gj...@gmail.com> wrote:
> > > >
> > > > > Hi, Gyula
> > > > >       Thanks for driving this discussion. I second Yang Wang's idea
> > > that
> > > > > it's better to make the `validator`, `observer` and `reconciler`
> > > > > self-contained. I also prefer to define the `Observer` as an
> interface
> > > > and
> > > > > we could define the statuses that `Observer` will expose. It acts
> like
> > > > the
> > > > > observer protocol between the `Observer` and `Reconciler`.
> > > > >
> > > > > Best,
> > > > > Aitozi.
> > > > >
> > > > > Yang Wang <da...@gmail.com> 于2022年2月28日周一 16:28写道：
> > > > >
> > > > > > Thanks for posting the discussion here.
> > > > > >
> > > > > >
> > > > > > Having the components `validator` `observer` `reconciler` makes
> lots
> > > of
> > > > > > sense. And the "Validate -> Observe -> Reconcile"
> > > > > > flow seems natural to me.
> > > > > >
> > > > > > Regarding the implementation in the PR, instead of directly
> using the
> > > > > > observer in the reconciler, I lean to let the observer
> > > > > > exports the results to the status(e.g. jobmanager deployment
> status,
> > > > rest
> > > > > > port readiness, flink jobs status, etc.) and
> > > > > > the reconciler reads it from the status. Then each component is
> more
> > > > > > self-contained and the boundary will be clearer.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Gyula Fóra <gy...@apache.org> 于2022年2月28日周一 16:01写道：
> > > > > >
> > > > > > > Hi All!
> > > > > > >
> > > > > > > I would like to start a discussion thread regarding the
> structure
> > > of
> > > > > > > the Kubernetes
> > > > > > > Operator
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
> > > > > > > >
> > > > > > > controller
> > > > > > > flow. Based on some recent PR discussions we have no clear
> > > consensus
> > > > on
> > > > > > the
> > > > > > > structure and the expectations which can potentially lead to
> back
> > > and
> > > > > > forth
> > > > > > > changes and unnecessary complexity.
> > > > > > >
> > > > > > > *Background*
> > > > > > > In the initial prototype we had a very basic flow:
> > > > > > >  1. Observe flink job status
> > > > > > >  2. (if observation successful) reconcile changes
> > > > > > >  3. Reschedule reconcile with success/error
> > > > > > >
> > > > > > > This basic prototype flow could not cover all requirements and
> did
> > > > not
> > > > > > > allow for things like waiting until Jobmanager deployment is
> ready
> > > > etc.
> > > > > > >
> > > > > > > To solve these shortcomings, some changes were introduced
> recently
> > > > here
> > > > > > > <https://github.com/apache/flink-kubernetes-operator/pull/21>.
> > > While
> > > > > > this
> > > > > > > change introduced many improvements and safeguards it also
> > > completely
> > > > > > > changed the original controller flow. Now the reconciler is
> > > > responsible
> > > > > > for
> > > > > > > ensuring that it can actually reconcile by checking the
> deployment
> > > > and
> > > > > > > ports. The job status observation logic has also been moved
> into
> > > the
> > > > > > actual
> > > > > > > reconcile logic.
> > > > > > >
> > > > > > >
> > > > > > > *Discussion Question*What controller flow would we like to
> have? Do
> > > > we
> > > > > > want
> > > > > > > to separate the observer from the reconciler or keep them
> together?
> > > > > > >
> > > > > > > In my personal view, we should try to adopt a very simple flow
> to
> > > > make
> > > > > > the
> > > > > > > operator clean and modular. If possible I would like to
> restore the
> > > > > > > original flow with some modifications:
> > > > > > >
> > > > > > >  1. Validate deployment object
> > > > > > >  2. Observe deployment and flink job status -> Return
> comprehensive
> > > > > > status
> > > > > > > info
> > > > > > >  3. Reconcile deployment based on observed status and resource
> > > > changes
> > > > > > >  (Both 2/3 should be able to reschedule immediately if
> necessary)
> > > > > > >
> > > > > > > I think the Observer component should be able to describe the
> > > current
> > > > > > > status of the deployment objects and the flink job to the
> extent
> > > that
> > > > > the
> > > > > > > reconciler can work with that information alone. If we do it
> this
> > > > way,
> > > > > we
> > > > > > > can also use the status information that the observer provides
> to
> > > > > produce
> > > > > > > other events and aid operations like shutdown which depend on
> the
> > > > > current
> > > > > > > deployment status.
> > > > > > >
> > > > > > > I think this would satisfy our needs, but I might be missing
> > > > something
> > > > > > that
> > > > > > > cannot be done if we structure the code this way.
> > > > > > >
> > > > > > > I have a PR open
> > > > > > > <
> > > https://github.com/apache/flink-kubernetes-operator/pull/26/commits
> > > > >
> > > > > > > which
> > > > > > > includes some of these proposed changes (as the optional second
> > > > commit)
> > > > > > so
> > > > > > > that you can easily compare with the current state of the
> operator.
> > > > > > >
> > > > > > > Please let us know what we think!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Gyula
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>