Posted to dev@flink.apache.org by Aitozi <gj...@gmail.com> on 2022/03/16 07:41:27 UTC

[DISCUSS] Support the session job management in kubernetes operator

Hi Guys:

    I would like to open a discussion about supporting session job management
in the Kubernetes operator. The goal is to enhance the flink-kubernetes-operator
so that session jobs can be managed with Kubernetes tooling. I have drafted a
design doc [1]; please take a look and give me some feedback.


[1]
https://docs.google.com/document/d/1WPGbur1eT3H_5gN-kyXfp7EDjdbJUURx6jN8nt6UT-s/edit#

Best,

Aitozi.

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

> We should check there's no running Flink job before deleting a session
> FlinkDeployment

If we have to prevent the session cluster from being stopped before all
session jobs are down, I think we should avoid deleting the session
deployment by returning DeleteControl#noFinalizerRemoval() [1] in cleanup,
and then schedule a reconcile that checks again and deletes the session
cluster only once there are no session job instances left.


[1]:
https://github.com/java-operator-sdk/java-operator-sdk/blob/b91221bb54af19761a617bf18eef381e8ceb3b4c/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/Reconciler.java#L14
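A minimal sketch of the deletion guard described above, written in plain Java so it runs without the operator SDK. `SessionJobRef`, `CleanupDecision` and `SessionDeploymentCleanupGuard` are hypothetical names, not part of flink-kubernetes-operator. The assumed mapping onto java-operator-sdk is that `KEEP_FINALIZER_AND_RETRY` becomes `DeleteControl.noFinalizerRemoval()` (followed by a later reconcile that retries) and `REMOVE_FINALIZER` becomes `DeleteControl.defaultDelete()`.

```java
import java.util.List;

// Hypothetical, trimmed-down view of a FlinkSessionJob: just enough to tell
// which session cluster (namespace + clusterId) the job is bound to.
record SessionJobRef(String namespace, String clusterId, String name) {}

// Possible outcomes of cleaning up a session FlinkDeployment (illustrative).
enum CleanupDecision { REMOVE_FINALIZER, KEEP_FINALIZER_AND_RETRY }

final class SessionDeploymentCleanupGuard {
    private SessionDeploymentCleanupGuard() {}

    /** Counts the session jobs that still target the given session cluster. */
    static long dependentJobs(List<SessionJobRef> jobs, String namespace, String clusterId) {
        return jobs.stream()
                .filter(j -> j.namespace().equals(namespace) && j.clusterId().equals(clusterId))
                .count();
    }

    /**
     * Keeps the finalizer (so Kubernetes cannot finish deleting the session
     * cluster) while any session job still targets it; the caller is expected
     * to reconcile again later and re-run this check.
     */
    static CleanupDecision decide(List<SessionJobRef> jobs, String namespace, String clusterId) {
        return dependentJobs(jobs, namespace, clusterId) > 0
                ? CleanupDecision.KEEP_FINALIZER_AND_RETRY
                : CleanupDecision.REMOVE_FINALIZER;
    }
}
```

Keeping the check as a pure function of the listed session jobs makes it easy to unit-test; how the operator actually lists FlinkSessionJob resources (informer cache or API call) is left open here.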




Yang Wang <da...@gmail.com> wrote on Tue, Mar 22, 2022 at 18:48:

> The relationship between the session deployment and the Flink jobs looks
> good to me, except for the deletion of the session deployment.
>
> I strongly suggest not setting the ownerReference of the FlinkSessionJob to
> the session FlinkDeployment.
> Otherwise, it will be a disaster if the session FlinkDeployment is deleted
> accidentally while many jobs are running.
> We should check that there is no running Flink job before deleting a session
> FlinkDeployment. This will effectively force users to confirm the deletion
> twice.
>
> Best,
> Yang
>
> Aitozi <gj...@gmail.com> wrote on Tue, Mar 22, 2022 at 17:49:
>
> > Hi Thomas:
> >
> >     Thanks for your valuable question. Let’s make the relationship
> > between the session deployment and the jobs clearer.
> >
> > IMO, the session deployment and the jobs interact in these situations:
> >
> > - Create the session job. The FlinkSessionJobController waits for the
> > session cluster to be ready and then submits the job. The lookup key is
> > the namespace and the clusterId.
> >
> > - Delete the session job. The controller cancels the current session job.
> >
> > - Delete the session deployment. The session jobs have to be deleted
> > first. We could set the ownerReference of the FlinkSessionJob so that
> > Kubernetes triggers the cleanup of the session jobs before the session
> > deployment is removed.
> >
> > - Upgrade the session deployment. This is a critical part, because it
> > affects all the session jobs. The jobs should be suspended first and the
> > session cluster upgraded afterwards. So I tend to validate that all the
> > jobs are suspended before performing the session cluster upgrade. After
> > the upgrade, the session jobs are manually switched back to running.
> >
> > What do you think about this? If there is no objection, I will clarify it
> > in the FLIP doc.
> >
> >
> > Besides, sorry for the rough vote and discussion process. It's my first
> > time driving this; I will keep that in mind next time :)
> > Best,
> > Aitozi.
> >
> > Yang Wang <da...@gmail.com> wrote on Tue, Mar 22, 2022 at 10:11:
> >
> > > I think the session cluster should not be deleted unless all the running
> > > jobs have finished or been cancelled. I agree this should be clarified in
> > > the FLIP.
> > >
> > > Best,
> > > Yang
> > >
> > > Thomas Weise <th...@apache.org> wrote on Tue, Mar 22, 2022 at 09:26:
> > >
> > > > Hi Aitozi,
> > > >
> > > > Thanks for the proposal. Can you please clarify in the FLIP the
> > > > relationship between the session deployment and the jobs that depend
> > > > on it? Will, for example, the operator ensure that the individual jobs
> > > > are deleted when the underlying cluster is deleted?
> > > >
> > > > Side note: when the discussion thread started 5 days ago, a FLIP vote
> > > > was started 2 days later, and a weekend falls in between, that is
> > > > probably on the short side for broader feedback.
> > > >
> > > > Thanks,
> > > > Thomas
> > > >
> > > >
> > > > On Fri, Mar 18, 2022 at 4:01 AM Yang Wang <da...@gmail.com> wrote:
> > > >
> > > > > Great work. Since we are introducing a new public API, it deserves a
> > > > > FLIP. The FLIP will also help later contributors catch up quickly.
> > > > >
> > > > > Best,
> > > > > Yang
> > > > >
> > > > > Gyula Fóra <gy...@gmail.com> wrote on Fri, Mar 18, 2022 at 18:11:
> > > > >
> > > > > > Thanks Aitozi, a FLIP might be overkill at this point, but there is
> > > > > > no harm in voting on it anyway :)
> > > > > >
> > > > > > Looks good!
> > > > > >
> > > > > > Gyula
> > > > > >
> > > > > > On Fri, Mar 18, 2022 at 10:25 AM Aitozi <gj...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Guys:
> > > > > > >
> > > > > > >     FYI, I have integrated your comments and drafted FLIP-215 [1].
> > > > > > > I will create another thread to vote on it.
> > > > > > >
> > > > > > > [1]:
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Aitozi.
> > > > > > >
> > > > > > >
> > > > > > > Aitozi <gj...@gmail.com> wrote on Thu, Mar 17, 2022 at 11:16:
> > > > > > >
> > > > > > > > Hi Biao Geng:
> > > > > > > >
> > > > > > > >    Thanks for your feedback, I'm +1 to go with option#2. It's a
> > > > > > > > good point that we should improve error-message debugging for
> > > > > > > > session jobs. I think it can be follow-up work, as an improvement
> > > > > > > > after we support the session job operation.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Aitozi.
> > > > > > > >
> > > > > > > >
> > > > > > > > Geng Biao <bi...@gmail.com> wrote on Thu, Mar 17, 2022 at 10:55:
> > > > > > > >
> > > > > > > > > Thanks Aitozi for the work!
> > > > > > > > >
> > > > > > > > > I lean towards option#2, using JarRunHeaders with an uber job
> > > > > > > > > jar, as well. As Yang said, user-defined dependencies may be
> > > > > > > > > better supported in upstream Flink.
> > > > > > > > > A follow-up thought: I think we should care about the potential
> > > > > > > > > impact on user experience. Since the job graph is generated in
> > > > > > > > > the JM, when the generation fails due to some issue in the
> > > > > > > > > main() method, we should do some work on surfacing such error
> > > > > > > > > messages, either in this proposal or in the later k8s operator
> > > > > > > > > implementation. The reason I raise this is that if users submit
> > > > > > > > > many jobs to the same session cluster, it may not be easy for
> > > > > > > > > them to find the relevant error logs about the main() method of
> > > > > > > > > a specific job. FLINK-25715 could help us later.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Biao Geng
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > From: Aitozi <gj...@gmail.com>
> > > > > > > > > Date: Wednesday, March 16, 2022 5:19 PM
> > > > > > > > > To: dev@flink.apache.org <de...@flink.apache.org>
> > > > > > > > > Subject: Re: [DISCUSS] Support the session job management in kubernetes operator
> > > > > > > > > Hi Yang Wang
> > > > > > > > >     Thanks for your feedback. Providing the local and HTTP
> > > > > > > > > implementations for the first version makes sense to me.
> > > > > > > > > +1 for it.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Aitozi
> > > > > > > > >
> > > > > > > > > Yang Wang <da...@gmail.com> wrote on Wed, Mar 16, 2022 at 16:44:
> > > > > > > > >
> > > > > > > > > > # How to download the user jars
> > > > > > > > > > I agree with Gyula that it will be a burden if we bundle the
> > > > > > > > > > Flink filesystem dependencies in the operator image.
> > > > > > > > > > Maybe we could have an *ArtifactFetcher* interface in the
> > > > > > > > > > flink-kubernetes-operator. By default, we provide the local
> > > > > > > > > > and HTTP implementations, which means we could get the user
> > > > > > > > > > jars from local files or HTTP URLs. Flink filesystem support
> > > > > > > > > > could be done as a follow-up based on the feedback.
> > > > > > > > > >
> > > > > > > > > > If users want to use the local implementation, they need to
> > > > > > > > > > mount a PV (aka persistent volume) to the operator first and
> > > > > > > > > > then put their jars into the PV.
> > > > > > > > > >
> > > > > > > > > > # How to talk to the session JobManager to submit the job
> > > > > > > > > > After more consideration, I also prefer the second approach,
> > > > > > > > > > via the REST API /jars/:jarid/run. If we have strong
> > > > > > > > > > requirements to support dependency jars and artifacts, we
> > > > > > > > > > could try to support this in the upstream project.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Aitozi <gj...@gmail.com> wrote on Wed, Mar 16, 2022 at 16:11:
> > > > > > > > > >
> > > > > > > > > > > Hi Gyula
> > > > > > > > > > >     Thanks for your quick response. Regarding the
> > > > > > > > > > > dependencies for different filesystems, I think we can make
> > > > > > > > > > > them optional and pluggable, and let users choose them when
> > > > > > > > > > > building their operator image. Users can build their image
> > > > > > > > > > > from the base operator image and add the filesystem
> > > > > > > > > > > dependencies they want to use. BTW, we can support HTTP
> > > > > > > > > > > URIs by default.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Aitozi.
> > > > > > > > > > >
> > > > > > > > > > > Gyula Fóra <gy...@gmail.com> wrote on Wed, Mar 16, 2022 at 15:53:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you Aitozi!
> > > > > > > > > > > >
> > > > > > > > > > > > I think this will be a very nice (and simple) addition to
> > > > > > > > > > > > enable these use-cases.
> > > > > > > > > > > >
> > > > > > > > > > > > I have 2 comments regarding the proposal:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. I think if we want to support different filesystems to
> > > > > > > > > > > > download jars from, we probably need some clever ways to
> > > > > > > > > > > > add external operator dependencies (jars, configs).
> > > > > > > > > > > > I would prefer not to bundle them into the base operator
> > > > > > > > > > > > image.
> > > > > > > > > > > >
> > > > > > > > > > > > 2. I think we should avoid creating the jobgraphs on the
> > > > > > > > > > > > operator side and use the jar upload/run REST API instead,
> > > > > > > > > > > > as you suggested. This will avoid Flink version and
> > > > > > > > > > > > dependency conflicts.
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers,
> > > > > > > > > > > > Gyula
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 16, 2022 at 8:41 AM Aitozi <gjying1314@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Guys:
> > > > > > > > > > > > >
> > > > > > > > > > > > >     I would like to open a discussion about supporting
> > > > > > > > > > > > > session job management in the Kubernetes operator. The
> > > > > > > > > > > > > goal is to enhance the flink-kubernetes-operator so that
> > > > > > > > > > > > > session jobs can be managed with Kubernetes tooling. I
> > > > > > > > > > > > > have drafted a design doc [1]; please take a look and
> > > > > > > > > > > > > give me some feedback.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1]
> > > > > > > > > > > > > https://docs.google.com/document/d/1WPGbur1eT3H_5gN-kyXfp7EDjdbJUURx6jN8nt6UT-s/edit#
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Aitozi.
>
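Yang's *ArtifactFetcher* idea from the quoted 16 March message (local and HTTP implementations by default, other filesystems pluggable later) could look roughly like the sketch below. Only the interface name comes from the thread; the `fetch` signature, the implementation class names and the scheme-based lookup are assumptions for illustration. The local variant assumes the jar is already reachable as a file path, for example on the mounted PV.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Map;

/** Downloads (or copies) a user jar into targetDir and returns the local path. */
interface ArtifactFetcher {
    Path fetch(URI source, Path targetDir) throws IOException, InterruptedException;
}

/** Local variant (hypothetical name): the jar is already on a path, e.g. a mounted PV. */
final class LocalArtifactFetcher implements ArtifactFetcher {
    @Override
    public Path fetch(URI source, Path targetDir) throws IOException {
        // file:// URIs go through Paths.get(URI); bare paths have no scheme.
        Path src = source.getScheme() == null ? Paths.get(source.getPath()) : Paths.get(source);
        Path dst = targetDir.resolve(src.getFileName().toString());
        return Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}

/** HTTP(S) variant (hypothetical name) built on the JDK's java.net.http client. */
final class HttpArtifactFetcher implements ArtifactFetcher {
    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public Path fetch(URI source, Path targetDir) throws IOException, InterruptedException {
        String path = source.getPath();
        Path dst = targetDir.resolve(path.substring(path.lastIndexOf('/') + 1));
        HttpResponse<Path> response = client.send(
                HttpRequest.newBuilder(source).GET().build(),
                HttpResponse.BodyHandlers.ofFile(dst, StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING));
        if (response.statusCode() != 200) {
            Files.deleteIfExists(dst); // do not leave an error page behind as a "jar"
            throw new IOException("HTTP " + response.statusCode() + " while fetching " + source);
        }
        return dst;
    }
}

/** Chooses a fetcher by URI scheme; further schemes could be registered later. */
final class ArtifactFetchers {
    private static final ArtifactFetcher LOCAL = new LocalArtifactFetcher();
    private static final ArtifactFetcher HTTP = new HttpArtifactFetcher();
    private static final Map<String, ArtifactFetcher> BY_SCHEME =
            Map.of("file", LOCAL, "http", HTTP, "https", HTTP);

    private ArtifactFetchers() {}

    static ArtifactFetcher forUri(URI uri) {
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase();
        ArtifactFetcher fetcher = BY_SCHEME.get(scheme);
        if (fetcher == null) {
            throw new IllegalArgumentException("Unsupported artifact scheme: " + scheme);
        }
        return fetcher;
    }
}
```

Under this design, a Flink-filesystem fetcher would simply be one more entry in the scheme map, shipped in a custom image built on top of the base operator image, which keeps those dependencies out of the default image as Gyula suggested.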
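Option #2 from the quoted messages (`JarRunHeaders`, i.e. Flink's `POST /jars/:jarid/run` REST endpoint) keeps job-graph generation on the session JobManager. The sketch below only builds the request. It assumes, as Flink's REST API documentation describes, that the `filename` returned by `POST /jars/upload` ends with the jar id, and it uses the `entryClass`, `parallelism` and `programArgsList` request fields. `JarRunRequests` is a hypothetical helper, and its hand-rolled JSON escaping stands in for a real JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds requests for Flink's jar run endpoint. */
final class JarRunRequests {
    private JarRunRequests() {}

    /** /jars/upload reports the stored file path; its last segment is the jar id. */
    static String jarIdFromUploadedFilename(String filename) {
        return filename.substring(filename.lastIndexOf('/') + 1);
    }

    /** Joins the JobManager REST base address with /jars/{jarId}/run. */
    static URI runUri(URI restBase, String jarId) {
        String base = restBase.toString().replaceAll("/+$", "");
        return URI.create(base + "/jars/" + jarId + "/run");
    }

    /** Minimal JSON string quoting: escapes quotes, backslashes and control chars. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.append('"').toString();
    }

    /** JSON body with the entry class, parallelism and program arguments. */
    static String runBody(String entryClass, int parallelism, List<String> args) {
        String argList = args.stream()
                .map(JarRunRequests::quote)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"entryClass\":" + quote(entryClass)
                + ",\"parallelism\":" + parallelism
                + ",\"programArgsList\":" + argList + "}";
    }

    /** POST request ready to be sent with java.net.http.HttpClient. */
    static HttpRequest runRequest(URI restBase, String jarId, String body) {
        return HttpRequest.newBuilder(runUri(restBase, jarId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

A successful response to this request carries the `jobid` of the submitted job. Because the operator only ships the jar id and arguments, it never builds a job graph itself, which is the version and dependency isolation Gyula pointed out.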

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Yang Wang <da...@gmail.com>.

The relationship between the session deployment and the Flink jobs looks
good to me, except for the deletion of the session deployment.

I strongly suggest not setting the ownerReference of the FlinkSessionJob to
the session FlinkDeployment.
Otherwise, it will be a disaster if the session FlinkDeployment is deleted
accidentally while many jobs are running.
We should check that there is no running Flink job before deleting a session
FlinkDeployment. This will effectively force users to confirm the deletion
twice.

Best,
Yang



Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

Hi Thomas:

    Thanks for your valuable question. Let’s make the relationship between
the session deployment and the jobs clearer.

IMO, the session deployment and the jobs interact in these situations:

- Create the session job. The FlinkSessionJobController waits for the
session cluster to be ready and then submits the job. The lookup key is the
namespace and the clusterId.

- Delete the session job. The controller cancels the current session job.

- Delete the session deployment. The session jobs have to be deleted first.
We could set the ownerReference of the FlinkSessionJob so that Kubernetes
triggers the cleanup of the session jobs before the session deployment is
removed.

- Upgrade the session deployment. This is a critical part, because it
affects all the session jobs. The jobs should be suspended first and the
session cluster upgraded afterwards. So I tend to validate that all the jobs
are suspended before performing the session cluster upgrade. After the
upgrade, the session jobs are manually switched back to running.

What do you think about this? If there is no objection, I will clarify it
in the FLIP doc.


Besides, sorry for the rough vote and discussion process. It's my first
time driving this; I will keep that in mind next time :)
Best,
Aitozi.
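The create and upgrade rules in the message above can be sketched as two checks keyed by namespace and clusterId. All type and method names here are hypothetical, and the in-memory collections stand in for whatever the operator actually reads from Kubernetes. The sketch only encodes two rules from the thread: submit a session job once its target session cluster is ready, and allow a session cluster upgrade only when every job on that cluster is suspended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Illustrative job states; not the CRD's actual field values.
enum JobState { RUNNING, SUSPENDED }

// Lookup key described in the thread: namespace + clusterId.
record ClusterKey(String namespace, String clusterId) {}

record SessionCluster(ClusterKey key, boolean ready) {}

record SessionJob(String name, ClusterKey target, JobState state) {}

final class SessionJobLifecycle {
    private final Map<ClusterKey, SessionCluster> clusters = new HashMap<>();
    private final List<SessionJob> jobs = new ArrayList<>();

    void putCluster(SessionCluster cluster) { clusters.put(cluster.key(), cluster); }

    void addJob(SessionJob job) { jobs.add(job); }

    /** Create: submit only once the target session cluster exists and is ready. */
    boolean canSubmit(SessionJob job) {
        return Optional.ofNullable(clusters.get(job.target()))
                .map(SessionCluster::ready)
                .orElse(false);
    }

    /** Upgrade: allowed only if every job bound to this cluster is suspended. */
    boolean canUpgrade(ClusterKey key) {
        return jobs.stream()
                .filter(j -> j.target().equals(key))
                .allMatch(j -> j.state() == JobState.SUSPENDED);
    }
}
```

Note that `canUpgrade` returns true for a cluster with no jobs, since `allMatch` on an empty stream is true; that seems to match the intent, as an idle session cluster has nothing to suspend.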

Yang Wang <da...@gmail.com> 于2022年3月22日周二 10:11写道：

> I think the session cluster could not be deleted unless all the running
> jobs have finished or cancelled. I agree this should be clarified in the
> FLIP.
>
> Best,
> Yang
>
> Thomas Weise <th...@apache.org> 于2022年3月22日周二 09:26写道：
>
> > Hi Aitozi,
> >
> > Thanks for the proposal. Can you please clarify in the FLIP the
> > relationship between the session deployment and the jobs that depend on
> it?
> > Will, for example, the operator ensure that the individual jobs are
> > deleted when the underlying cluster is deleted?
> >
> > Side note: When the discussion thread started 5 days ago and a FLIP vote
> > was started 2 days later and there is also a weekend included, then this
> is
> > probably on the short side for broader feedback.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Fri, Mar 18, 2022 at 4:01 AM Yang Wang <da...@gmail.com> wrote:
> >
> > > Great work. Since we are introducing a new public API, it deserves a
> > FLIP.
> > > And the FLIP will help the later contributors catch up soon.
> > >
> > > Best,
> > > Yang
> > >
> > > Gyula Fóra <gy...@gmail.com> 于2022年3月18日周五 18:11写道：
> > >
> > > > Thank Aitozi, a FLIP might be an overkill at this point but no harm
> in
> > > > voting on it anyways :)
> > > >
> > > > Looks good!
> > > >
> > > > Gyula
> > > >
> > > > On Fri, Mar 18, 2022 at 10:25 AM Aitozi <gj...@gmail.com>
> wrote:
> > > >
> > > > > Hi Guys:
> > > > >
> > > > >     FYI, I have integrated your comments and drawn the
> FLIP-215[1], I
> > > > will
> > > > > create another thread to vote for it.
> > > > >
> > > > > [1]:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
> > > > >
> > > > > Best,
> > > > >
> > > > > Aitozi.
> > > > >
> > > > >
> > > > > Aitozi <gj...@gmail.com> 于2022年3月17日周四 11:16写道：
> > > > >
> > > > > > Hi Biao Geng:
> > > > > >
> > > > > >    Thanks for your feedback, I'm +1 to go with option#2. It's a
> > good
> > > > > > point that
> > > > > >
> > > > > > we should improve the error message debugging for the session
> job,
> > I
> > > > > > think
> > > > > >
> > > > > > it can be a follow up work as an improvement after we support the
> > > > session
> > > > > > job operation.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Aitozi.
> > > > > >
> > > > > >
> > > > > > Geng Biao <bi...@gmail.com> 于2022年3月17日周四 10:55写道：
> > > > > >
> > > > > >> Thanks Aitozi for the work!
> > > > > >>
> > > > > >> I lean to option#2 of using JarRunHeaders with uber job jar as
> > well.
> > > > As
> > > > > >> Yang said, the user defined dependencies may be better supported
> > in
> > > > > >> upstream flink.
> > > > > >> A follow-up thought: I think we should care the  potential
> > influence
> > > > on
> > > > > >> user experiences: as the job graph is generated in JM, when the
> > > > > generation
> > > > > >> fails due to some issues in the main() method, we should do some
> > > work
> > > > on
> > > > > >> showing such error messages in this proposal or the later k8s
> > > operator
> > > > > >> implementation.  Reason for this question is that if users
> submit
> > > many
> > > > > jobs
> > > > > >> to one same session cluster, it may be not easy for them to find
> > > > > relevant
> > > > > >> error logs about main() method of a specific job. The
> FLINK-25715
> > > > could
> > > > > >> help us later.
> > > > > >>
> > > > > >>
> > > > > >> Best,
> > > > > >> Biao Geng
> > > > > >>
> > > > > >>
> > > > > >> 发件人: Aitozi <gj...@gmail.com>
> > > > > >> 日期: 星期三, 2022年3月16日 下午5:19
> > > > > >> 收件人: dev@flink.apache.org <de...@flink.apache.org>
> > > > > >> 主题: Re: [DISCUSS] Support the session job management in
> kubernetes
> > > > > >> operator
> > > > > >> Hi Yang Wang
> > > > > >>     Thanks for your feedback, Provide the local and http
> > > > implementation
> > > > > >> for
> > > > > >> the first version makes sense to me.
> > > > > >> +1 for it.
> > > > > >>
> > > > > >> Best,
> > > > > >> Aitozi
> > > > > >>
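
The pluggable ArtifactFetcher idea discussed in this thread can be sketched with the two default implementations agreed for the first version: local files (for example a PersistentVolume mounted into the operator) and HTTP URLs. Only the interface name comes from the thread. The method signature, the class names, and the scheme-to-implementation mapping (including treating `local://` like `file://`) are assumptions for illustration, not the operator's actual API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical shape of the ArtifactFetcher interface proposed on the list.
interface ArtifactFetcher {
    /** Makes the artifact at {@code uri} available in {@code targetDir}; returns the local copy. */
    Path fetch(URI uri, Path targetDir) throws IOException, InterruptedException;
}

/** Copies a jar that is already on the operator's filesystem, e.g. on a mounted PV. */
class LocalArtifactFetcher implements ArtifactFetcher {
    @Override
    public Path fetch(URI uri, Path targetDir) throws IOException {
        Path source = Paths.get(uri.getPath());
        return Files.copy(source, targetDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}

/** Downloads a jar over HTTP(S) using the JDK 11+ HttpClient. */
class HttpArtifactFetcher implements ArtifactFetcher {
    private final HttpClient client =
            HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();

    @Override
    public Path fetch(URI uri, Path targetDir) throws IOException, InterruptedException {
        HttpResponse<InputStream> response = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofInputStream());
        Path target = targetDir.resolve(Paths.get(uri.getPath()).getFileName());
        // Always close the response stream, even when the status is an error.
        try (InputStream in = response.body()) {
            if (response.statusCode() != 200) {
                throw new IOException("Download of " + uri + " failed: HTTP " + response.statusCode());
            }
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}

/** Chooses a fetcher by URI scheme; a URI without a scheme is treated as a local path. */
class ArtifactFetchers {
    static ArtifactFetcher forUri(URI uri) {
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme().toLowerCase(Locale.ROOT);
        switch (scheme) {
            case "file":
            case "local":
                return new LocalArtifactFetcher();
            case "http":
            case "https":
                return new HttpArtifactFetcher();
            default:
                throw new IllegalArgumentException("Unsupported artifact scheme: " + scheme);
        }
    }
}
```

A fetcher backed by Flink filesystems (S3, HDFS, ...) could later become another case in `forUri`, shipped as an optional plugin rather than bundled into the base operator image.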

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Yang Wang <da...@gmail.com>.

I think the session cluster should not be deleted until all of its running
jobs have finished or been cancelled. I agree this should be clarified in the
FLIP.

Best,
Yang
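
The deletion rule can be sketched as a small decision function, assuming the operator can list the session jobs that target a cluster and read each job's last observed state. All types and names here are hypothetical. The "keep" outcome corresponds to not removing the finalizer and re-checking on a later reconcile, as suggested elsewhere in this thread with `DeleteControl#noFinalizerRemoval()`. Treating FAILED as terminal (the job no longer runs on the cluster) is an assumption.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Decides whether a session FlinkDeployment may be removed (hypothetical sketch). */
class SessionClusterDeletionGuard {

    /** Subset of Flink job states, redeclared here to keep the sketch self-contained. */
    enum JobState { CREATED, RUNNING, FAILING, CANCELLING, RESTARTING, FINISHED, CANCELED, FAILED }

    /** States in which a job no longer runs on the cluster (FAILED included as an assumption). */
    static final Set<JobState> TERMINAL =
            EnumSet.of(JobState.FINISHED, JobState.CANCELED, JobState.FAILED);

    /** A session job targeting the cluster, as last observed by the operator. */
    record SessionJob(String name, JobState state) {}

    /** Whether to delete now, and which jobs still block deletion. */
    record Decision(boolean deleteCluster, List<String> blockingJobs) {}

    static Decision onClusterDelete(List<SessionJob> jobsOnCluster) {
        List<String> blocking = jobsOnCluster.stream()
                .filter(job -> !TERMINAL.contains(job.state()))
                .map(SessionJob::name)
                .collect(Collectors.toList());
        // Empty list: safe to tear the cluster down. Otherwise keep it (for example,
        // leave the finalizer in place) and check again on a later reconcile.
        return new Decision(blocking.isEmpty(), blocking);
    }
}
```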

On Tue, Mar 22, 2022 at 9:26 AM Thomas Weise <th...@apache.org> wrote:

> Hi Aitozi,
>
> Thanks for the proposal. Can you please clarify in the FLIP the
> relationship between the session deployment and the jobs that depend on it?
> Will, for example, the operator ensure that the individual jobs are
> deleted when the underlying cluster is deleted?
>
> Side note: When the discussion thread started 5 days ago and a FLIP vote
> was started 2 days later and there is also a weekend included, then this is
> probably on the short side for broader feedback.
>
> Thanks,
> Thomas

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Thomas Weise <th...@apache.org>.

Hi Aitozi,

Thanks for the proposal. Can you please clarify in the FLIP the
relationship between the session deployment and the jobs that depend on it?
For example, will the operator ensure that the individual jobs are
deleted when the underlying cluster is deleted?

Side note: the discussion thread started 5 days ago, the FLIP vote was
started 2 days later, and a weekend fell in between, so this is probably
on the short side for gathering broader feedback.

Thanks,
Thomas
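
One way to make the relationship explicit, following the suggestion in this thread not to set an ownerReference from the session job to the session FlinkDeployment: each session job names the deployment it runs on (a hypothetical `deploymentName` field), so deleting the deployment does not cascade to the jobs, and the operator finds dependents by name. The record and method names are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Relates session jobs to session clusters by name rather than by ownerReference. */
class SessionJobIndex {

    /** Trimmed-down, hypothetical view of a session job resource. */
    record SessionJobRef(String namespace, String name, String deploymentName) {}

    /** Jobs that run on the given session FlinkDeployment (same namespace, matching name). */
    static List<SessionJobRef> dependentsOf(String namespace, String deploymentName,
                                            List<SessionJobRef> allJobs) {
        return allJobs.stream()
                .filter(job -> job.namespace().equals(namespace))
                .filter(job -> job.deploymentName().equals(deploymentName))
                .collect(Collectors.toList());
    }

    /** Jobs whose referenced session FlinkDeployment does not exist in their namespace. */
    static List<SessionJobRef> orphans(Map<String, Set<String>> deploymentsByNamespace,
                                       List<SessionJobRef> allJobs) {
        return allJobs.stream()
                .filter(job -> !deploymentsByNamespace
                        .getOrDefault(job.namespace(), Set.of())
                        .contains(job.deploymentName()))
                .collect(Collectors.toList());
    }
}
```

With name-based references, the deletion check becomes "is `dependentsOf(...)` empty or all terminal", rather than relying on Kubernetes garbage collection.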


On Fri, Mar 18, 2022 at 4:01 AM Yang Wang <da...@gmail.com> wrote:

> Great work. Since we are introducing a new public API, it deserves a FLIP.
> And the FLIP will help the later contributors catch up soon.
>
> Best,
> Yang

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Yang Wang <da...@gmail.com>.

Great work. Since we are introducing a new public API, it deserves a FLIP.
The FLIP will also help later contributors get up to speed quickly.

Best,
Yang

On Fri, Mar 18, 2022 at 6:11 PM Gyula Fóra <gy...@gmail.com> wrote:

> Thank Aitozi, a FLIP might be an overkill at this point but no harm in
> voting on it anyways :)
>
> Looks good!
>
> Gyula

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Gyula Fóra <gy...@gmail.com>.

Thanks Aitozi, a FLIP might be overkill at this point, but there is no harm
in voting on it anyway :)

Looks good!

Gyula

On Fri, Mar 18, 2022 at 10:25 AM Aitozi <gj...@gmail.com> wrote:

> Hi Guys:
>
>     FYI, I have integrated your comments and drawn the FLIP-215[1], I will
> create another thread to vote for it.
>
> [1]:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator
>
> Best,
>
> Aitozi.
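
The submission path this thread converged on (option #2: upload the user's uber jar to the session JobManager, then call /jars/:jarid/run so the JobManager runs main() and builds the job graph) can be sketched with only the JDK HttpClient. The class and helper names, the REST address (assumed here to be Flink's native Kubernetes `<cluster-id>-rest` service on port 8081), and the regex-based JSON handling without escaping are simplifications for illustration, not the operator's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical client for submitting an uber jar to a session cluster via REST. */
class SessionJobSubmitter {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String restBase; // e.g. "http://my-session-rest:8081" (assumed address)

    SessionJobSubmitter(String restBase) { this.restBase = restBase; }

    /** POST /jars/upload as multipart/form-data with a "jarfile" part; returns the jar id. */
    String upload(Path jar) throws IOException, InterruptedException {
        String boundary = "----flink-" + UUID.randomUUID();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"jarfile\"; filename=\"" + jar.getFileName() + "\"\r\n"
                + "Content-Type: application/x-java-archive\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        body.write(Files.readAllBytes(jar));
        body.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(restBase + "/jars/upload"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Jar upload failed: " + response.body());
        }
        return jarIdFromUploadResponse(response.body());
    }

    /** POST /jars/:jarid/run; returns the raw JSON response, e.g. {"jobid":"..."}. */
    String run(String jarId, String entryClass, int parallelism) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(restBase + "/jars/" + jarId + "/run"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(runRequestBody(entryClass, parallelism)))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Failures inside the user's main() are reported here, not on the operator side.
            throw new IOException("Job submission failed: " + response.body());
        }
        return response.body();
    }

    /** The upload response's "filename" is a server-side path; the jar id is its last segment. */
    static String jarIdFromUploadResponse(String json) {
        Matcher m = Pattern.compile("\"filename\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("No filename in upload response: " + json);
        }
        String path = m.group(1);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Minimal run request; entryClass is assumed to be a plain class name needing no escaping. */
    static String runRequestBody(String entryClass, int parallelism) {
        return "{\"entryClass\":\"" + entryClass + "\",\"parallelism\":" + parallelism + "}";
    }
}
```

Because the job graph is built inside the JobManager, errors from the user's main() typically surface only as an error response from the run call (or in the JobManager logs). This is why surfacing that error body, as raised in this thread, matters when many jobs share one session cluster.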

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

Hi Guys:

    FYI, I have integrated your comments and drafted FLIP-215[1]. I will
create another thread to vote on it.

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator

Best,

Aitozi.


On Thu, Mar 17, 2022 at 11:16 AM Aitozi <gj...@gmail.com> wrote:

> Hi Biao Geng:
>
>    Thanks for your feedback, I'm +1 to go with option#2. It's a good
> point that we should improve the error message debugging for the session
> job, I think it can be a follow up work as an improvement after we support
> the session job operation.
>
>
>
> Best,
>
> Aitozi.
>
>
> On Thu, Mar 17, 2022 at 10:55 AM Geng Biao <bi...@gmail.com> wrote:
>
>> Thanks Aitozi for the work!
>>
>> I lean to option#2 of using JarRunHeaders with uber job jar as well. As
>> Yang said, the user defined dependencies may be better supported in
>> upstream flink.
>> A follow-up thought: I think we should care the  potential influence on
>> user experiences: as the job graph is generated in JM, when the generation
>> fails due to some issues in the main() method, we should do some work on
>> showing such error messages in this proposal or the later k8s operator
>> implementation.  Reason for this question is that if users submit many jobs
>> to one same session cluster, it may be not easy for them to find relevant
>> error logs about main() method of a specific job. The FLINK-25715 could
>> help us later.
>>
>>
>> Best,
>> Biao Geng
>>
>>
>> 发件人: Aitozi <gj...@gmail.com>
>> 日期: 星期三, 2022年3月16日 下午5:19
>> 收件人: dev@flink.apache.org <de...@flink.apache.org>
>> 主题: Re: [DISCUSS] Support the session job management in kubernetes
>> operator
>> Hi Yang Wang
>>     Thanks for your feedback, Provide the local and http implementation
>> for
>> the first version makes sense to me.
>> +1 for it.
>>
>> Best,
>> Aitozi
>>
>> Yang Wang <da...@gmail.com> 于2022年3月16日周三 16:44写道：
>>
>> > # How to download the user jars
>> > I agree with Gyula that it will be a burden if we bundle the flink
>> > filesystem dependencies in the operator image.
>> > Maybe we could have a *ArtifactFetcher* interface in the
>> > flink-kubernetes-operator. By default, we provide the local and http
>> > implementation,
>> > which means we could get the user jars from local files or HTTP URLs.
>> Flink
>> > filesystem support could be done as a follow-up based on the feedback.
>> >
>> > If the user wants to use the local implementation, they need to mount a
>> > PV(aka persist volume) to the operator first and then put their jars
>> into
>> > the PV.
>> >
>> > # How to talk to session JobManager to submit the job
>> > After more consideration, I also prefer the second approach, via REST
>> API
>> > /jars/:jarid/run. If we have strong requirements to support dependencies
>> > jars and
>> > artifacts, we could try to support this in the upstream project.
>> >
>> > Best,
>> > Yang
>> >
>> >
>> > Aitozi <gj...@gmail.com> 于2022年3月16日周三 16:11写道：
>> >
>> > > Hi Gyula
>> > >     Thanks for your quick response. Regarding the different
>> filesystems
>> > > dependency,
>> > > I think we can make it optional and pluggable, and let it choose by
>> user
>> > > when building
>> > > their operator image. Users can build their image from the base
>> operator
>> > > image and
>> > > add filesystem dependency they want to use to it. BTW, we can support
>> the
>> > > http URI
>> > > by default.
>> > >
>> > > Thanks,
>> > > Aitozi.
>> > >
>> > > Gyula Fóra <gy...@gmail.com> 于2022年3月16日周三 15:53写道：
>> > >
>> > > > Thank you Aitozi!
>> > > >
>> > > > I think this will be a very nice (and simple) addition to enable
>> these
>> > > > use-cases.
>> > > >
>> > > > I have 2 comments regarding the proposal:
>> > > >
>> > > > 1. I think if we want to support different filesystems to download
>> jars
>> > > > from, we probably need some clever ways to add external operator
>> > > > dependencies (jars, configs).
>> > > > I would prefer not to bundle them into the base operator image.
>> > > >
>> > > > 2. I think we should avoid creating the jobgraphs on the operator
>> side
>> > > and
>> > > > use the jar upload/run rest api instead as you suggested. This will
>> > avoid
>> > > > flink version and dependency conflicts.
>> > > >
>> > > > Cheers,
>> > > > Gyula
>> > > >
>> > > > On Wed, Mar 16, 2022 at 8:41 AM Aitozi <gj...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi Guys:
>> > > > >
>> > > > >     I would like to open a discussion for support session job
>> > > management
>> > > > in
>> > > > > kubernetes operator. It’s intended to enhance the
>> > > > flink-kubernetes-operator
>> > > > > to manage the session job with k8s tooling. I have drafted the
>> design
>> > > > > doc[1]. Please refer to it and give me some feedback .
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://docs.google.com/document/d/1WPGbur1eT3H_5gN-kyXfp7EDjdbJUURx6jN8nt6UT-s/edit#
>> <
>> https://docs.google.com/document/d/1WPGbur1eT3H_5gN-kyXfp7EDjdbJUURx6jN8nt6UT-s/edit
>> >
>> > > > >
>> > > > > Best,
>> > > > >
>> > > > > Aitozi.
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

Hi Biao Geng:

   Thanks for your feedback. I'm +1 to go with option #2. It's a good point
that we should improve how error messages are surfaced for debugging session
jobs. I think this can be follow-up work after we support session job
operations.



Best,

Aitozi.


On Thu, Mar 17, 2022 at 10:55 AM Geng Biao <bi...@gmail.com> wrote:

> Thanks Aitozi for the work!
>
> I also lean toward option #2: using JarRunHeaders with an uber job jar. As
> Yang said, user-defined dependencies may be better supported in upstream
> Flink.
> A follow-up thought: we should consider the potential impact on user
> experience. Since the job graph is generated in the JM, when generation
> fails because of an issue in the main() method, we should do some work to
> surface such error messages, either in this proposal or in the later k8s
> operator implementation. The reason I raise this is that if users submit
> many jobs to the same session cluster, it may not be easy for them to find
> the error logs related to the main() method of a specific job. FLINK-25715
> could help us with this later.
>
> Best,
> Biao Geng

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Geng Biao <bi...@gmail.com>.

Thanks Aitozi for the work!

I also lean toward option #2: using JarRunHeaders with an uber job jar. As Yang said, user-defined dependencies may be better supported in upstream Flink.
A follow-up thought: we should consider the potential impact on user experience. Since the job graph is generated in the JM, when generation fails because of an issue in the main() method, we should do some work to surface such error messages, either in this proposal or in the later k8s operator implementation. The reason I raise this is that if users submit many jobs to the same session cluster, it may not be easy for them to find the error logs related to the main() method of a specific job. FLINK-25715 could help us with this later.
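To make the concern concrete: a failed run request against the session cluster may come back with error text that embeds a long stack trace. The sketch below shows one possible way an operator could condense such text into a one-line message for the session job's status. It assumes the errors are already available as a list of strings; the class name `ErrorSummaries` and the heuristic of keeping the last `Caused by:` line are illustrative assumptions, not part of the proposal or of Flink's API.

```java
import java.util.List;

/** Hypothetical helper: condense REST error strings into a one-line status message. */
final class ErrorSummaries {
    private ErrorSummaries() {}

    /**
     * Returns the message of the last "Caused by:" line found in the errors, falling back
     * to the first non-empty line, or "unknown error" if there is nothing to report.
     */
    static String rootCause(List<String> errors) {
        String firstLine = null;
        String lastCause = null;
        for (String error : errors) {
            // \R matches any line break (\n, \r\n, ...).
            for (String line : error.split("\\R")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                if (firstLine == null) {
                    firstLine = trimmed;
                }
                if (trimmed.startsWith("Caused by:")) {
                    lastCause = trimmed.substring("Caused by:".length()).trim();
                }
            }
        }
        if (lastCause != null) {
            return lastCause;
        }
        return firstLine != null ? firstLine : "unknown error";
    }
}
```

The deepest cause is usually the most specific one for a failing main(), but an implementation might prefer to keep the full text in the operator log and only put the summary into the custom resource status.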


Best,
Biao Geng


From: Aitozi <gj...@gmail.com>
Date: Wednesday, March 16, 2022 at 5:19 PM
To: dev@flink.apache.org <de...@flink.apache.org>
Subject: Re: [DISCUSS] Support the session job management in kubernetes operator
Hi Yang Wang,
    Thanks for your feedback. Providing the local and HTTP implementations for
the first version makes sense to me.
+1 for it.
Best,
Aitozi


Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

Hi Yang Wang,
    Thanks for your feedback. Providing the local and HTTP implementations for
the first version makes sense to me.
+1 for it.

Best,
Aitozi

On Wed, Mar 16, 2022 at 4:44 PM Yang Wang <da...@gmail.com> wrote:

> # How to download the user jars
> I agree with Gyula that bundling the Flink filesystem dependencies in the
> operator image would be a burden.
> Maybe we could have an *ArtifactFetcher* interface in the
> flink-kubernetes-operator. By default, we would provide local and HTTP
> implementations, which means we could fetch the user jars from local files
> or HTTP URLs. Support for Flink filesystems could be added as a follow-up
> based on feedback.
>
> If users want to use the local implementation, they need to first mount a
> PV (persistent volume) into the operator and then put their jars into the
> PV.
>
> # How to talk to the session JobManager to submit the job
> After more consideration, I also prefer the second approach, via the REST
> API /jars/:jarid/run. If we have strong requirements to support dependency
> jars and artifacts, we could try to support this in the upstream project.
>
> Best,
> Yang

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Yang Wang <da...@gmail.com>.

# How to download the user jars
I agree with Gyula that bundling the Flink filesystem dependencies in the
operator image would be a burden.
Maybe we could have an *ArtifactFetcher* interface in the
flink-kubernetes-operator. By default, we would provide local and HTTP
implementations, which means we could fetch the user jars from local files
or HTTP URLs. Support for Flink filesystems could be added as a follow-up
based on feedback.

If users want to use the local implementation, they need to first mount a
PV (persistent volume) into the operator and then put their jars into the
PV.
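A minimal Java sketch of the *ArtifactFetcher* idea, assuming only the interface name from the proposal: the `fetch` signature, the `LocalArtifactFetcher` and `HttpArtifactFetcher` classes, and the `ArtifactFetchers.forUri` dispatch are hypothetical. The local variant copies a jar from a path the operator can already see (for example a mounted PV), and the HTTP variant downloads it with the JDK's `java.net.http.HttpClient` (Java 11+).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical contract: make a user jar available in a local working directory. */
interface ArtifactFetcher {
    Path fetch(URI jarUri, Path targetDir);
}

/** Copies a jar the operator can already see, e.g. on a mounted PV. */
class LocalArtifactFetcher implements ArtifactFetcher {
    @Override
    public Path fetch(URI jarUri, Path targetDir) {
        // Accept both "file:///mnt/jars/job.jar" and a bare "/mnt/jars/job.jar".
        Path source = jarUri.getScheme() == null ? Paths.get(jarUri.getPath()) : Paths.get(jarUri);
        try {
            Files.createDirectories(targetDir);
            return Files.copy(source, targetDir.resolve(source.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot copy " + source, e);
        }
    }
}

/** Downloads a jar from an http(s) URL with the JDK HTTP client. */
class HttpArtifactFetcher implements ArtifactFetcher {
    private final HttpClient client =
            HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build();

    @Override
    public Path fetch(URI jarUri, Path targetDir) {
        String path = jarUri.getPath();
        String fileName = path == null ? "" : path.substring(path.lastIndexOf('/') + 1);
        if (fileName.isEmpty()) {
            throw new IllegalArgumentException("URI does not name a file: " + jarUri);
        }
        try {
            Files.createDirectories(targetDir);
            // Stream the body straight to disk, overwriting any earlier download.
            HttpResponse<Path> response = client.send(
                    HttpRequest.newBuilder(jarUri).GET().build(),
                    HttpResponse.BodyHandlers.ofFile(targetDir.resolve(fileName),
                            StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                            StandardOpenOption.TRUNCATE_EXISTING));
            if (response.statusCode() != 200) {
                throw new IOException("HTTP " + response.statusCode() + " for " + jarUri);
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while downloading " + jarUri, e);
        }
    }
}

/** Picks a fetcher by URI scheme; only local files and http(s) are supported here. */
final class ArtifactFetchers {
    private ArtifactFetchers() {}

    static ArtifactFetcher forUri(URI jarUri) {
        String scheme = jarUri.getScheme();
        if (scheme == null || scheme.equalsIgnoreCase("file")) {
            return new LocalArtifactFetcher();
        }
        if (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https")) {
            return new HttpArtifactFetcher();
        }
        throw new IllegalArgumentException("Unsupported jar URI scheme: " + scheme);
    }
}
```

For example, `ArtifactFetchers.forUri(uri).fetch(uri, workDir)` with a `file:` URI pointing into the mounted PV would copy the jar into `workDir`; the mount path and `workDir` are placeholders. Keying the dispatch on the URI scheme leaves room to add Flink filesystem support later without changing callers.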

# How to talk to the session JobManager to submit the job
After more consideration, I also prefer the second approach, via the REST
API /jars/:jarid/run. If we have strong requirements to support dependency
jars and artifacts, we could try to support this in the upstream project.
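A sketch of the submission step, assuming the jar has already been uploaded and its id is known. It builds a `POST /jars/:jarid/run` request with the JDK HTTP client and a hand-written JSON body. Only a subset of the request fields is shown (`entryClass`, `programArgsList`, `parallelism`, `savepointPath`), so check the REST API documentation of the target Flink version for the complete set. The `JarRunRequests` class and the endpoints used below are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper that builds a POST /jars/:jarid/run request for a session cluster. */
final class JarRunRequests {
    private JarRunRequests() {}

    /** JSON body with a subset of the run-request fields; savepointPath may be null. */
    static String body(String entryClass, List<String> args, int parallelism, String savepointPath) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        json.add("\"entryClass\":" + quote(entryClass));
        StringJoiner argList = new StringJoiner(",", "[", "]");
        args.forEach(a -> argList.add(quote(a)));
        json.add("\"programArgsList\":" + argList);
        json.add("\"parallelism\":" + parallelism);
        if (savepointPath != null) {
            json.add("\"savepointPath\":" + quote(savepointPath));
        }
        return json.toString();
    }

    /** e.g. http://host:8081 + "abc_job.jar" -> http://host:8081/jars/abc_job.jar/run */
    static URI runUri(URI restEndpoint, String jarId) {
        return restEndpoint.resolve("/jars/" + jarId + "/run");
    }

    static HttpRequest request(URI restEndpoint, String jarId, String body) {
        return HttpRequest.newBuilder(runUri(restEndpoint, jarId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; on success the response body carries the id of the submitted job. A real implementation would more likely reuse Flink's own REST client classes, such as the JarRunHeaders mentioned in this thread, rather than hand-writing JSON.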

Best,
Yang

On Wed, Mar 16, 2022 at 4:11 PM Aitozi <gj...@gmail.com> wrote:

> Hi Gyula,
>     Thanks for your quick response. Regarding the dependencies for different
> filesystems, I think we can make them optional and pluggable, and let users
> choose which ones to include when building their operator image. Users can
> build their image from the base operator image and add the filesystem
> dependencies they want to use. BTW, we can support HTTP URIs by default.
>
> Thanks,
> Aitozi.
>

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Aitozi <gj...@gmail.com>.

Hi Gyula,
    Thanks for your quick response. Regarding the dependencies for different
filesystems, I think we can make them optional and pluggable, and let users
choose which ones to include when building their operator image. Users can
build their image from the base operator image and add the filesystem
dependencies they want to use. BTW, we can support HTTP URIs by default.
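One possible shape for the optional, pluggable filesystem support, sketched under assumptions: the base image registers only an HTTP fetcher, and users who extend the image add jars that declare further fetchers, discovered with the JDK's `java.util.ServiceLoader`. The `SchemeFetcher` interface and the `FetcherRegistry` class are hypothetical names, not part of the operator.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.ServiceLoader;

/** Hypothetical SPI: one implementation per URI scheme (e.g. "http", "s3", "hdfs"). */
interface SchemeFetcher {
    String scheme();

    Path fetch(URI jarUri, Path targetDir);
}

/** Maps URI schemes to fetchers; plugins found on the classpath extend the built-ins. */
final class FetcherRegistry {
    private final Map<String, SchemeFetcher> byScheme = new HashMap<>();

    /** Registers a fetcher; a later registration for the same scheme replaces the earlier one. */
    FetcherRegistry register(SchemeFetcher fetcher) {
        byScheme.put(fetcher.scheme().toLowerCase(Locale.ROOT), fetcher);
        return this;
    }

    /** Adds every SchemeFetcher declared under META-INF/services in the given class loader. */
    FetcherRegistry loadPlugins(ClassLoader classLoader) {
        for (SchemeFetcher fetcher : ServiceLoader.load(SchemeFetcher.class, classLoader)) {
            register(fetcher);
        }
        return this;
    }

    /** Returns the fetcher for the URI's scheme (no scheme is treated as a local file). */
    SchemeFetcher forUri(URI jarUri) {
        String scheme = jarUri.getScheme() == null
                ? "file" : jarUri.getScheme().toLowerCase(Locale.ROOT);
        SchemeFetcher fetcher = byScheme.get(scheme);
        if (fetcher == null) {
            throw new IllegalArgumentException("No fetcher for scheme '" + scheme
                    + "'; add the corresponding plugin jar to the operator image");
        }
        return fetcher;
    }
}
```

A plugin jar would then contain an implementation returning, say, `"s3"` from `scheme()`, plus a `META-INF/services` file named after the interface's fully qualified name that lists it. Because the registry is keyed by scheme, an image without that jar fails fast with a clear message rather than partway through a download.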

Thanks,
Aitozi.

On Wed, Mar 16, 2022 at 3:53 PM Gyula Fóra <gy...@gmail.com> wrote:

> Thank you Aitozi!
>
> I think this will be a very nice (and simple) addition to enable these
> use cases.
>
> I have 2 comments regarding the proposal:
>
> 1. If we want to support downloading jars from different filesystems, we
> probably need some clever way to add external dependencies (jars, configs)
> to the operator. I would prefer not to bundle them into the base operator
> image.
>
> 2. I think we should avoid creating the job graphs on the operator side and
> use the jar upload/run REST API instead, as you suggested. This will avoid
> Flink version and dependency conflicts.
>
> Cheers,
> Gyula
>

Re: [DISCUSS] Support the session job management in kubernetes operator

Posted by Gyula Fóra <gy...@gmail.com>.

Thank you Aitozi!

I think this will be a very nice (and simple) addition to enable these
use cases.

I have 2 comments regarding the proposal:

1. If we want to support downloading jars from different filesystems, we
probably need some clever way to add external dependencies (jars, configs)
to the operator. I would prefer not to bundle them into the base operator
image.

2. I think we should avoid creating the job graphs on the operator side and
use the jar upload/run REST API instead, as you suggested. This will avoid
Flink version and dependency conflicts.
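The upload half of that REST flow can be sketched as follows, assuming Flink's `POST /jars/upload` response, whose `filename` field holds the server-side path of the stored jar; the id expected by `/jars/:jarid/run` is the last path segment of that value. `UploadedJars` is a hypothetical helper, and JSON parsing of the response is left out to keep the example dependency-free.

```java
/** Hypothetical helper for the jar upload/run flow against a session cluster. */
final class UploadedJars {
    private UploadedJars() {}

    /**
     * Derives the jar id used by /jars/:jarid/run from the "filename" value returned by
     * POST /jars/upload, i.e. the last path segment of the stored file.
     */
    static String jarId(String uploadedFilename) {
        if (uploadedFilename == null || uploadedFilename.isBlank()) {
            throw new IllegalArgumentException("upload response has no filename");
        }
        String id = uploadedFilename.substring(uploadedFilename.lastIndexOf('/') + 1);
        if (id.isEmpty()) {
            throw new IllegalArgumentException("filename ends with '/': " + uploadedFilename);
        }
        return id;
    }

    /** Relative REST path for running an uploaded jar. */
    static String runPath(String jarId) {
        return "/jars/" + jarId + "/run";
    }
}
```

Since the JobManager then builds the job graph from the uploaded jar, the operator never needs the user's Flink version or dependencies on its own classpath, which is the conflict the comment above wants to avoid.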

Cheers,
Gyula

On Wed, Mar 16, 2022 at 8:41 AM Aitozi <gj...@gmail.com> wrote:

> Hi Guys:
>
>     I would like to open a discussion on supporting session job management in
> the Kubernetes operator. It’s intended to enhance the flink-kubernetes-operator
> to manage session jobs with k8s tooling. I have drafted the design
> doc[1]. Please refer to it and give me some feedback.
>
>
> [1]
>
> https://docs.google.com/document/d/1WPGbur1eT3H_5gN-kyXfp7EDjdbJUURx6jN8nt6UT-s/edit#
>
> Best,
>
> Aitozi.
>