Posted to dev@flink.apache.org by "WONG, DAREN" <da...@amazon.co.uk.INVALID> on 2022/07/11 13:17:04 UTC

[DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Hi everyone, I am Daren from the AWS Kinesis Data Analytics (KDA) team. I had a quick chat with Gyula, and I would like to propose including a few additional fields in the jobStatus CRD for the Flink Kubernetes Operator, such as:

- endTime
- duration
- jobPlan

Further details of each field can be found here: <https://github.com/darenwkt/flink/blob/release-1.15.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/JobDetailsInfo.java>. Although the addition of these three fields stems from an internal requirement, I think they would also be beneficial to others who use them in their applications. The list above is not exhaustive, so do let me know if there are other fields that you would like to include in this iteration cycle.

JIRA: https://issues.apache.org/jira/browse/FLINK-28494

Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Gyula Fóra <gy...@gmail.com>.

Hi Matyas!

So to clarify your suggestion, we would have the following JobStatus fields:

jobId : String
state : String
savepointInfo : SavepointInfo
jobDetailsInfo : String (optional) - output of Flink REST API job details

The user could then configure, with a flag, whether or not to include
jobDetailsInfo in the status.
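
To make the shape of this proposal concrete, here is a minimal Java sketch of such a status object. It is an illustration only, not the operator's actual class: the name JobStatusSketch, the factory method and the includeJobDetails parameter are assumptions, and savepointInfo is reduced to a String to keep the example self-contained.

```java
import java.util.Optional;

/** Hypothetical status holder mirroring the four fields listed above. */
final class JobStatusSketch {
    final String jobId;
    final String state;
    final String savepointInfo;   // stand-in for the SavepointInfo type
    private final String details; // raw REST job details, or null when not populated

    private JobStatusSketch(String jobId, String state, String savepointInfo, String details) {
        this.jobId = jobId;
        this.state = state;
        this.savepointInfo = savepointInfo;
        this.details = details;
    }

    /** Keeps the REST job details only when the (assumed) flag is enabled. */
    static JobStatusSketch of(String jobId, String state, String savepointInfo,
                              String restJobDetails, boolean includeJobDetails) {
        return new JobStatusSketch(jobId, state, savepointInfo,
                includeJobDetails ? restJobDetails : null);
    }

    /** Empty when the flag was off, so consumers must handle the absent case. */
    Optional<String> jobDetailsInfo() {
        return Optional.ofNullable(details);
    }
}
```

With the flag off, the status carries only what the operator itself needs; with it on, the REST output is passed through unchanged as a string.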

Cheers,
Gyula


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Őrhidi Mátyás <ma...@gmail.com>.

Hi Gyula,

Since the jobDetailsInfo could evolve, another option would be to dump it
as YAML/JSON into the metadata.
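
As a rough illustration of this idea, the sketch below serializes a flat set of job details to a JSON string and stores it under a single metadata key, so that new detail fields would not require a schema change. The class name, the "jobDetailsInfo" key and the hand-rolled serializer are assumptions made for the example; a real implementation would more likely use a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: dumps job details as one JSON string inside a metadata map. */
final class MetadataJsonSketch {

    /** Serializes a flat string map as a JSON object (escapes only quotes and backslashes). */
    static String toJson(Map<String, String> details) {
        return details.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Stores the whole document under a single, assumed metadata key. */
    static Map<String, String> asMetadata(Map<String, String> details) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("jobDetailsInfo", toJson(details));
        return metadata;
    }
}
```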

Best,
Matyas


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Gyula Fóra <gy...@gmail.com>.

Based on some further thought, a reasonable middle ground would be to add an
optional metadata/jobDetailsInfo field to the JobStatus.
We would also add an accompanying config option (default false) controlling
whether to populate this field for jobs.

This way operator users could decide whether they want to expose the job
information provided by the Flink REST API or only the information that the
operator itself needs.

What do you all think?

Gyula

On Fri, Jul 15, 2022 at 2:09 PM Gyula Fóra <gy...@gmail.com> wrote:

> Hi All!
>
> I fully acknowledge the general need to access more info about the running
> deployments. This need, however, is very specific to the use-cases /
> platforms built on the operator.
> I think we need a good way to tackle this without growing the status
> arbitrarily.
>
> Currently the JobStatus in the operator contains the following fields:
>
>    - jobId
>    - state : Flink JobStatus
>    - savepointInfo : Operator savepoint tracking info
>    - startTime : Flink job startTime
>    - updateTime : Last time state was updated in the operator
>    - jobName: Name of the job
>
> Technically speaking, only jobId, state and savepointInfo are used inside
> the operator logic; the rest is unnecessary and "could be removed" without
> affecting any operator functionality.
>
> I think instead of adding more of these "unnecessary/arbitrary" fields we
> should add a more generic, configurable / pluggable way to extend the
> status with user/platform specific fields based on the Flink job
> information. At the same time we should already @Deprecate / phase out the
> currently unnecessary fields.
>
> One way of doing this would be adding a new Map<String,String> metadata
> (or similar) field, and at the same time adding a configurable / pluggable
> way to create the content of this metadata based on the Flink REST API
> response (the extended job details).
>
> What do you think?
> Gyula
>
> On Fri, Jul 15, 2022 at 1:05 PM WONG, DAREN <da...@amazon.co.uk.invalid>
> wrote:
>
>> Hi Martijn,
>>
>> Yes, that's understandable. I think adding the job endTime, duration and
>> jobPlan is useful to other Flink users too, as it gives them the following
>> info to track:
>>
>> 1. endTime: If the job has ended, the user can see when it ended. If the
>> job is still streaming, the user can tell because the value defaults to
>> "-1".
>> 2. duration: How long the job has been running, useful for monitoring
>> purposes.
>> 3. jobPlan: Contains more detailed job info, such as the operators in the
>> job graph and the parallelism of each operator. This could benefit Flink
>> users as follows:
>>         3.1. Helps users get a quick view of jobs simply by querying the
>> k8s API, without needing to integrate with the Flink Client/API. Useful
>> for users who mainly use kubectl.
>>         3.2. Allows users to easily notice a change in a job. For example,
>> if a user changed the job code by adding a new operator but built it with
>> the same jar name, they can notice the change in the jobPlan.
>>         3.3. Users may want to act on jobPlan differences, for example to
>> send change notifications, allocate resources, or for other automation
>> purposes.
>>
>> In general, I think adding this info is useful for Flink users, for
>> purposes ranging from simple monitoring to audit trails. In addition, this
>> info is available via the Flink REST API, so I believe Flink users who
>> track it via the API would benefit from it when they start using the Flink
>> Kubernetes Operator.
>>
>> Regards,
>> Daren
>>
>>
>> On 13/07/2022, 08:25, "Martijn Visser" <ma...@apache.org> wrote:
>>
>>     Hi Daren,
>>
>>     Could you list the benefits for the users of Flink? I do think that an
>>     internal AWS requirement is not a good argument for getting something
>>     done in Flink.
>>
>>     Best regards,
>>
>>     Martijn
>>
>>     On Tue, 12 Jul 2022 at 21:17, WONG, DAREN
>>     <da...@amazon.co.uk.invalid> wrote:
>>
>>     > Hi Yang,
>>     >
>>     > The requirement to add *plan* currently originates from an internal
>>     > AWS requirement, as our service needs visibility of *plan*, but we
>>     > think it could also be beneficial to customers who use *plan*.
>>     >
>>     > Regards,
>>     > Daren
>>     >
>>     >
>>     >
>>     >
>>     > On 12/07/2022, 13:23, "Yang Wang" <da...@gmail.com> wrote:
>>     >
>>     >     Thanks for the explanation. Only having 1 API call in most cases
>>     >     makes sense to me.
>>     >
>>     >     Could you please elaborate on why we need the *plan* in the CR
>>     >     status?
>>     >
>>     >
>>     >     Best,
>>     >     Yang
>>     >
>>     >     On Tue, Jul 12, 2022 at 17:36, Gyula Fóra <gy...@gmail.com> wrote:
>>     >
>>     >     > Hi Devs!
>>     >     >
>>     >     > I discussed with Daren offline, and I agree with him that
>>     >     > technically we almost never need 2 API calls.
>>     >     >
>>     >     > I think it's fine to have a second API call once, directly after
>>     >     > application submission (technically even this can be eliminated
>>     >     > by always setting a fixed job ID).
>>     >     >
>>     >     > +1 from me.
>>     >     >
>>     >     > Cheers,
>>     >     > Gyula
>>     >     >
>>     >     >
>>     >     > On Tue, Jul 12, 2022 at 11:32 AM WONG, DAREN
>>     >     > <darenwkt@amazon.co.uk.invalid> wrote:
>>     >     >
>>     >     > > Hi Matyas,
>>     >     > >
>>     >     > > Thanks for the feedback, and yes I agree. An alternative
>>     >     > > approach would instead be:
>>     >     > >
>>     >     > > - 2 API calls only when the jobID is not available (i.e. when
>>     >     > > submitting a new application cluster, which is a one-off event).
>>     >     > > - 1 API call when the jobID is already available, by directly
>>     >     > > calling "/jobs/:jobid".
>>     >     > >
>>     >     > > With this approach, we can keep the number of API calls to 1
>>     >     > > in most cases.
>>     >     > >
>>     >     > > Regards,
>>     >     > > Daren
>>     >     > >
>>     >     > >
>>     >     > > On 11/07/2022, 14:44, "Őrhidi Mátyás" <matyas.orhidi@gmail.com>
>>     >     > > wrote:
>>     >     > >
>>     >     > >     Hi Daren,
>>     >     > >
>>     >     > >     At the moment the Operator fetches the job state via
>>     >     > >     https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview
>>     >     > >     which contains the 'end-time' and 'duration' fields already.
>>     >     > >     I feel calling
>>     >     > >     https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid
>>     >     > >     after the previous call for every job in every reconcile
>>     >     > >     loop would be too expensive.
>>     >     > >
>>     >     > >     Best,
>>     >     > >     Matyas
>>     >     > >
>>

Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Gyula Fóra <gy...@gmail.com>.

Hi All!

I fully acknowledge the general need to access more info about the running
deployments. This need however is very specific to the use-cases /
platforms built on the operator.
I think we need a good way to tackle this without growing the status
arbitrarily.

Currently the JobStatus in the operator contains the following fields:

   - jobId
   - state : Flink JobStatus
   - savepointInfo : Operator savepoint tracking info
   - startTime : Flink job startTime
   - updateTime : Last time state was updated in the operator
   - jobName: Name of the job

Technically speaking only jobId, state and savepointInfo are used inside
the operator logic, the rest is unnecessary and "could be removed" without
affecting any operator functionality.

I think instead of adding more of these "unnecessary/arbitrary" fields we
should add a more generic, configurable / pluggable mechanism for extending
the status with user/platform specific fields based on the Flink job
information. At the same time we should already @Deprecate / phase out the
currently unnecessary fields.

One way of doing this would be adding a new Map<String,String> metadata (or
similar) field, together with a configurable / pluggable way to create the
content of this metadata from the Flink REST API response (the extended job
details).
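
A minimal sketch of what such a pluggable metadata hook could look like,
written in Python for brevity rather than the operator's Java. The names
JobStatus.metadata, MetadataExtractor, end_time_and_duration and
update_metadata are hypothetical and only illustrate the idea; they are not
existing operator APIs. The 'end-time' and 'duration' keys are the REST
response fields mentioned earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Mapping, Optional

# Hypothetical extractor type: takes the parsed job details response and
# returns flat string metadata. This is not an existing operator interface.
MetadataExtractor = Callable[[Mapping[str, object]], Dict[str, str]]


@dataclass
class JobStatus:
    # Fields the operator logic relies on, per the list above.
    job_id: str
    state: str
    savepoint_info: Optional[dict] = None
    # Proposed generic extension point (stand-in for Map<String,String>).
    metadata: Dict[str, str] = field(default_factory=dict)


def end_time_and_duration(details: Mapping[str, object]) -> Dict[str, str]:
    """Example extractor: copies two fields when they are present."""
    wanted = ("end-time", "duration")
    return {key: str(details[key]) for key in wanted if key in details}


def update_metadata(status: JobStatus,
                    details: Mapping[str, object],
                    extractor: Optional[MetadataExtractor] = None) -> JobStatus:
    """Fill status.metadata only if an extractor has been configured."""
    if extractor is not None:
        status.metadata = extractor(details)
    return status
```

With no extractor configured the status stays as small as it is today, so
each platform would only pay for the fields it opts into.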

What do you think?
Gyula


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by "WONG, DAREN" <da...@amazon.co.uk.INVALID>.

Hi Martijn,

Yes, that's understandable. I think adding the job endTime, duration and jobPlan is useful to other Flink users too, as it gives them the following information to track:

1. endTime: If the job has ended, the user can see when it ended. If the job is still running, endTime defaults to "-1", so the user can tell that it has not ended.
2. duration: How long the job has been running, which is useful for monitoring purposes.
3. jobPlan: Contains more detailed job info such as the operators in the job graph and the parallelism of each operator. This could benefit Flink users as follows:
	3.1. Helps users get a quick view of their jobs simply by querying the k8s API, without needing to integrate with the Flink Client/API. Useful for users who mainly use kubectl.
	3.2. Allows users to easily notice a change in a job. For example, if a user changed the job code by adding a new operator but built it with the same jar name, they can notice the change in jobPlan.
	3.3. Users may want to act on jobPlan differences, for example to create change notifications, allocate resources, or for other automation purposes.
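
Points 3.2 and 3.3 can be sketched roughly as below. This assumes the plan is JSON with a "nodes" list whose entries carry "id" and "parallelism", and that node ids stay stable for unchanged operators; both are assumptions to verify against a real plan. The helper names operator_parallelism and diff_plans are made up for this illustration.

```python
from typing import Dict, List, Mapping


def operator_parallelism(plan: Mapping[str, object]) -> Dict[str, int]:
    """Map node id -> parallelism for a plan shaped like {"nodes": [...]}."""
    nodes = plan.get("nodes", [])
    return {str(node["id"]): int(node["parallelism"]) for node in nodes}


def diff_plans(old: Mapping[str, object],
               new: Mapping[str, object]) -> Dict[str, List[str]]:
    """Report node ids that were added, removed, or changed parallelism."""
    before = operator_parallelism(old)
    after = operator_parallelism(new)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "rescaled": sorted(
            node_id for node_id in before.keys() & after.keys()
            if before[node_id] != after[node_id]
        ),
    }
```

A script could run this against two versions of the CR status and send a notification whenever any of the three lists is non-empty.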

In general, I think adding this information is useful for Flink users, for purposes ranging from simple monitoring to audit trails. In addition, this information is already available via the Flink REST API, hence I believe Flink users who track it via the API would benefit from it when they start using the Flink Kubernetes Operator.

Regards,
Daren



Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Martijn Visser <ma...@apache.org>.

Hi Daren,

Could you list the benefits for the users of Flink? I do think that an
internal AWS requirement is not a good argument for getting something done
in Flink.

Best regards,

Martijn


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by "WONG, DAREN" <da...@amazon.co.uk.INVALID>.

Hi Yang,

The requirement to add *plan* currently originates from an internal AWS requirement, as our service needs visibility of *plan*, but we think it could also benefit customers who use *plan*.

Regards,
Daren

 



Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Yang Wang <da...@gmail.com>.

Thanks for the explanation. Only having 1 API call in most cases makes
sense to me.

Could you please elaborate on why we need the *plan* in the CR
status?


Best,
Yang


Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Gyula Fóra <gy...@gmail.com>.

Hi Devs!

I discussed with Daren offline, and I agree with him that technically we
almost never need 2 API calls.

I think it's fine to have a second API call once directly after application
submission (technically even this can be eliminated by always setting a
fixed job id).

+1 from me.

Cheers,
Gyula



Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by "WONG, DAREN" <da...@amazon.co.uk.INVALID>.

Hi Matyas,

Thanks for the feedback, and yes I agree. An alternative approach would instead be:

- 2 API calls only when jobID is not available (i.e. when submitting a new application cluster, which is a one-off event).
- 1 API call when jobID is already available, by directly calling "/jobs/:jobid".

With this approach, we can keep the number of API calls to 1 in most cases.
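
The approach above can be sketched as follows. `http_get` is a hypothetical helper that takes a REST path and returns parsed JSON; the response shape of "/jobs/overview" (a "jobs" list whose entries have a "jid") is an assumption to check against the REST API docs, as is taking the first job, which relies on an application cluster running a single job.

```python
from typing import Callable, Optional

# Hypothetical helper type: takes a REST path, returns the parsed JSON body.
HttpGet = Callable[[str], dict]


def fetch_job_details(http_get: HttpGet,
                      job_id: Optional[str] = None) -> Optional[dict]:
    """Return job details, using one call when the job id is already known."""
    if job_id is None:
        # One-off case right after submission: look the job id up first.
        overview = http_get("/jobs/overview")
        jobs = overview.get("jobs", [])
        if not jobs:
            return None  # nothing submitted yet
        job_id = jobs[0]["jid"]
    # Steady state: a single call per reconcile loop.
    return http_get(f"/jobs/{job_id}")
```

Once the jobID has been stored in the status, every later reconcile would pass it in and skip the overview call.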

Regards,
Daren



Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Yang Wang <da...@gmail.com>.

I share Matyas's concern about listing the jobs first and then following up
with get-job-detail requests.
It is a bit expensive, and I do not see the benefit of storing jobPlan in the
CR status.

Best,
Yang


Őrhidi Mátyás <ma...@gmail.com> wrote on Mon, Jul 11, 2022 at 21:43:

> Hi Daren,
>
> At the moment the Operator fetches the job state via
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview
> which contains the 'end-time' and 'duration' fields already. I feel calling
> the
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid
> after the previous call for every job in every reconcile loop would be too
> expensive.
>
> Best,
> Matyas
>
> On Mon, Jul 11, 2022 at 3:17 PM WONG, DAREN <darenwkt@amazon.co.uk.invalid
> >
> wrote:
>
> > Hi everyone, I am Daren from AWS Kinesis Data Analytics (KDA) team. I had
> > a quick chat with Gyula as I propose to include a few additional fields
> in
> > the jobStatus CRD for Flink Kubernetes Operator such as:
> >
> > - endTime
> > - duration
> > - jobPlan
> >
> > Further details of each states can be found here<
> >
> https://github.com/darenwkt/flink/blob/release-1.15.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/JobDetailsInfo.java
> >.
> > Although addition of these 3 states stem from an internal requirement, I
> > think they would be beneficial to others who uses these states in their
> > application as well. The list of states above are not exhaustive, so do
> let
> > me know if there are other states that you would like to include together
> > in this iteration cycle.
> >
> > JIRA: https://issues.apache.org/jira/browse/FLINK-28494
> >
>

Re: [DISCUSS] Add new JobStatus fields to Flink Kubernetes Operator CRD

Posted by Őrhidi Mátyás <ma...@gmail.com>.

Hi Daren,

At the moment the Operator fetches the job state via
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-overview
which contains the 'end-time' and 'duration' fields already. I feel calling
the
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid
after the previous call for every job in every reconcile loop would be too
expensive.
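
As a rough illustration of reading these two fields from the overview response alone, here is a small sketch. The sample payload is made up and only mirrors the 'end-time' and 'duration' field names mentioned above. Treating a missing or negative 'end-time' as "not finished" is an assumption of this sketch, and `timing_from_overview` is a hypothetical helper name.

```python
from typing import Optional

# Made-up sample shaped like a jobs-overview response; values are illustrative.
SAMPLE_OVERVIEW = {
    "jobs": [
        {"jid": "a1", "state": "FINISHED", "start-time": 1000,
         "end-time": 4000, "duration": 3000},
        {"jid": "b2", "state": "RUNNING", "start-time": 2000,
         "end-time": -1, "duration": 500},
    ]
}


def timing_from_overview(overview: dict, job_id: str) -> Optional[dict]:
    """Pick endTime and duration for one job out of an overview payload."""
    for job in overview.get("jobs", []):
        if job.get("jid") == job_id:
            end_time = job.get("end-time")
            # Assumption: a missing or negative end-time means "not finished".
            if end_time is None or end_time < 0:
                end_time = None
            return {"endTime": end_time, "duration": job.get("duration")}
    return None  # job not present in the overview


print(timing_from_overview(SAMPLE_OVERVIEW, "a1"))
print(timing_from_overview(SAMPLE_OVERVIEW, "b2"))
```

Because the overview is already fetched in each reconcile loop, this adds no extra REST call. Only jobPlan would need the per-job request.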

Best,
Matyas

On Mon, Jul 11, 2022 at 3:17 PM WONG, DAREN <da...@amazon.co.uk.invalid>
wrote:

> Hi everyone, I am Daren from AWS Kinesis Data Analytics (KDA) team. I had
> a quick chat with Gyula as I propose to include a few additional fields in
> the jobStatus CRD for Flink Kubernetes Operator such as:
>
> - endTime
> - duration
> - jobPlan
>
> Further details of each states can be found here<
> https://github.com/darenwkt/flink/blob/release-1.15.0/flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/job/JobDetailsInfo.java>.
> Although addition of these 3 states stem from an internal requirement, I
> think they would be beneficial to others who uses these states in their
> application as well. The list of states above are not exhaustive, so do let
> me know if there are other states that you would like to include together
> in this iteration cycle.
>
> JIRA: https://issues.apache.org/jira/browse/FLINK-28494
>