Posted to user@flink.apache.org by unknown unknown <un...@gmail.com> on 2022/05/26 15:05:43 UTC
Custom restart strategy
Hello Users!
I would like to notify an external endpoint when a streaming job has
restarted a certain number of times. While I can use a service to
continuously *poll* Flink metrics and identify failing jobs, I am looking to
invert this and have the job itself send the notification. We have around 50
streaming jobs, and querying them all on a continuous basis gets challenging.
Looking into [1], the intrusive way would be to perform the action at [2]
(not tested, though). Happy to hear suggestions and alternatives.
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/execution/task_failure_recovery/#restart-strategies
[2]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/FixedDelayRestartBackoffTimeStrategy.java#L68
Thanks
AK.
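As a rough illustration of the push-based idea in the question above, here is a minimal sketch in plain Java. Everything in it is hypothetical: `RestartThresholdNotifier`, `onFailure`, and the `sink` callback are illustrative names, not Flink APIs, and wiring it into the failure-handling code of the strategy linked at [2] is an untested assumption, just as in the original post.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Flink): counts failures and pushes
 * exactly one notification when the count reaches a threshold.
 */
final class RestartThresholdNotifier {
    private final int threshold;
    // Assumed callback; in practice it could wrap an HTTP POST to the external endpoint.
    private final Consumer<String> sink;
    private int restarts;
    private boolean notified;

    RestartThresholdNotifier(int threshold, Consumer<String> sink) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.sink = sink;
    }

    /** Call once per failure, for example from a custom restart strategy. */
    synchronized void onFailure(String jobName, Throwable cause) {
        restarts++;
        if (!notified && restarts >= threshold) {
            notified = true; // avoid notifying again on every later failure
            sink.accept(jobName + " restarted " + restarts + " times; last cause: " + cause);
        }
    }

    /** Clears the counter, for example after the job has been stable for a while. */
    synchronized void reset() {
        restarts = 0;
        notified = false;
    }

    public static void main(String[] args) {
        RestartThresholdNotifier notifier = new RestartThresholdNotifier(3, System.out::println);
        for (int i = 0; i < 5; i++) {
            notifier.onFailure("job-a", new RuntimeException("failure " + i));
        }
    }
}
```

The counter notifies only once per threshold crossing so that a job stuck in a restart loop does not flood the endpoint; whether that is the right policy depends on the alerting setup.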
Re: Custom restart strategy
Posted by Shengkai Fang <fs...@gmail.com>.
Hi.
Maybe the metric reporter [1] is suitable for your case.
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/metric_reporters/
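To make the reporter suggestion concrete: a metric reporter runs inside the Flink processes, so it can push outward instead of being polled from outside. The sketch below is a plain-Java stand-in with no Flink dependency; `RestartCountPusher`, `register`, and the `sink` callback are hypothetical names. A real reporter would implement Flink's `MetricReporter` interface, and the assumption here is that a restart-count gauge (such as the job's `numRestarts` metric) is what gets registered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.LongSupplier;

/**
 * Plain-Java stand-in for a custom metric reporter (hypothetical, not a Flink class).
 * It pushes a notification when a registered restart counter has grown and is at
 * or above the threshold.
 */
final class RestartCountPusher {
    private final Map<String, LongSupplier> gauges = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long threshold;
    // Assumed callback; in practice it could POST to the external endpoint.
    private final BiConsumer<String, Long> sink;

    RestartCountPusher(long threshold, BiConsumer<String, Long> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    /** Registers a restart-count gauge for one job. */
    void register(String jobName, LongSupplier restartCount) {
        gauges.put(jobName, restartCount);
    }

    /** Intended to run on a fixed schedule inside the cluster. */
    void report() {
        for (Map.Entry<String, LongSupplier> entry : gauges.entrySet()) {
            long current = entry.getValue().getAsLong();
            long previous = lastSeen.getOrDefault(entry.getKey(), 0L);
            // Push only when the count changed, so a quiet job produces no traffic.
            if (current >= threshold && current > previous) {
                sink.accept(entry.getKey(), current);
            }
            lastSeen.put(entry.getKey(), current);
        }
    }
}
```

With this shape the external service receives a message only when a job's restart count moves past the threshold, which is closer to a push model than an outside poller querying every job.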
Re: Custom restart strategy
Posted by unknown unknown <un...@gmail.com>.
Thanks Shengkai! Unfortunately, this would require continuously querying the
status of each job. Given that very few pipelines experience failures, and
that those failures are few and far between, I am looking for a push-based
model rather than polling.
Thanks
AK
Re: Custom restart strategy
Posted by Shengkai Fang <fs...@gmail.com>.
Hi.
I think you can use the REST API to periodically fetch the job status from
the JobManager (JM) [1] and detect whether something has gone wrong. The REST
API also supports fetching the exception list for a specified job [2].
Best,
Shengkai
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-exceptions
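For reference, the two endpoints linked above could be queried roughly as follows. This is only a sketch using the JDK's `java.net.http` classes (Java 11+); `FlinkRestRequests` is a hypothetical helper, and the base URL and job ID are placeholders to be replaced with real values.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper that builds GET requests for the endpoints in [1] and [2]. */
final class FlinkRestRequests {

    /** Request for the job list with statuses, see [1]. */
    static HttpRequest jobs(String baseUrl) {
        return get(baseUrl, "/jobs");
    }

    /** Request for the exception list of one job, see [2]. */
    static HttpRequest jobExceptions(String baseUrl, String jobId) {
        return get(baseUrl, "/jobs/" + jobId + "/exceptions");
    }

    private static HttpRequest get(String baseUrl, String path) {
        // Tolerate a trailing slash in the configured base URL.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + path))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }
}
```

A request built this way can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and a scheduler would repeat the call per job. That repetition is the polling cost the original question is trying to avoid.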