Posted to dev@flink.apache.org by Dominik Wosiński <wo...@gmail.com> on 2019/10/14 12:08:35 UTC

Ignore operator failure

Hey,
I have a question that I have not been able to find an answer to in
the docs or in any other source. Suppose we have a business system
and we are using an Elasticsearch sink, not for the business case
itself, but rather for keeping info on the data that is flowing
through the system. The Elasticsearch part is not crucial for the
application, so I would like to keep the application running even if
Elasticsearch itself is failing (for example because the external
system is down). Is there a way to exclude a task from checkpointing
and ignore its failure, so that the job is not restarted if only one
of the sinks is down?

Thanks in advance,
Best Regards,
Dom.

Re: Ignore operator failure

Posted by vino yang <ya...@gmail.com>.
Hi Dom,

If you want to tolerate checkpoint failures, you can use this API:
setTolerableCheckpointFailureNumber [1].
But for jobs with checkpointing enabled and failed operators that
contain state, Flink can't ignore those failures without restarting
the job. The upcoming regional failover (restarting only the affected
region) may be appropriate for your scenario.
At this stage, if you don't want a restart caused by a non-critical
operator, you may need to customize the related implementation so that
those exceptions are not thrown to the Flink framework.
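
For illustration, a minimal sketch against the Flink 1.9
CheckpointConfig API (the interval and threshold values are just
examples):

  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  public class TolerantCheckpointsExample {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env =
              StreamExecutionEnvironment.getExecutionEnvironment();
          env.enableCheckpointing(60_000);
          // Allow up to 3 checkpoint failures before the job itself fails.
          env.getCheckpointConfig().setTolerableCheckpointFailureNumber(3);
          // ... add sources/sinks and call env.execute() as usual.
      }
  }

For the second point, one rough sketch (the wrapper class name is made
up) is to wrap the non-critical sink so that its exceptions never
reach the framework:

  import org.apache.flink.streaming.api.functions.sink.SinkFunction;

  // Hypothetical best-effort wrapper; assumes the delegate is a plain
  // SinkFunction. Wrapping a RichSinkFunction (e.g. the Elasticsearch
  // sink) would also need open()/close() and runtime-context forwarding.
  public class BestEffortSink<T> implements SinkFunction<T> {
      private final SinkFunction<T> delegate;

      public BestEffortSink(SinkFunction<T> delegate) {
          this.delegate = delegate;
      }

      @Override
      public void invoke(T value, Context context) {
          try {
              delegate.invoke(value, context);
          } catch (Exception e) {
              // Log and drop instead of failing the job.
          }
      }
  }

For the Elasticsearch connector specifically, passing a custom
ActionRequestFailureHandler that drops failed requests via
ElasticsearchSink.Builder#setFailureHandler is probably the more
direct hook.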

Best,
Vino

[1]:
https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/CheckpointConfig.java#L319



Dominik Wosiński <wo...@gmail.com> wrote on Mon, Oct 14, 2019 at 8:16 PM:

> Hey,
> I have a question that I have not been able to find an answer to in
> the docs or in any other source. Suppose we have a business system
> and we are using an Elasticsearch sink, not for the business case
> itself, but rather for keeping info on the data that is flowing
> through the system. The Elasticsearch part is not crucial for the
> application, so I would like to keep the application running even if
> Elasticsearch itself is failing (for example because the external
> system is down). Is there a way to exclude a task from checkpointing
> and ignore its failure, so that the job is not restarted if only one
> of the sinks is down?
>
> Thanks in advance,
> Best Regards,
> Dom.
>