You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Ken Krugler <kk...@transpac.com> on 2020/05/12 23:08:51 UTC
Re: Flink restart strategy on specific exception

Hi Til,

Sorry to resurface an ancient question, but is there a working example anywhere of setting a custom restart strategy?

Asking because I’ve been wandering through the Flink 1.9 code base for a while, and the restart strategy implementation is…pretty tangled.

From what I’ve been able to figure out, you have to provide a factory class, something like this:

        Configuration config = new Configuration();
        config.setString(ConfigConstants.RESTART_STRATEGY, MyRestartStrategyFactory.class.getCanonicalName());
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(4, config);

That factory class should extend RestartStrategyFactory, but it also needs to implement a static method that looks like:

    public static MyRestartStrategyFactory createFactory(Configuration config) {
        return new MyRestartStrategyFactory();
    }

I wasn’t able to find any documentation that mentioned this particular method being a requirement.

And also the documentation at https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance <https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#fault-tolerance> doesn’t mention you can set a custom class name for the restart-strategy.

Thanks,

— Ken


> On Nov 22, 2018, at 8:18 AM, Till Rohrmann <tr...@apache.org> wrote:
> 
> Hi Kasif,
> 
> I think in this situation it is best if you defined your own custom RestartStrategy by specifying a class which has a `RestartStrategyFactory createFactory(Configuration configuration)` method as `restart-strategy: MyRestartStrategyFactoryFactory` in `flink-conf.yaml`.
> 
> Cheers,
> Till
> 
> On Thu, Nov 22, 2018 at 7:18 AM Ali, Kasif <Kasif.Ali@gs.com <ma...@gs.com>> wrote:
> Hello,
> 
>  
> 
> Looking at existing restart strategies they are kind of generic. We have a requirement to restart the job only in case of specific exception/issues.
> 
> What would be the best way to have a re start strategy which is based on few rules like looking at particular type of exception or some extra condition checks which are application specific.?
> 
>  
> 
> Just a background on one specific issue which invoked this requirement is slots not getting released when the job finishes. In our applications, we keep track of jobs submitted with the amount of parallelism allotted to it.  Once the job finishes we assume that the slots are free and try to submit next set of jobs which at times fail with error  “not enough slots available”.
> 
>  
> 
> So we think a job re start can solve this issue but we only want to re start only if this particular situation is encountered.
> 
>  
> 
> Please let us know If there are better ways to solve this problem other than re start strategy.
> 
>  
> 
> Thanks,
> 
> Kasif
> 
>  
> 
> 
> 
> Your Personal Data: We may collect and process information about you that may be subject to data protection laws. For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices <http://www.gs.com/privacy-notices>

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr