Posted to user@flink.apache.org by "F.Amara" <fa...@wso2.com> on 2017/02/21 09:00:39 UTC

Task Manager recovery in Standalone Cluster High Availability mode

Hi,

I'm working with Apache Flink 1.1.2 and testing it in High Availability mode.
When a Task Manager fails, a standby TM is supposed to recover the work of
the failed TM. In my case, I have 4 TMs running in parallel, and when a TM
is killed the job state goes to Cancelling and then to Failed rather than
Restarting, and the work is not recovered.

Is there a specific way to create standby TMs, and is there a specific
reason why the jobs are not being recovered?



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Task-Manager-recovery-in-Standalone-Cluster-High-Availability-mode-tp11767.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Task Manager recovery in Standalone Cluster High Availability mode

Posted by "F.Amara" <fa...@wso2.com>.
Hi,

Thanks a lot for the reply. I configured a restart strategy as suggested, and
the TM failure scenario now works as expected. Once a TM is killed,
another active TM automatically recovers the job.



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Task-Manager-recovery-in-Standalone-Cluster-High-Availability-mode-tp11767p11798.html

Re: Task Manager recovery in Standalone Cluster High Availability mode

Posted by Ufuk Celebi <uc...@apache.org>.
Hey! Did you configure a restart strategy?
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/restart_strategies.html

Keep in mind, though, that in standalone mode a TM process that has exited
won't be automatically restarted.
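
To make the idea of a restart strategy concrete, here is a minimal plain-Java
sketch of fixed-delay restart semantics. It is an illustration only, not
Flink's API or implementation: the class name, method name, and parameters
are hypothetical. With zero allowed restarts the first failure is final,
which corresponds to the job going to Failed instead of Restarting.

```java
import java.util.concurrent.Callable;

// Hypothetical model of a fixed-delay restart policy (not Flink code).
public class FixedDelayRestartSketch {

    // Runs the job; after a failure, waits delayMillis and runs it again,
    // allowing at most maxRestarts restarts before giving up.
    static <T> T runWithFixedDelayRestart(Callable<T> job, int maxRestarts, long delayMillis)
            throws Exception {
        int restarts = 0;
        while (true) {
            try {
                return job.call();
            } catch (Exception failure) {
                if (restarts >= maxRestarts) {
                    // No restarts left: the failure is final.
                    throw failure;
                }
                restarts++;
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated job: the first two attempts fail, the third succeeds.
        String result = runWithFixedDelayRestart(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("simulated TM loss");
            }
            return "finished";
        }, 3, 10L);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

In Flink itself the restart strategy is set through the job or cluster
configuration described on the page linked above. The sketch only models the
behaviour. A restarted job also needs enough free slots on the surviving TMs,
since an exited TM process is not brought back.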
