Posted to commits@samza.apache.org by "Jake Maes (JIRA)" <ji...@apache.org> on 2017/03/02 15:48:45 UTC

[jira] [Commented] (SAMZA-1116) Yarn RM recovery causing duplicate containers

    [ https://issues.apache.org/jira/browse/SAMZA-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892439#comment-15892439 ] 

Jake Maes commented on SAMZA-1116:
----------------------------------

Thanks for reporting this [~danil]. 

I'm going to associate this ticket with SAMZA-871, as I think that's probably the best fix. 

I'm assuming there was only one RM and that YARN was not configured with RM HA. Is that right?  (https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html) 

With HA, this scenario should be less likely because all the RMs would need to be terminated before the AM would fail. If you don't have HA configured already, that may be one workaround until SAMZA-871 is implemented. 
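
For reference, a minimal yarn-site.xml sketch of the HA settings from that doc (the rm-ids, hostnames, and ZooKeeper address below are placeholders, not values from any particular cluster):

{code:xml}
<!-- Enable ResourceManager HA with two RMs (ids and hosts are placeholders) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<!-- HA needs a ZooKeeper quorum for leader election and state storage -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
{code}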

Thanks!

> Yarn RM recovery causing duplicate containers
> ---------------------------------------------
>
>                 Key: SAMZA-1116
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1116
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.11
>            Reporter: Danil Serdyuchenko
>
> To replicate:
> # Make sure that Yarn RM recovery is enabled
> # Deploy a test job
> # Terminate Yarn RM
> # Wait until the AM of the job terminates with: 
> {code}
> 2017-02-02 13:08:04 RetryInvocationHandler [WARN] Exception while invoking class org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.finishApplicationMaster over rm2. Not retrying because failovers (30) exceeded maximum allowed (30)
> {code}
> # Restart RM
> The job gets a new attempt, but the old attempt's containers are not terminated, so duplicate containers run. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)