Posted to commits@samza.apache.org by "Hai Lu (Jira)" <ji...@apache.org> on 2019/11/07 00:06:03 UTC

[jira] [Updated] (SAMZA-2248) Fix AM bookkeeping on receiving dead container notifications

     [ https://issues.apache.org/jira/browse/SAMZA-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hai Lu updated SAMZA-2248:
--------------------------
    Fix Version/s: 1.3

> Fix AM bookkeeping on receiving dead container notifications
> ------------------------------------------------------------
>
>                 Key: SAMZA-2248
>                 URL: https://issues.apache.org/jira/browse/SAMZA-2248
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Xinyu Liu
>            Assignee: Xinyu Liu
>            Priority: Major
>             Fix For: 1.3
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Issue tl;dr:
> 1. The AM gets extra containers from the RM, which it saves for later use.
> 2. When a container saved in step 1 dies, the AM receives the callback but does nothing about it.
> 3. Later, when the AM looks for a container to use, it picks up the dead container that was saved and never cleaned up (steps 1 and 2) and launches a container on it.
> 4. Now, if this launched container ever dies, the RM will never notify the AM about it, since it sees the notification as a duplicate (step 2).
> 5. The job is left without the container rescheduled and needs to be restarted.
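
The bookkeeping described in the steps above can be illustrated with a minimal sketch. It is not Samza's actual code: the class HeldContainerBookkeeping and its methods are hypothetical names chosen for illustration, and container IDs are modeled as plain strings. It assumes the fix amounts to dropping a saved container from the held set as soon as its death is reported, so that it can never be picked for a later launch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical model of the AM's bookkeeping; not Samza's actual classes. */
class HeldContainerBookkeeping {
    // Extra containers granted by the RM and saved for later use (step 1).
    private final Deque<String> held = new ArrayDeque<>();
    // Containers on which a process has actually been launched.
    private final Set<String> running = new HashSet<>();

    /** Records a container that the RM has allocated to the AM. */
    void onAllocated(String containerId) {
        held.addLast(containerId);
    }

    /**
     * Handles a dead-container notification.
     * Returns true only when a running container died and a replacement
     * must be requested.
     */
    boolean onCompleted(String containerId) {
        // Assumed fix: forget a dead container that was only being held,
        // so it cannot be picked up for a launch later (steps 2 and 3).
        if (held.remove(containerId)) {
            return false;
        }
        return running.remove(containerId);
    }

    /** Picks the next saved container for launch, if any remain. */
    Optional<String> launchNext() {
        String id = held.pollFirst();
        if (id == null) {
            return Optional.empty();
        }
        running.add(id);
        return Optional.of(id);
    }
}
```

With this bookkeeping, a death reported for a held container removes it from the pool, while a death reported for a launched container signals that a replacement is needed.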



--
This message was sent by Atlassian Jira
(v8.3.4#803005)