Posted to dev@apex.apache.org by "Sandesh (JIRA)" <ji...@apache.org> on 2017/08/15 22:11:00 UTC
[jira] [Resolved] (APEXCORE-743) Killed container is shown as running
[ https://issues.apache.org/jira/browse/APEXCORE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandesh resolved APEXCORE-743.
------------------------------
Resolution: Fixed
Fix Version/s: 3.7.0
> Killed container is shown as running
> ------------------------------------
>
> Key: APEXCORE-743
> URL: https://issues.apache.org/jira/browse/APEXCORE-743
> Project: Apache Apex Core
> Issue Type: Bug
> Reporter: Sandesh
> Assignee: Sandesh
> Fix For: 3.7.0
>
>
> Observed behavior:
> 1. The container heartbeat times out.
> 2. The AppMaster sends a request to kill the container.
> 3. The container is killed.
> 4. The AppMaster state is not updated, and no new container is allocated.
> After analyzing the code, the possible cause is:
> 1. The AppMaster sends the kill request to the NM (NodeManager).
> 2. The NM kills the container, but the NM callback never happens. RecoverContainer is called from that callback, so in this case it is never called.
> 3. The AppMaster state is not updated.
> Possible fix:
> Add a timeout for the NM callback: if the NM does not report in time that the container was killed, call RecoverContainer anyway.
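
The proposed fix could be sketched roughly as follows. This is a minimal illustration, not the change that was committed for this issue: the class PendingKillTracker, its method names, and the Consumer standing in for RecoverContainer are all hypothetical. It assumes the AppMaster records a deadline when it sends the kill request, clears it when the NM callback arrives, and periodically checks for overdue entries (for example from its existing heartbeat loop).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical helper (not Apex code): tracks kill requests that still await an NM callback. */
class PendingKillTracker {
    private final long timeoutMillis;
    // Stand-in for the recovery routine the issue calls RecoverContainer.
    private final Consumer<String> recoverContainer;
    // containerId -> deadline (ms) by which the NM callback is expected.
    private final Map<String, Long> pendingKills = new LinkedHashMap<>();

    PendingKillTracker(long timeoutMillis, Consumer<String> recoverContainer) {
        this.timeoutMillis = timeoutMillis;
        this.recoverContainer = recoverContainer;
    }

    /** Call when the kill request is sent to the NM. */
    synchronized void onKillRequested(String containerId, long nowMillis) {
        pendingKills.putIfAbsent(containerId, nowMillis + timeoutMillis);
    }

    /** Call from the NM callback. Recovers only if the timeout has not already done so. */
    synchronized void onContainerStopped(String containerId) {
        if (pendingKills.remove(containerId) != null) {
            recoverContainer.accept(containerId);
        }
    }

    /** Call periodically. Recovers every container whose callback is overdue and returns their ids. */
    synchronized List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pendingKills.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis >= entry.getValue()) {
                it.remove();
                expired.add(entry.getKey());
            }
        }
        for (String containerId : expired) {
            recoverContainer.accept(containerId);
        }
        return expired;
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic. Removing the entry before recovering means a late NM callback after a timeout does not trigger a second recovery. Note that this simplified tracker ignores callbacks for containers it never saw a kill request for, which real code would still need to handle.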
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)