Posted to commits@samza.apache.org by "Jake Maes (JIRA)" <ji...@apache.org> on 2016/03/14 04:39:33 UTC

[jira] [Updated] (SAMZA-893) Fix a bug with host affinity request expiration introduced in SAMZA-867

     [ https://issues.apache.org/jira/browse/SAMZA-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Maes updated SAMZA-893:
----------------------------
    Description: 
The change to the expiration logic in the patch for SAMZA-867 was wrong. It simplified the conditional, but as a result expired requests were never rescheduled on ANY_HOST.

This ticket fixes that logic and also the unit tests, which did not fail after the change.

  was:
A number of jobs failed or restarted when we lost a couple of hosts in the cluster. The theory is that this happened because the AppMaster detects the failed container before YARN detects the missing NM, so it tries to run the container on that host again but does not handle the connection errors from the NM properly. Switching from a synchronous NM client model to an async model is expected to help, but we need to discuss this.


> Fix a bug with host affinity request expiration introduced in SAMZA-867
> -----------------------------------------------------------------------
>
>                 Key: SAMZA-893
>                 URL: https://issues.apache.org/jira/browse/SAMZA-893
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jake Maes
>            Assignee: Jake Maes
>
> The change to the expiration logic in the patch for SAMZA-867 was wrong. It simplified the conditional, but as a result expired requests were never rescheduled on ANY_HOST.
> This ticket fixes that logic and also the unit tests, which did not fail after the change.
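
For readers unfamiliar with the intended behavior, here is a minimal sketch of the rule described above: a request pinned to a preferred host falls back to ANY_HOST once it has been pending longer than the timeout. This is not the Samza implementation or the SAMZA-893 patch. The class name, the Request type, resolveHost, and the timeout parameter are all hypothetical and exist only for illustration.

```java
// Hypothetical sketch only: these names do not come from the Samza code base.
final class ExpirationSketch {
    static final String ANY_HOST = "ANY_HOST";

    /** A pending container request: the host it asked for and when it was issued. */
    static final class Request {
        final String preferredHost;
        final long requestedAtMs;

        Request(String preferredHost, long requestedAtMs) {
            this.preferredHost = preferredHost;
            this.requestedAtMs = requestedAtMs;
        }
    }

    /** A request is expired once it has been pending strictly longer than the timeout. */
    static boolean isExpired(Request r, long nowMs, long timeoutMs) {
        return nowMs - r.requestedAtMs > timeoutMs;
    }

    /**
     * Returns the host the request should be pursued on at time nowMs.
     * An expired request for a specific host is rescheduled on ANY_HOST;
     * everything else stays where it was requested.
     */
    static String resolveHost(Request r, long nowMs, long timeoutMs) {
        if (!ANY_HOST.equals(r.preferredHost) && isExpired(r, nowMs, timeoutMs)) {
            return ANY_HOST;
        }
        return r.preferredHost;
    }
}
```

A unit test for this rule needs at least one case on each side of the timeout. A test that only exercises non-expired requests would keep passing even if the fallback branch were broken, which is the kind of gap the ticket describes.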


