Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2019/07/11 23:14:49 UTC

[GitHub] [samza] dnishimura opened a new pull request #1104: SAMZA-2266: Introduce a backoff when there are repeated failures for host-affinity allocations

dnishimura opened a new pull request #1104: SAMZA-2266: Introduce a backoff when there are repeated failures for host-affinity allocations
URL: https://github.com/apache/samza/pull/1104
 
 
   **Motivation**
   For host-affinity-enabled jobs, the Resource Manager (RM) may not immediately mark a bad physical host as invalid. As a result, when the `HostAwareContainerAllocator` requests preferred hosts, the RM generates the `onResourceCompleted` callback even though the host can't be allocated. The error status in `onResourceCompleted` is equivalent to an application error, so the retry logic kicks in to restart the failed container. Adding delays to the retry logic prevents the job from failing prematurely (after 8 retries) before the bad host is marked invalid.
   
   **Implementation notes**
   Added an exponential back-off with a max delay. Container allocation requests are put in a priority queue, with priority determined by type and request timestamp. For retries that have a delay, I set the request timestamp to a time X in the future, where X is the calculated back-off.
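
To illustrate the idea, here is a minimal, self-contained sketch of a capped exponential back-off combined with a timestamp-ordered queue. It is not the code in this PR: `BackoffSketch`, `Request`, `backoffMillis`, `retryRequest`, and `pollReady` are hypothetical names, the doubling schedule and the base/max delay parameters are assumptions, and the "type" component of the priority is omitted for brevity.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from the Samza code base.
class BackoffSketch {

  // Simplified stand-in for a container allocation request.
  static final class Request {
    final String containerId;
    final String preferredHost;
    final long requestTimestampMs;

    Request(String containerId, String preferredHost, long requestTimestampMs) {
      this.containerId = containerId;
      this.preferredHost = preferredHost;
      this.requestTimestampMs = requestTimestampMs;
    }
  }

  // Assumed schedule: no delay for the first attempt, then
  // baseDelayMs, 2x, 4x, ... capped at maxDelayMs.
  static long backoffMillis(int retryCount, long baseDelayMs, long maxDelayMs) {
    if (retryCount <= 0) {
      return 0L;
    }
    long delay = Math.min(baseDelayMs, maxDelayMs);
    for (int i = 1; i < retryCount && delay < maxDelayMs; i++) {
      // Jump straight to the cap instead of overflowing on large values.
      delay = (delay > maxDelayMs / 2) ? maxDelayMs : delay * 2;
    }
    return delay;
  }

  // Requests with earlier timestamps are served first.
  static PriorityQueue<Request> newQueue() {
    return new PriorityQueue<>(
        Comparator.comparingLong((Request r) -> r.requestTimestampMs));
  }

  // A delayed retry is an ordinary request whose timestamp lies in the future.
  static Request retryRequest(String containerId, String preferredHost,
      int retryCount, long nowMs, long baseDelayMs, long maxDelayMs) {
    long delay = backoffMillis(retryCount, baseDelayMs, maxDelayMs);
    return new Request(containerId, preferredHost, nowMs + delay);
  }

  // Returns the head request only once its timestamp has been reached,
  // otherwise null.
  static Request pollReady(PriorityQueue<Request> queue, long nowMs) {
    Request head = queue.peek();
    if (head == null || head.requestTimestampMs > nowMs) {
      return null;
    }
    return queue.poll();
  }
}
```

With this shape the allocator loop needs no separate timer: a retry simply sorts behind requests that are already due and is skipped by `pollReady` until its back-off has elapsed.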
   
   **Testing**
   Added unit tests and tested a Samza job on a YARN cluster. I simulated the scenario by forcing an uncaught exception in a few containers so that they fail.
   
   @rmatharu @abhishekshivanna and others please take a look
