Posted to dev@samza.apache.org by abhishekshivanna <gi...@git.apache.org> on 2017/12/01 18:41:21 UTC
[GitHub] samza pull request #375: SAMZA-1506: Fix for robust ContainerHeartbeatMonito...
GitHub user abhishekshivanna opened a pull request:
https://github.com/apache/samza/pull/375
SAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.
The fix includes the following changes:
- Catch all exceptions inside the heartbeat thread, not just
IOException.
- Add a time-based force kill when the heartbeat is invalid.
This makes the monitor immune to threads that may keep the
container stuck in the shutdown sequence. When the timeout
occurs, System.exit(1) is called.
- Increase the number of retries for failed heartbeats from 3 to 6.
This prevents short, intermittent network failures from causing
containers to be invalidated.
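The three changes above can be illustrated with a small Java sketch. This is not the code from the pull request: the class HeartbeatMonitorSketch, the HeartbeatCheck interface, and every parameter name are hypothetical. The sketch assumes a heartbeat probe that may throw, a graceful-shutdown callback, and a force-kill action that would be () -> System.exit(1) in production. One plausible reason for catching every exception: the ScheduledExecutorService documentation states that if a periodic task throws, its subsequent executions are suppressed, so a single uncaught RuntimeException would silently stop the monitor.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of the three SAMZA-1506 changes. Names and signatures
 * are illustrative assumptions, not Samza's ContainerHeartbeatMonitor API.
 */
class HeartbeatMonitorSketch {
  /** One heartbeat probe (hypothetical); may throw IOException or anything else. */
  @FunctionalInterface
  interface HeartbeatCheck {
    boolean isAlive() throws Exception;
  }

  static final int MAX_RETRIES = 6; // raised from 3 per the PR description

  private final HeartbeatCheck check;
  private final Runnable gracefulShutdown; // hypothetical container shutdown hook
  private final Runnable forceKill;        // in production: () -> System.exit(1)
  private final long forceKillTimeoutMs;
  private final AtomicBoolean invalidated = new AtomicBoolean(false);
  // Two daemon threads: if gracefulShutdown hangs on one thread,
  // the delayed force-kill task can still run on the other.
  private final ScheduledExecutorService scheduler =
      Executors.newScheduledThreadPool(2, r -> {
        Thread t = new Thread(r, "heartbeat-monitor");
        t.setDaemon(true);
        return t;
      });

  HeartbeatMonitorSketch(HeartbeatCheck check, Runnable gracefulShutdown,
                         Runnable forceKill, long forceKillTimeoutMs) {
    this.check = check;
    this.gracefulShutdown = gracefulShutdown;
    this.forceKill = forceKill;
    this.forceKillTimeoutMs = forceKillTimeoutMs;
  }

  /** Runs checkOnce() periodically on the scheduler. */
  void start(long intervalMs) {
    scheduler.scheduleAtFixedRate(this::checkOnce, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** One monitoring round: probe with retries, then invalidate if needed. */
  void checkOnce() {
    if (invalidated.get()) {
      return; // already shutting down; do not probe again
    }
    if (!isAliveWithRetries() && invalidated.compareAndSet(false, true)) {
      // Arm the force kill before starting shutdown, so a thread that
      // blocks the shutdown sequence cannot prevent the exit.
      scheduler.schedule(forceKill, forceKillTimeoutMs, TimeUnit.MILLISECONDS);
      try {
        gracefulShutdown.run();
      } catch (Exception e) {
        // Swallow: the force-kill timer is already armed.
      }
    }
  }

  private boolean isAliveWithRetries() {
    for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
      try {
        return check.isAlive(); // a definitive answer ends the retries
      } catch (Exception e) {
        // Catch every exception, not only IOException; treat it as a
        // transient failure and try again.
      }
    }
    return false; // every attempt failed: treat the heartbeat as invalid
  }

  /** Stops the scheduler and drops any pending force kill (for tests). */
  void stop() {
    scheduler.shutdownNow();
  }
}
```

Injecting the force-kill action as a Runnable, rather than calling System.exit(1) directly, keeps the sketch testable. The retry loop only retries on exceptions, on the assumption that an explicit "not alive" answer from the probe is definitive.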
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/abhishekshivanna/samza container-heartbeat
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/samza/pull/375.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #375
----
commit 55145366b0a2e15b30665e88cead5f6bfd75ee2e
Author: Abhishek Shivanna <ab...@gmail.com>
Date: 2017-11-30T20:09:10Z
SAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.
----
---
Posted by asfgit <gi...@git.apache.org>.
GitHub user asfgit closed the pull request at:
https://github.com/apache/samza/pull/375
---