Posted to dev@samza.apache.org by abhishekshivanna <gi...@git.apache.org> on 2017/12/01 18:41:21 UTC

[GitHub] samza pull request #375: SAMZA-1506: Fix for robust ContainerHeartbeatMonito...

GitHub user abhishekshivanna opened a pull request:

    https://github.com/apache/samza/pull/375

    SAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.

    The fix includes the following changes:
    - Catch all exceptions inside the heartbeat thread, not just
      IOException.
    - Add a time-based force kill when the heartbeat is invalid.
      This makes the monitor immune to threads that may keep the
      container stuck in the shutdown sequence. When the timeout
      occurs, System.exit(1) is called.
    - Increase the number of retries for failed heartbeats from 3 to 6.
      This prevents short intermittent network failures from causing the
      containers to be invalidated.
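
The three changes above can be illustrated with a minimal Java sketch. This is not the code from the pull request: the class name HeartbeatMonitorSketch, the callbacks heartbeatCheck, onExpired and forceKill, and the reading of "6 retries" as six attempts per check are all assumptions made for illustration. The force-kill action is injected so the sketch can be exercised without terminating the JVM; a real monitor would presumably pass () -> System.exit(1).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch only; this is not the Samza ContainerHeartbeatMonitor source.
class HeartbeatMonitorSketch {
  // The PR raises the retry count from 3 to 6; treating it as "attempts per check" is an assumption.
  static final int NUM_RETRIES = 6;

  private final Callable<Boolean> heartbeatCheck; // assumed: returns true while the container is valid
  private final Runnable onExpired;               // assumed: begins the graceful shutdown sequence
  private final Runnable forceKill;               // in production this could be () -> System.exit(1)
  private final long shutdownTimeoutMs;
  private final AtomicBoolean expired = new AtomicBoolean(false);

  // Two daemon threads: one for the periodic check, one left free for the force kill
  // in case onExpired blocks inside a stuck shutdown.
  private final ScheduledExecutorService scheduler =
      Executors.newScheduledThreadPool(2, r -> {
        Thread t = new Thread(r, "heartbeat-monitor-sketch");
        t.setDaemon(true);
        return t;
      });

  HeartbeatMonitorSketch(Callable<Boolean> heartbeatCheck, Runnable onExpired,
                         Runnable forceKill, long shutdownTimeoutMs) {
    this.heartbeatCheck = heartbeatCheck;
    this.onExpired = onExpired;
    this.forceKill = forceKill;
    this.shutdownTimeoutMs = shutdownTimeoutMs;
  }

  // Tries the check up to NUM_RETRIES times; any exception counts as a failed attempt.
  boolean isHeartbeatValid() {
    for (int attempt = 0; attempt < NUM_RETRIES; attempt++) {
      try {
        return heartbeatCheck.call();
      } catch (Exception e) {
        // Catch every exception, not just IOException, then retry.
      }
    }
    return false;
  }

  // One monitoring pass. It never throws, because scheduleAtFixedRate suppresses
  // all later runs of a task once one run ends with an exception.
  void checkOnce() {
    try {
      if (!isHeartbeatValid() && expired.compareAndSet(false, true)) {
        // Arm the force kill first, so a shutdown that hangs cannot prevent it.
        scheduler.schedule(forceKill, shutdownTimeoutMs, TimeUnit.MILLISECONDS);
        onExpired.run();
      }
    } catch (Exception e) {
      // Swallow so the periodic task keeps running; a real monitor would log this.
    }
  }

  void start(long periodMs) {
    scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

In this sketch the force kill is scheduled before onExpired runs and on a pool with a spare thread, so a shutdown callback that never returns still cannot stop the timeout from firing.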

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekshivanna/samza container-heartbeat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/375.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #375
    
----
commit 55145366b0a2e15b30665e88cead5f6bfd75ee2e
Author: Abhishek Shivanna <ab...@gmail.com>
Date:   2017-11-30T20:09:10Z

    SAMZA-1506: Fix for robust ContainerHeartbeatMonitor exception handling.
    
    The fix includes the following changes:
    - Catch all exceptions inside the heartbeat thread, not just
      IOException.
    - Add a time-based force kill when the heartbeat is invalid.
      This makes the monitor immune to threads that may keep the
      container stuck in the shutdown sequence. When the timeout
      occurs, System.exit(1) is called.
    - Increase the number of retries for failed heartbeats from 3 to 6.
      This prevents short intermittent network failures from causing the
      containers to be invalidated.

----


---

[GitHub] samza pull request #375: SAMZA-1506: Fix for robust ContainerHeartbeatMonito...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/samza/pull/375


---