Posted to dev@chukwa.apache.org by "Bill Graham (JIRA)" <ji...@apache.org> on 2010/10/13 01:00:32 UTC

[jira] Updated: (CHUKWA-534) Improve fault-tolerance of DemuxManager, PostProcessManager and ChukwaArchiveManager.

     [ https://issues.apache.org/jira/browse/CHUKWA-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Graham updated CHUKWA-534:
-------------------------------

    Description: 
If any of these processes receives more than N consecutive errors, it dies with the message "Too many errors, Bail out!".

Let's change this to introduce a configurable number of consecutive exceptions to be encountered before dying. If the value is set to -1, the expected behavior is to keep retrying ad infinitum.

Also part of this bug is to improve logging of how many consecutive errors have occurred, as well as the time they started. A possible future enhancement could be to support an error time threshold as well as an absolute count.

Suggesting the following new config settings. They're a bit verbose, but they're clear.

{noformat}
demux.max.error.count.before.shutdown
post.process.max.error.count.before.shutdown
archive.max.error.count.before.shutdown
{noformat}
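
A rough sketch of the counting logic described above, not taken from the attached patches. The class and method names are hypothetical, and it assumes the limit has already been read from one of the settings listed above. It treats -1 as "retry forever" and remembers when the current run of errors started so that it can be logged.

```java
// Hypothetical helper for illustration only; not part of the Chukwa code base.
final class ConsecutiveErrorTracker {
    /** Sentinel meaning "never shut down, keep retrying". */
    static final int RETRY_FOREVER = -1;

    // Assumed to come from e.g. demux.max.error.count.before.shutdown
    private final int maxErrors;
    private int consecutiveErrors = 0;
    // Time (epoch millis) at which the current run of errors began, or -1 if none
    private long firstErrorMillis = -1L;

    ConsecutiveErrorTracker(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Records a failed iteration; returns true if the caller should shut down. */
    boolean recordError(long nowMillis) {
        if (consecutiveErrors == 0) {
            firstErrorMillis = nowMillis;
        }
        consecutiveErrors++;
        // Shut down only when MORE than maxErrors consecutive errors were seen
        return maxErrors != RETRY_FOREVER && consecutiveErrors > maxErrors;
    }

    /** Records a successful iteration, which ends the current run of errors. */
    void recordSuccess() {
        consecutiveErrors = 0;
        firstErrorMillis = -1L;
    }

    int getConsecutiveErrors() {
        return consecutiveErrors;
    }

    long getFirstErrorMillis() {
        return firstErrorMillis;
    }

    /** Builds a log-friendly summary of the current run of errors. */
    String describe() {
        String limit = (maxErrors == RETRY_FOREVER) ? "none" : String.valueOf(maxErrors);
        return consecutiveErrors + " consecutive error(s) since "
                + firstErrorMillis + " (epoch ms), limit " + limit;
    }
}
```

A manager loop could call recordError(System.currentTimeMillis()) in its catch block, log describe(), and exit only when the call returns true; a clean pass would call recordSuccess().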


  was:
If the DemuxManager receives more than 5 consecutive errors, it dies with the message "Too many errors, Bail out!".

Let's change this to introduce a configurable number of consecutive exceptions to be encountered before dying. If the value is set to -1, the expected behavior is to keep retrying ad infinitum.

Also part of this bug is to improve logging of how many consecutive errors have occurred, as well as the time they started. A possible future enhancement could be to support an error time threshold as well as an absolute count.

Suggesting the following new config setting. It's a bit verbose, but it's clear.

{noformat}
chukwa.demux.max.error.count.before.shutdown
{noformat}


        Summary: Improve fault-tolerance of DemuxManager, PostProcessManager and ChukwaArchiveManager.  (was: Improve fault-tolerance of DemuxManager.)

Expanding the scope of this JIRA, since all three of these processes could be more fault tolerant. Most have code comments saying they should shut down after 4 errors because watchdog will restart them, but watchdog has been deprecated, as far as I know.

> Improve fault-tolerance of DemuxManager, PostProcessManager and ChukwaArchiveManager.
> -------------------------------------------------------------------------------------
>
>                 Key: CHUKWA-534
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-534
>             Project: Chukwa
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: CHUKWA-534_1.patch, CHUKWA-534_2.patch
>
>
> If any of these processes receives more than N consecutive errors, it dies with the message "Too many errors, Bail out!".
> Let's change this to introduce a configurable number of consecutive exceptions to be encountered before dying. If the value is set to -1, the expected behavior is to keep retrying ad infinitum.
> Also part of this bug is to improve logging of how many consecutive errors have occurred, as well as the time they started. A possible future enhancement could be to support an error time threshold as well as an absolute count.
> Suggesting the following new config settings. They're a bit verbose, but they're clear.
> {noformat}
> demux.max.error.count.before.shutdown
> post.process.max.error.count.before.shutdown
> archive.max.error.count.before.shutdown
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.