Posted to issues@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2010/03/18 00:44:27 UTC

[jira] Commented: (HBASE-2342) Consider adding a watchdog node next to region server

    [ https://issues.apache.org/jira/browse/HBASE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846676#action_12846676 ] 

ryan rawson commented on HBASE-2342:
------------------------------------

Run it as a parent process, so that OS-level primitives notify you when the child process dies. You can also set the ZK timeout aggressively low (since the watchdog will be running in a process with low GC pauses), and you can tune the detection of a Juliet pause separately. Maybe you would accept a 30-second GC pause; that might be better than killing the process, which ensures data loss (pre HDFS-200) or otherwise causes unnecessary cluster churn.
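
To make the parent-process idea concrete, here is a minimal sketch in Java. It is not code from HBase: the class name ParentWatchdog, the methods runOnce and supervise, and the maxRestarts parameter are all hypothetical, and the sketch leaves out the ZK session handling and the Juliet-pause detection discussed above. It only shows that a parent which blocks in Process.waitFor() is told by the OS as soon as the child exits, with no polling or timeout involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical watchdog sketch: the parent launches the child and
// relies on the OS to report the child's death.
final class ParentWatchdog {

    // Starts the child once and blocks until the OS reports that it exited.
    // Returns the child's exit code.
    static int runOnce(List<String> command) {
        try {
            Process child = new ProcessBuilder(command)
                    .inheritIO() // child shares the watchdog's stdin/stdout/stderr
                    .start();
            return child.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start child process", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    // Runs the child and restarts it after a non-zero exit, at most
    // maxRestarts times. Returns the exit code of the last run.
    static int supervise(List<String> command, int maxRestarts) {
        int exitCode = runOnce(command);
        int restarts = 0;
        while (exitCode != 0 && restarts < maxRestarts) {
            restarts++;
            System.err.println("child exited with code " + exitCode
                    + ", restart " + restarts + " of " + maxRestarts);
            exitCode = runOnce(command);
        }
        return exitCode;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ParentWatchdog <command> [args...]");
            System.exit(2);
        }
        // Assumption for illustration: allow up to three restarts.
        System.exit(supervise(List.of(args), 3));
    }
}
```

The command to supervise is passed as the program's arguments, for example the command line that starts the region server. Treating any non-zero exit code as a failure worth restarting is a simplification; a real watchdog would likely distinguish a deliberate shutdown from a crash.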

> Consider adding a watchdog node next to region server
> -----------------------------------------------------
>
>                 Key: HBASE-2342
>                 URL: https://issues.apache.org/jira/browse/HBASE-2342
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Todd Lipcon
>
> This idea has been bandied about a fair amount. The concept is to add a second java process that runs next to each region server to act as a watchdog. Several possible purposes:
> - monitor the RS for liveness - if it exhibits Juliet syndrome ("appears dead") then we kill it aggressively to prevent it from coming back to life
> - restart RS automatically in failure cases
> - potentially move the entire ZK session to the watchdog to decouple node liveness from the particular JVM liveness
> Let's discuss in this JIRA.
