Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/11/10 15:29:27 UTC

[jira] Commented: (HBASE-1964) Add internal status monitoring to RegionServer

    [ https://issues.apache.org/jira/browse/HBASE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775432#action_12775432 ] 

Andrew Purtell commented on HBASE-1964:
---------------------------------------

bq. When a hadoop/hbase cluster is under heavy load it will inevitably reach a tipping point where data is lost or corrupted

We take exception to this statement. One can corrupt an Oracle database by overcommitting RAM such that the kernel panics in get_free_page (on Linux). 

bq. A graceful method is needed to put the cluster into safe mode until more resources can be added or the load on the cluster has been reduced. 

There is no substitute for competent monitoring and administration of production systems, especially ones that try to support terascale or petascale storage and computation over tens or hundreds of servers. That said, HBase certainly has opportunities to sense overload and take self-preserving action where it currently does not.

> Add internal status monitoring to RegionServer
> ----------------------------------------------
>
>                 Key: HBASE-1964
>                 URL: https://issues.apache.org/jira/browse/HBASE-1964
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.20.1
>            Reporter: elsif 
>
> When a hadoop/hbase cluster is under heavy load it will inevitably reach a tipping point where data is lost or corrupted.  A
> graceful method is needed to put the cluster into safe mode until more resources can be added or the load on the cluster has been
> reduced.  
> St.Ack has suggested the following short-term task: "Meantime, it should be possible to have a cron run a script that checks
> cluster resources from time-to-time -- e.g. how full hdfs is, how much each regionserver is carrying -- and when it determines the needle is in the red,
> flip the cluster to be read-only."
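
The cron-driven check suggested in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the thresholds, the `read_metrics` callable (which would return the HDFS used fraction and a map of regionserver name to region count) and the `set_read_only` callable are all hypothetical names, not HBase or Hadoop APIs. How the numbers are collected and how the cluster is actually flipped to read-only is left to the operator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Thresholds:
    # Both limits are assumed values for illustration; tune them per cluster.
    max_hdfs_used_fraction: float = 0.90
    max_regions_per_server: int = 1000


def overload_reasons(hdfs_used_fraction: float,
                     regions_per_server: Dict[str, int],
                     limits: Thresholds = Thresholds()) -> List[str]:
    """Return a human-readable reason for every limit that is exceeded."""
    reasons = []
    if hdfs_used_fraction >= limits.max_hdfs_used_fraction:
        reasons.append(f"hdfs is {hdfs_used_fraction:.0%} full")
    # Sort so the report order is stable from one cron run to the next.
    for server, count in sorted(regions_per_server.items()):
        if count > limits.max_regions_per_server:
            reasons.append(f"{server} carries {count} regions")
    return reasons


def check_once(read_metrics: Callable[[], Tuple[float, Dict[str, int]]],
               set_read_only: Callable[[List[str]], None],
               limits: Thresholds = Thresholds()) -> List[str]:
    """One cron tick: read metrics, and go read-only if the needle is in the red.

    read_metrics and set_read_only are hypothetical hooks supplied by the
    operator; this sketch does not talk to HBase or HDFS itself.
    """
    used_fraction, regions = read_metrics()
    reasons = overload_reasons(used_fraction, regions, limits)
    if reasons:
        set_read_only(reasons)
    return reasons
```

Keeping the decision in a pure function separate from the two hooks means the thresholds can be tested without a running cluster, and the script does nothing at all when no limit is exceeded.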
