Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/01/09 00:46:59 UTC

[jira] Created: (HBASE-1111) [performance] Crash recovery takes way too long

[performance] Crash recovery takes way too long
-----------------------------------------------

                 Key: HBASE-1111
                 URL: https://issues.apache.org/jira/browse/HBASE-1111
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: stack


Watching hbase recover from crashes, it's taking way too long:

1. Must wait first on lease to expire (if server is rebooted, it should cancel the old server's lease but make sure the lease expiration code runs)
2. Master splits logs.  This is single-threaded.  There is at least a cap of 64 logs, but it seems to run slow anyway.
3. Assign out the regions that were on the dead server (minutes or even tens of minutes could have elapsed at this stage)
4. Wait on the regionservers to open the regions.  On a small cluster, because regionservers open regions in series, it could take a long time to open a bunch of regions.  Meantime the regions are not available and clients will likely time out.
5. To make things worse, I've seen the load balancer cut in to 'help out', telling a regionserver to close some of its regions though it's busy opening a bunch.

Andrew Purtell notes that HBASE-1110 will change a bunch of the above.
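
For illustration only, step 2 above (single-threaded log splitting) could be spread over a small thread pool along these lines. This is a rough sketch, not HBase code: the class ParallelLogSplitSketch, the method splitAll, and the splitOneLog callback are all hypothetical names, and the log names are placeholders. It also ignores the fact that a real splitter would have to keep each region's edits in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: not part of HBase.
public class ParallelLogSplitSketch {

    /**
     * Applies splitOneLog to every log on a fixed-size thread pool and
     * returns the results in the same order as the input list.
     */
    public static <R> List<R> splitAll(List<String> logs, int threads,
                                       Function<String, R> splitOneLog)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String log : logs) {
                tasks.add(() -> splitOneLog.apply(log));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<R> f : pool.invokeAll(tasks)) {
                // get() rethrows a task failure as ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder log names; the "split" here just returns the name length.
        List<String> logs = List.of("hlog.1", "hlog.2", "hlog.3");
        List<Integer> out = splitAll(logs, 2, name -> name.length());
        System.out.println(out); // prints [6, 6, 6]
    }
}
```

The same pattern would apply to step 4 (regionservers opening regions in series), with the callback opening one region instead of splitting one log.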

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1111) [performance] Crash recovery takes way too long

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662191#action_12662191 ] 

stack commented on HBASE-1111:
------------------------------

Yeah, just tried killing the server that was carrying -ROOT-.  Took 20 minutes just to process 41 log files.  All clients were dead by the time the cluster was usable again.
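
As a back-of-envelope check on those numbers: 20 minutes over 41 logs is roughly 29 seconds per log. The sketch below works that out and also estimates the wall-clock time if the logs were split by several workers. The ideal-scaling model and the class and method names are assumptions for illustration, not measurements.

```java
// Hypothetical helper for rough estimates only.
public class SplitTimeEstimate {

    /** Average seconds spent per log, given the total time in minutes. */
    public static double secondsPerLog(double totalMinutes, int logCount) {
        return totalMinutes * 60.0 / logCount;
    }

    /**
     * Wall-clock minutes if the same logs were split by `threads` workers,
     * assuming every log costs the average and scaling is ideal.
     */
    public static double idealMinutes(double totalMinutes, int logCount, int threads) {
        int rounds = (logCount + threads - 1) / threads; // ceiling division
        return rounds * secondsPerLog(totalMinutes, logCount) / 60.0;
    }

    public static void main(String[] args) {
        System.out.println(secondsPerLog(20, 41));    // about 29.3 seconds
        System.out.println(idealMinutes(20, 41, 4));  // about 5.4 minutes
    }
}
```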

