Posted to issues@hbase.apache.org by "Jean-Marc Spaggiari (JIRA)" <ji...@apache.org> on 2013/12/19 03:42:07 UTC

[jira] [Commented] (HBASE-10199) Improve rolling restart latency

    [ https://issues.apache.org/jira/browse/HBASE-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852518#comment-13852518 ] 

Jean-Marc Spaggiari commented on HBASE-10199:
---------------------------------------------

Valuable comments from [~liushaohui] from HBASE-10049:
{quote}

    What is the reason behind the 20-second delay?

There is a time gap between the RS's startup report to the HMaster and the start of its service threads. We saw exceptions when moving regions to an RS that had not finished starting its service threads, so we added a 20-second delay to give the RS enough time to finish initialization.
However, 20 seconds may not be a reasonable value, especially for large clusters.

PS: For large clusters, we plan to develop a region_mover that can unload/load multiple regionservers at the same time.

    Adding a log in the move method makes the application VERY verbose, doubling the output. Is that really useful?

Yes. I think it is very useful for measuring the maximum unavailable time of each region when using region_mover.rb.
Many other factors and configs also affect this time, e.g. hbase.hstore.open.and.close.threads.max.
Based on this time, we can make further optimizations to reduce the unavailable time during a graceful upgrade.

I don't know if the explanation is clear. More discussion is welcome. Thanks.
{quote}

I'm waiting for HBASE-8803 to be committed and will address all of these in this JIRA. I will make sure that we keep the ideas behind all of [~liushaohui]'s modifications.
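
To give an idea of the direction: if the fixed delay is paid once per region server, 180 servers x 20 s = 3600 s, so anything larger than that adds more than an hour. One possible approach is to replace the fixed sleep with a bounded poll that returns as soon as the server answers. The snippet below is only a sketch under assumptions: wait_until_ready is a hypothetical helper that does not exist in region_mover.rb, and the block stands in for whatever readiness probe we end up using (which check is reliable is exactly what still has to be decided).

```ruby
# Hypothetical helper, not part of region_mover.rb.
# Calls the given block repeatedly until it returns a truthy value
# or until `timeout` seconds have elapsed.
# Returns true if the block succeeded, false if the deadline passed.
def wait_until_ready(timeout = 20.0, interval = 0.5)
  deadline = Time.now + timeout
  loop do
    return true if yield                    # probe says the server is ready
    return false if Time.now >= deadline    # give up after the deadline
    sleep(interval)                         # short pause before probing again
  end
end

# Stand-in probe for illustration only: succeeds on the third call.
attempts = 0
ready = wait_until_ready(5.0, 0.01) { (attempts += 1) >= 3 }
puts "ready=#{ready} after #{attempts} checks"
```

With something like this the 20 seconds become an upper bound instead of a fixed cost: a server that is ready quickly lets the move start almost immediately, and a slow one still gets the full wait.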

> Improve rolling restart latency
> -------------------------------
>
>                 Key: HBASE-10199
>                 URL: https://issues.apache.org/jira/browse/HBASE-10199
>             Project: HBase
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.94.14, 0.98.1, 0.99.0, 0.96.1.1
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>
> HBASE-10049 introduced a 20-second delay in the region_mover script to make sure the server is online before starting the transfer. On a big cluster, this can add more than 1 hour to the total time. We need to find a way to make 100% sure the server is online without having to wait for those 20 seconds.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)