Posted to issues@hbase.apache.org by "Michael Segel (JIRA)" <ji...@apache.org> on 2010/11/01 15:59:27 UTC
[jira] Commented: (HBASE-3168) Sanity date and time check when a region server joins the cluster

    [ https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927000#action_12927000 ] 

Michael Segel  commented on HBASE-3168:
---------------------------------------

Sorry, but just to play Devil's Advocate... is this code really necessary?

When you start a Unix server, it should be configured to sync with an NTP server, and every node in the cluster should point to the same NTP server.
Over time, depending on the quality of the hardware, there could be some drift; however, this should be less than 1 second per year.

If you set up NTP syncs to occur once a month, the nodes in your cluster should all have roughly the same time.

The reason I question the necessity of the code is that the code is a bit pessimistic.

In the real world... if you think about it... you can purchase an NTP server for under 10K (specialized clock, GPS, etc.). So if you have a cloud of 500+ nodes, it would make sense to have your own NTP hardware. Syncing to a local NTP server could then happen once a day, week, or month, depending on your paranoia.

If you're getting timeouts on a RS, hardware issues should be easily identified and then fixed ASAP.

I do agree that timestamps are a slightly different and larger issue than this JIRA.

> Sanity date and time check when a region server joins the cluster
> -----------------------------------------------------------------
>
>                 Key: HBASE-3168
>                 URL: https://issues.apache.org/jira/browse/HBASE-3168
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.89.20100924
>         Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>            Reporter: Jeff Whiting
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock isn't too far out of skew with the rest of the cluster.  If the RS's time is too far out of skew then the master would prevent it from joining and RS would die and log the error. 
> Having a RS with even small differences in time can cause huge problems due to how HBase stores values with timestamps.
> According to J-D, in ServerManager we are already doing:
> {code}
>     HServerInfo info = new HServerInfo(serverInfo);
>     checkIsDead(info.getServerName(), "STARTUP");
>     checkAlreadySameHostPort(info);
>     recordNewServer(info, false, null);
> {code}
> And that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"
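
For reference, the check proposed in the issue description might look roughly like the sketch below. This is only an illustration under stated assumptions: the class ClockSkewCheck, the method checkClockSkew, its parameters, and the maximum-skew threshold are all hypothetical names, and the exception is unchecked here purely for brevity. Only the idea of a "ClockOutOfSync-like exception" comes from the issue; nothing here is taken from the actual ServerManager code.

```java
// Hypothetical sketch; none of these names come from the HBase code base.
final class ClockSkewCheck {

    /** Unchecked for brevity; the name follows JG's suggestion in the issue. */
    static final class ClockOutOfSyncException extends RuntimeException {
        ClockOutOfSyncException(String message) {
            super(message);
        }
    }

    private ClockSkewCheck() {}

    /**
     * Rejects a region server whose reported clock differs from the master's
     * clock by more than maxSkewMillis. All times are epoch milliseconds.
     */
    static void checkClockSkew(String serverName, long serverTimeMillis,
                               long masterTimeMillis, long maxSkewMillis) {
        long skew = Math.abs(serverTimeMillis - masterTimeMillis);
        if (skew > maxSkewMillis) {
            throw new ClockOutOfSyncException("Server " + serverName
                + " rejected: clock skew of " + skew
                + " ms exceeds the allowed " + maxSkewMillis + " ms");
        }
    }
}
```

In this sketch the master would call checkClockSkew during region server startup, passing the time reported by the RS and its own System.currentTimeMillis(); the RS would then log the exception message and shut down. The threshold would presumably be configurable, since the acceptable skew depends on how tightly the cluster is synced.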
