Posted to issues@hbase.apache.org by "Han Xiao (JIRA)" <ji...@apache.org> on 2013/12/25 08:55:51 UTC

[jira] [Commented] (HBASE-10237) Master restart, then followed by the MetaRegionServer crashed will result that the .meta. table won't online forever

    [ https://issues.apache.org/jira/browse/HBASE-10237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856540#comment-13856540 ] 

Han Xiao commented on HBASE-10237:
----------------------------------

The problem occurs in release 0.94.10. I checked the code on the 0.94 branch, and the problem still exists there.
The code in the method "void regionOnline(HRegionInfo regionInfo, ServerName sn)" of "AssignmentManager.java" seems to be where the problem comes from:
{code:java}
      if (isServerOnline(sn)) {
        this.regions.put(regionInfo, sn);
        addToServers(sn, regionInfo);
        this.regions.notifyAll();
      } else {
        LOG.info("The server is not in online servers, ServerName=" + 
          sn.getServerName() + ", region=" + regionInfo.getEncodedName());
      }
{code}
This check was introduced by HBASE-4033, but it seems to be unnecessary now.
By the time regionOnline() is called, the region should already have been opened on the regionserver. So even if the regionserver shuts down around the time the method is called, the assignment still needs to be recorded so that the region can be reassigned.
Removing the check should resolve the problem.
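
To illustrate the reasoning above, here is a minimal, self-contained sketch. It is not HBase code: the class RegionOnlineSketch, its fields, and its methods are hypothetical stand-ins for the master's in-memory state, and the model assumes only that the server is no longer in the online set when regionOnline() is called. It compares what a later shutdown handler would find to reassign with and without the check.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model of the master's assignment bookkeeping (not HBase code). */
public class RegionOnlineSketch {
  private final Set<String> onlineServers = new HashSet<>();
  private final Map<String, String> regions = new HashMap<>(); // region name -> server name
  private final boolean checkServerOnline;

  public RegionOnlineSketch(boolean checkServerOnline) {
    this.checkServerOnline = checkServerOnline;
  }

  public void serverOnline(String sn) {
    onlineServers.add(sn);
  }

  public void serverGone(String sn) {
    onlineServers.remove(sn);
  }

  /** Mirrors the shape of the quoted snippet: record the region only if the check passes. */
  public void regionOnline(String region, String sn) {
    if (!checkServerOnline || onlineServers.contains(sn)) {
      regions.put(region, sn);
    }
    // Otherwise the assignment is silently dropped, as in the "not in online servers" log line.
  }

  /** What a shutdown handler could reassign: only regions recorded against the dead server. */
  public Set<String> regionsToReassign(String deadServer) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, String> e : regions.entrySet()) {
      if (e.getValue().equals(deadServer)) {
        result.add(e.getKey());
      }
    }
    return result;
  }

  /** Replays the assumed sequence: the server registers, goes away, then regionOnline is called. */
  public static Set<String> replay(boolean checkServerOnline) {
    RegionOnlineSketch master = new RegionOnlineSketch(checkServerOnline);
    String sn = "node209.vipcloud,60020,1387272038024";
    master.serverOnline(sn);
    master.serverGone(sn);
    master.regionOnline(".META.", sn);
    return master.regionsToReassign(sn);
  }

  public static void main(String[] args) {
    System.out.println("with check:    " + replay(true));  // prints "with check:    []"
    System.out.println("without check: " + replay(false)); // prints "without check: [.META.]"
  }
}
```

With the check in place, nothing is recorded for .META., so the later shutdown handling has nothing to reassign. Without it, .META. stays associated with the dead server and can be picked up for reassignment.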

> Master restart, then followed by the MetaRegionServer crashed will result that the .meta. table won't online forever
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10237
>                 URL: https://issues.apache.org/jira/browse/HBASE-10237
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.12
>            Reporter: Han Xiao
>
> The following logs record the process:
> // *root and meta were once allocated to node209*
> 2013-12-23 10:45:34,130 INFO  \[MASTER_OPEN_REGION-node201.vipcloud,60000,1386903739776-3\] handler.OpenedRegionHandler (OpenedRegionHandler.java:debugLog(145)) - Handling OPENED event for .META.,,1.1028785192 from node209.vipcloud,60020,1387272038024; deleting unassigned node
> 2013-12-23 14:53:36,268 INFO  \[MASTER_OPEN_REGION-node201.vipcloud,60000,1386903739776-4\] handler.OpenedRegionHandler (OpenedRegionHandler.java:debugLog(145)) - Handling OPENED event for \-ROOT\-,,0.70236052 from node209.vipcloud,60020,1387272038024; deleting unassigned node
> // *master restart*
> 2013-12-23 16:30:19 CST Starting master on node201.vipcloud
> // *209 coming*
> 2013-12-23 16:30:33,698 INFO  \[master-node201.vipcloud,60000,1387787422616\] master.ServerManager (ServerManager.java:recordNewServer(280)) - Registering server=node209.vipcloud,60020,1387272038024
> // *209 out*
> 2013-12-23 16:30:37,106 INFO  \[main-EventThread\] zookeeper.RegionServerTracker (RegionServerTracker.java:nodeDeleted(93)) - RegionServer ephemeral node deleted, processing expiration \[node209.vipcloud,60020,1387272038024\]
> // *delay processing 209 for initialization*
> 2013-12-23 16:30:37,107 INFO  \[main-EventThread\] master.ServerManager (ServerManager.java:expireServer(384)) - Master doesn't enable ServerShutdownHandler during initialization, delay expiring server node209.vipcloud,60020,1387272038024
> // *assign root to node209 based on the data in the zk node*
> 2013-12-23 16:30:42,120 INFO \[master-node201.vipcloud,60000,1387787422616\] master.HMaster (HMaster.java:assignRoot(756)) - \-ROOT\- assigned=0, rit=false, location=node209.vipcloud,60020,1387272038024
> // *the problem happened when assigning META to node209: validation passed first, but the check that the server is online then failed, so the regions structure was not updated for META*
> s1:2013-12-23 16:30:42,322 INFO  \[master-node201.vipcloud,60000,1387787422616\] master.AssignmentManager (AssignmentManager.java:regionOnline(1264)) - The server is not in online servers, ServerName=node209.vipcloud,60020,1387272038024, region=1028785192
> s2:2013-12-23 16:30:42,323 INFO  \[master-node201.vipcloud,60000,1387787422616\] master.HMaster (HMaster.java:assignMeta(814)) - .META. assigned=0, rit=false, location=node209.vipcloud,60020,1387272038024
> // *handling the node209 shutdown: only \-ROOT\- is reassigned, not .META., because .META. was NOT updated in memory*
> 2013-12-23 16:31:35,978 INFO  \[MASTER_META_SERVER_OPERATIONS-node201.vipcloud,60000,1387787422616-0\] handler.MetaServerShutdownHandler (MetaServerShutdownHandler.java:process(78)) - Server node209.vipcloud,60020,1387272038024 was carrying ROOT. Trying to assign.
> // *verification for META failed (the location stored in \-ROOT\- is still node209), but META is not reassigned, so META never comes online*
> 2013-12-23 16:31:40,048 INFO  \[MASTER_SERVER_OPERATIONS-node201.vipcloud,60000,1387787422616-0\] catalog.CatalogTracker (CatalogTracker.java:verifyRegionLocation(582)) - Failed verification of .META.,,1 at address=node209.vipcloud,60020,1387272038024; org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: node209.vipcloud/192.168.30.132:60020
> 2013-12-23 16:35:46,764 INFO  \[MASTER_SERVER_OPERATIONS-node201.vipcloud,60000,1387787422616-0\] catalog.CatalogTracker (CatalogTracker.java:verifyRegionLocation(582)) - Failed verification of .META.,,1 at address=node209.vipcloud,60020,1387272038024; org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is in the failed servers list: node209.vipcloud/192.168.30.132:60020



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)