Posted to issues@hbase.apache.org by "Zhong Deyin (JIRA)" <ji...@apache.org> on 2012/12/28 04:16:13 UTC

[jira] [Resolved] (HBASE-7445) Hbase cluster is unavailable while the regionserver that Meta table deployed crashed

     [ https://issues.apache.org/jira/browse/HBASE-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Deyin resolved HBASE-7445.
--------------------------------

    Resolution: Fixed

Modify class org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: in the process method, replace the call this.services.getAssignmentManager().assignMeta() with assignMetaWithRetries(), so that the META assignment is retried (up to 10 times by default) after the regionserver carrying META crashes.
{code}
      // Carrying meta
      if (isCarryingMeta()) {
        LOG.info("Server " + serverName +
          " was carrying META. Trying to assign.");
        this.services.getAssignmentManager().
          regionOffline(HRegionInfo.FIRST_META_REGIONINFO);
        //this.services.getAssignmentManager().assignMeta();
        assignMetaWithRetries();
      }
{code}
Add the method assignMetaWithRetries, as follows:
{code}
  private void assignMetaWithRetries() throws IOException {
    // Maximum number of retries after the first failed attempt (default 10).
    int iTimes = this.server.getConfiguration().getInt(
        "hbase.catalog.verification.retries", 10);
    // Time to wait between attempts, in milliseconds (default 1000).
    long waitTime = this.server.getConfiguration().getLong(
        "hbase.catalog.verification.timeout", 1000);

    int iFlag = 0;
    while (true) {
      try {
        this.services.getAssignmentManager().assignMeta();
        break;
      } catch (Exception e) {
        // Give up and abort once the configured number of retries is used up.
        if (iFlag >= iTimes) {
          this.server.abort("assignMeta failed after " + iTimes
              + " retries, aborting", e);
          throw new IOException("Aborting", e);
        }
        try {
          Thread.sleep(waitTime);
        } catch (InterruptedException e1) {
          LOG.warn("Interrupted while sleeping before retrying META assignment", e1);
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted", e1);
        }
        iFlag++;
      }
    }
  }
{code}
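
For readers who want to try the retry pattern outside of HBase, the following is a minimal standalone sketch of the same loop: one initial attempt, then up to a fixed number of retries with a fixed wait between them. The class RetrySketch and the method callWithRetries are hypothetical names made up for this illustration; they are not part of HBase or of the patch above, and the failing action in main only simulates an assignment that succeeds on the third attempt.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of HBase: retries an action with a fixed delay. */
final class RetrySketch {

  private RetrySketch() {}

  /**
   * Runs the action once, then retries up to maxRetries more times on failure,
   * sleeping waitMillis between attempts. Mirrors the loop in the patch, but
   * throws instead of aborting a server.
   */
  static <T> T callWithRetries(Callable<T> action, int maxRetries, long waitMillis)
      throws IOException {
    int retries = 0;
    while (true) {
      try {
        return action.call();
      } catch (Exception e) {
        // Out of retries: surface the last failure as the cause.
        if (retries >= maxRetries) {
          throw new IOException("Failed after " + maxRetries + " retries", e);
        }
        try {
          Thread.sleep(waitMillis);
        } catch (InterruptedException ie) {
          // Restore the interrupt flag so callers can still observe it.
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while waiting to retry", ie);
        }
        retries++;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Simulated action (an assumption for the demo): fails twice, then succeeds.
    String result = callWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("not yet");
      }
      return "assigned";
    }, 10, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Note that with this structure a retry limit of 10 allows 11 attempts in total (the first call plus 10 retries), which matches the behaviour of the loop in the patch.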
                
> Hbase cluster is unavailable while the regionserver that Meta table deployed crashed
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-7445
>                 URL: https://issues.apache.org/jira/browse/HBASE-7445
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment, regionserver
>    Affects Versions: 0.94.1
>         Environment: Hadoop 0.20.2-cdh3u3
> Hbase 0.94.1
>            Reporter: Zhong Deyin
>              Labels: patch
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> When the regionserver hosting the .META. table crashes, the .META. table is not reassigned to another available regionserver. Subsequent region splits then cannot find the .META. table, which makes the whole cluster unavailable.
> Code path: org.apache.hadoop.hbase.master.handler.ServerShutdownHandler
> {code}
>       // Carrying meta
>       if (isCarryingMeta()) {
>         LOG.info("Server " + serverName + " was carrying META. Trying to assign.");
>         this.services.getAssignmentManager().
>           regionOffline(HRegionInfo.FIRST_META_REGIONINFO);
>         this.services.getAssignmentManager().assignMeta();
>       }
> {code}
