Posted to dev@hbase.apache.org by "zhou_shuaifeng@sina.com" <zh...@sina.com> on 2015/09/10 15:09:51 UTC

NotServingRegion: hbase region closed forever when open region response time out

Hi all,
    I found a situation that may cause a region to stay closed forever. It happens often on my cluster (version 0.98.10), and 1.1.2 also has the problem:
    1, the master sends an open-region request to the regionserver
    2, the rs starts a handler to open the region
    3, the rs returns a response to the master
    4, the master does not receive the response, or it times out, so it sends the open-region request again
    5, the rs has already opened the region
    6, the master calls processAlreadyOpenedRegion and updates the region state to open in master memory
    7, the master receives the zk "region opened" message (late for some reason, e.g. network), which triggers an update of the region state to open, but finds the region is already opened: ERROR!
    8, the master sends a close-region request, and the region stays closed forever.
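
To make the ordering easier to follow, here is a toy model of the sequence above. It is only a sketch: RegionOpenRace, ToyMaster and the State enum are hypothetical names invented for illustration, not HBase classes, and the real AssignmentManager logic is far more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race described above; none of these types are HBase classes. */
final class RegionOpenRace {
    enum State { PENDING_OPEN, OPEN, CLOSED }

    /** Minimal stand-in for the master's in-memory view of a single region. */
    static final class ToyMaster {
        State state = State.PENDING_OPEN;
        final List<String> log = new ArrayList<>();

        /** The retried open request comes back saying the region is already open. */
        void onAlreadyOpened() {
            state = State.OPEN;
            log.add("retry: marked OPEN in memory");
        }

        /** The ZooKeeper "region opened" notification arrives. */
        void onZkOpened() {
            if (state != State.PENDING_OPEN) {
                // Assumption of this toy: an unexpected state makes the master close the region.
                log.add("zk OPENED while " + state + ": unexpected, sending close");
                state = State.CLOSED;
                return;
            }
            state = State.OPEN;
            log.add("zk OPENED: marked OPEN");
        }
    }

    /** Normal ordering: only the ZooKeeper notification updates the state. */
    static State runNormal() {
        ToyMaster master = new ToyMaster();
        master.onZkOpened();
        return master.state;
    }

    /** Problem ordering: the retry marks the region OPEN before the late ZooKeeper notification. */
    static State runRace() {
        ToyMaster master = new ToyMaster();
        master.onAlreadyOpened();
        master.onZkOpened();
        return master.state;
    }

    public static void main(String[] args) {
        System.out.println("normal order: " + runNormal());
        System.out.println("race order:   " + runRace());
    }
}
```

In this toy the normal ordering ends with the region OPEN, while the retry-then-late-notification ordering ends with it CLOSED, which mirrors the behaviour I see in the master log.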

A possible solution is to change processAlreadyOpenedRegion in class AssignmentManager:

  private void processAlreadyOpenedRegion(HRegionInfo region, ServerName sn) {
    // Remove region from in-memory transition and unassigned node from ZK
    // While trying to enable the table the regions of the table were
    // already enabled.
    LOG.debug("ALREADY_OPENED " + region.getRegionNameAsString()
      + " to " + sn);
    String encodedName = region.getEncodedName();

    // Proposed change: check the region state in ZK here. If it is already
    // OPENED, return and leave the regionStates update to be triggered by
    // the ZK status change.


    deleteNodeInStates(encodedName, "offline", sn, EventType.M_ZK_REGION_OFFLINE);
    regionStates.regionOnline(region, sn);
  }
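
The idea behind the comment in the method above can be sketched with a small toy model. This is not a patch: GuardedAlreadyOpened, ToyMaster, ZkNodeState and State are hypothetical names made up for illustration, and how the ZK node state would actually be read inside AssignmentManager is left open here.

```java
/** Toy sketch of the proposed guard; none of these types are HBase classes. */
final class GuardedAlreadyOpened {
    /** Hypothetical view of what the region's ZK node says when the retry arrives. */
    enum ZkNodeState { OFFLINE, OPENING, OPENED }

    enum State { PENDING_OPEN, OPEN, CLOSED }

    /** Minimal stand-in for the master's in-memory view of a single region. */
    static final class ToyMaster {
        State state = State.PENDING_OPEN;

        /** Proposed guard: if ZK already reports OPENED, do nothing here. */
        void onAlreadyOpened(ZkNodeState zk) {
            if (zk == ZkNodeState.OPENED) {
                return; // leave the in-memory update to the ZK notification
            }
            state = State.OPEN; // original behaviour for every other case
        }

        /** ZK "region opened" notification; an unexpected state leads to a close in this toy. */
        void onZkOpened() {
            state = (state == State.PENDING_OPEN) ? State.OPEN : State.CLOSED;
        }
    }

    /** Retry response first, then the late ZK notification. */
    static State run(ZkNodeState zkAtRetry) {
        ToyMaster master = new ToyMaster();
        master.onAlreadyOpened(zkAtRetry);
        master.onZkOpened();
        return master.state;
    }

    public static void main(String[] args) {
        System.out.println("zk already OPENED at retry: " + run(ZkNodeState.OPENED));
    }
}
```

With the guard, the ordering where ZK already reads OPENED when the retry response is handled ends with the region OPEN in this toy. Other interleavings fall through to the original behaviour and are not modelled further.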


zhou_shuaifeng@sina.com

Re: Re: NotServingRegion: hbase region closed forever when open region response time out

Posted by "zhou_shuaifeng@sina.com" <zh...@sina.com>.
Thanks, Ted.
I will open a JIRA with more logs about the problem.




zhou_shuaifeng@sina.com
 
From: Ted Yu
Date: 2015-09-10 22:23
To: zhou_shuaifeng@sina.com
CC: dev@hbase.apache.org; 张铎; wangyongqiang0617
Subject: Re: NotServingRegion: hbase region closed forever when open region response time out
Can you come up with a test that shows the problem?

Consider opening a JIRA with anonymized master log, your test and proposed solution (if you have one).

Cheers



Re: Re: NotServingRegion: hbase region closed forever when open region response time out

Posted by "zhou_shuaifeng@sina.com" <zh...@sina.com>.
I created a JIRA and submitted logs with analysis.
I also attached possible solution patches; please review them, thanks.
https://issues.apache.org/jira/browse/HBASE-14407 

Cheers



zhou_shuaifeng@sina.com
 


Re: NotServingRegion: hbase region closed forever when open region response time out

Posted by Ted Yu <yu...@gmail.com>.
Can you come up with a test that shows the problem?

Consider opening a JIRA with anonymized master log, your test and proposed
solution (if you have one).

Cheers
