Posted to issues@hbase.apache.org by "chunhui shen (Created) (JIRA)" <ji...@apache.org> on 2011/11/28 07:28:40 UTC

[jira] [Created] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Unhealthy region is on service caused by rollback of region splitting
---------------------------------------------------------------------

                 Key: HBASE-4881
                 URL: https://issues.apache.org/jira/browse/HBASE-4881
             Project: HBase
          Issue Type: Bug
            Reporter: chunhui shen


If a region split fails after reaching the JournalEntry.CLOSED_PARENT_REGION state,
it is rolled back in the following steps:
{code}
1. case CLOSED_PARENT_REGION:
     this.parent.initialize();
     break;
2. case CREATE_SPLIT_DIR:
     this.parent.writestate.writesEnabled = true;
     cleanupSplitDir(fs, this.splitdir);
     break;
3. case SET_SPLITTING_IN_ZK:
     if (server != null && server.getZooKeeper() != null) {
       cleanZK(server, this.parent.getRegionInfo());
     }
     break;
{code}
If this.parent.initialize() throws an IOException in step 1
and the subsequent filesystem check passes, nothing further is done.
However, the parent region remains in service.
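
The failure mode described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions, not the SplitTransaction code: the Parent class, the journal order, the initial writesEnabled=false value and the helper writesEnabledAfterRollback are hypothetical stand-ins chosen for illustration. It shows that an IOException from step 1 ends the rollback loop before step 2 can re-enable writes.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class RollbackSketch {
  // Journal states in the order the split is assumed to record them.
  enum JournalEntry { SET_SPLITTING_IN_ZK, CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

  // Hypothetical stand-in for the parent region; not the real HRegion.
  static class Parent {
    boolean writesEnabled = false; // assumed to be switched off while splitting
    final boolean failInitialize;

    Parent(boolean failInitialize) { this.failInitialize = failInitialize; }

    void initialize() throws IOException {
      if (failInitialize) {
        throw new IOException("simulated initialize() failure");
      }
    }
  }

  // Undo the journal newest-first, mirroring steps 1 to 3 of the description.
  static void rollback(Parent parent, List<JournalEntry> journal) throws IOException {
    for (int i = journal.size() - 1; i >= 0; i--) {
      switch (journal.get(i)) {
        case CLOSED_PARENT_REGION:   // step 1
          parent.initialize();       // an IOException here aborts the loop
          break;
        case CREATE_SPLIT_DIR:       // step 2
          parent.writesEnabled = true;
          break;
        case SET_SPLITTING_IN_ZK:    // step 3
          break;
      }
    }
  }

  // Runs a rollback and reports whether writes were re-enabled afterwards.
  static boolean writesEnabledAfterRollback(boolean failInitialize) {
    Parent parent = new Parent(failInitialize);
    List<JournalEntry> journal = Arrays.asList(
        JournalEntry.SET_SPLITTING_IN_ZK,
        JournalEntry.CREATE_SPLIT_DIR,
        JournalEntry.CLOSED_PARENT_REGION);
    try {
      rollback(parent, journal);
    } catch (IOException e) {
      // Modeled caller: the failure is only noted and the region stays in service.
    }
    return parent.writesEnabled;
  }

  public static void main(String[] args) {
    System.out.println("initialize() ok:    writesEnabled=" + writesEnabledAfterRollback(false));
    System.out.println("initialize() fails: writesEnabled=" + writesEnabledAfterRollback(true));
  }
}
```

In this model the second call leaves writesEnabled at false even though the region object is still reachable, which matches the unhealthy state reported in this issue.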

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167760#comment-13167760 ] 

Hudson commented on HBASE-4881:
-------------------------------

Integrated in HBase-TRUNK-security #29 (See [https://builds.apache.org/job/HBase-TRUNK-security/29/])
    HBASE-4881 Unhealthy region is on service caused by rollback of region splitting

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java

                

        

[jira] [Updated] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "chunhui shen (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4881:
--------------------------------

    Attachment: hbase-4881.patch

To be safe, the patch converts the IOException thrown by this.parent.initialize() into a RuntimeException.
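
One way to picture this change is the sketch below. It is an illustration under assumptions, not the contents of hbase-4881.patch: the Region interface and the methods rollbackClosedParent and escalates are hypothetical names. The idea shown is that a checked IOException from initialize() is rethrown as an unchecked RuntimeException, so the failure cannot be quietly absorbed by a caller that only handles IOException.

```java
import java.io.IOException;

public class RollbackFixSketch {
  // Hypothetical minimal view of the parent region.
  interface Region {
    void initialize() throws IOException;
  }

  // Step 1 of the rollback, with the IOException escalated instead of passed on.
  static void rollbackClosedParent(Region parent) {
    try {
      parent.initialize();
    } catch (IOException e) {
      // Keep the original exception as the cause so the stack trace is not lost.
      throw new RuntimeException("Failed to reopen parent region during rollback", e);
    }
  }

  // Returns true when a failing initialize() surfaces as a RuntimeException
  // whose cause is the original IOException.
  static boolean escalates(Region parent) {
    try {
      rollbackClosedParent(parent);
      return false;
    } catch (RuntimeException e) {
      return e.getCause() instanceof IOException;
    }
  }

  public static void main(String[] args) {
    System.out.println(escalates(() -> { throw new IOException("simulated"); }));
    System.out.println(escalates(() -> { }));
  }
}
```

How the surrounding code reacts to the RuntimeException (for example, whether the regionserver aborts) is outside this sketch.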
                

        

[jira] [Updated] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4881:
-------------------------

    Attachment: 4881-v2.txt

Here is what I applied to trunk and the 0.92 branch.
                

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167998#comment-13167998 ] 

Hudson commented on HBASE-4881:
-------------------------------

Integrated in HBase-0.92 #184 (See [https://builds.apache.org/job/HBase-0.92/184/])
    HBASE-4881 Unhealthy region is on service caused by rollback of region splitting; overcommitted in RegionServerMetrics -- reverting change on this file
HBASE-4881 Unhealthy region is on service caused by rollback of region splitting

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

                

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158216#comment-13158216 ] 

chunhui shen commented on HBASE-4881:
-------------------------------------

This case happened in our test.
It leaves the regionserver unable to flush the cache, because step 2 is not executed and this.parent is left in the state
closing=false, closed=false, writestate.writesEnabled=false.
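
Why that state blocks flushing can be sketched as follows. This is a simplified model, assuming the flush path is guarded by the writesEnabled flag as the comment above implies; the classes WriteState and Region and the method memstoreAfterFlush are hypothetical and do not reproduce the real HRegion code.

```java
public class FlushGuardSketch {
  // Hypothetical mirror of the region's write state flags.
  static class WriteState {
    boolean flushing = false;
    boolean writesEnabled = true;
  }

  // Hypothetical region holding some unflushed data.
  static class Region {
    final WriteState writestate = new WriteState();
    boolean closing = false;
    boolean closed = false;
    int memstoreSize = 0;

    // Returns true only if a flush actually ran.
    boolean flushcache() {
      if (closing || closed) {
        return false;
      }
      synchronized (writestate) {
        // Assumed guard: no flush while writes are disabled or a flush is running.
        if (writestate.flushing || !writestate.writesEnabled) {
          return false;
        }
        writestate.flushing = true;
      }
      try {
        memstoreSize = 0; // stand-in for writing the memstore out
        return true;
      } finally {
        synchronized (writestate) {
          writestate.flushing = false;
        }
      }
    }
  }

  // Size of the unflushed data after one flush attempt.
  static int memstoreAfterFlush(boolean writesEnabled) {
    Region region = new Region();
    region.writestate.writesEnabled = writesEnabled;
    region.memstoreSize = 64;
    region.flushcache();
    return region.memstoreSize;
  }

  public static void main(String[] args) {
    System.out.println("writesEnabled=true:  memstore=" + memstoreAfterFlush(true));
    System.out.println("writesEnabled=false: memstore=" + memstoreAfterFlush(false));
  }
}
```

With closing=false and closed=false the region looks open, yet every flush request returns early in this model, so the unflushed data is never written out.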
                

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168185#comment-13168185 ] 

Hudson commented on HBASE-4881:
-------------------------------

Integrated in HBase-0.92-security #37 (See [https://builds.apache.org/job/HBase-0.92-security/37/])
    HBASE-4881 Unhealthy region is on service caused by rollback of region splitting; overcommitted in RegionServerMetrics -- reverting change on this file
HBASE-4881 Unhealthy region is on service caused by rollback of region splitting

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java

                

        

[jira] [Resolved] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "stack (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-4881.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
         Assignee: chunhui shen
     Hadoop Flags: Reviewed

Committed to trunk and the 0.92 branch. Thank you for the patch, Chunhui.
                

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "chunhui shen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158254#comment-13158254 ] 

chunhui shen commented on HBASE-4881:
-------------------------------------

@Ted
Sorry, I forgot to delete this line (// TODO: Verify.).
I only changed the IOException thrown by this.parent.initialize() to a RuntimeException.
                

       

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167899#comment-13167899 ] 

Hudson commented on HBASE-4881:
-------------------------------

Integrated in HBase-TRUNK #2540 (See [https://builds.apache.org/job/HBase-TRUNK/2540/])
    HBASE-4881 Unhealthy region is on service caused by rollback of region splitting

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java

                

        

[jira] [Commented] (HBASE-4881) Unhealthy region is on service caused by rollback of region splitting

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158244#comment-13158244 ] 

Ted Yu commented on HBASE-4881:
-------------------------------

{code}
+          // TODO: Verify.
+          this.parent.initialize();
{code}
Can you tell us your plan to validate the above?

Thanks
                