You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (Created) (JIRA)" <ji...@apache.org> on 2011/11/03 18:41:33 UTC

[jira] [Created] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Master dying while going to close a region can leave it in transition forever
-----------------------------------------------------------------------------

                 Key: HBASE-4739
                 URL: https://issues.apache.org/jira/browse/HBASE-4739
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.4
            Reporter: Jean-Daniel Cryans
            Priority: Minor
             Fix For: 0.92.0, 0.94.0, 0.90.5


I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.

When the master restarted it saw the znode and started printing this:

{quote}
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
{quote}

It's never going to happen, and it's blocking balancing.

I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159778#comment-13159778 ] 

Hudson commented on HBASE-4739:
-------------------------------

Integrated in HBase-0.92 #163 (See [https://builds.apache.org/job/HBase-0.92/163/])
    HBASE-4739  Master dying while going to close a region can leave it in transition forever
               (Gao Jinchao)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4739:
--------------------------

    Status: Patch Available  (was: Open)
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149532#comment-13149532 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

Please review this patch.

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155050#comment-13155050 ] 

Ted Yu edited comment on HBASE-4739 at 11/22/11 6:05 PM:
---------------------------------------------------------

This patch is not compatible, I added M_ZK_REGION_CLOSING and delete RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question, Can I delete below code block in function unassign(HRegionInfo region, boolean force) ?
{code}
 } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getEncodedName());
      // Presume that master has stale data.  Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    }catch (Throwable t)
{code}
I think we should use the wrap of RemoteException. 
                
      was (Author: sunnygao):
    This patch is not compatible, I added M_ZK_REGION_CLOSING and delete RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question, Can I delete below code block in function unassign(HRegionInfo region, boolean force) ?

 } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getEncodedName());
      // Presume that master has stale data.  Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    }catch (Throwable t)

I think we should use the wrap of RemoteException. 
                  
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155328#comment-13155328 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Ted
In 0.90.5 the CLOSING node is created by RS.  Do we need to change that behaviour also? 
Are only forcefully calling unassign() will do ?
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149528#comment-13149528 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.hbase.master.TestMasterFailover
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 99.82 sec

Results :

Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154758#comment-13154758 ] 

stack commented on HBASE-4739:
------------------------------

Is this state misnamed?  Is the comment below saying that the master sets CLOSING state?

{code}
-    RS_ZK_REGION_CLOSING      (1),   // RS is in process of closing a region
+    RS_ZK_REGION_CLOSING      (1),   // Master adds this region as closing in ZK
{code}

Why not just print out state rather than do this in log message:

{code}
(state.isPendingClose() ? "pending close " : "closing ") +
{code}

I'm not up on what rest of patch does so can't comment more.

Thanks for digging in Jinchao.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154026#comment-13154026 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

Yes Gao.. thats is what i said when i meant need to handle RegionAlreadyInTransitionException .
Explicitly catch that exception in the catch block of unassign() and handle the exception by updating the timestamp of RIT. 

Good work.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156472#comment-13156472 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

Integrated to 0.92 branch and TRUNK.

Thanks for the patch Jinchao.

Thanks for the review Ramkrishna, Stack and J-D.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_Branch092.patch

Patch is in branch 0.92
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150168#comment-13150168 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

V2 fixs Ted's comment.

In my local, replication test case passed. I will try to dig this issue "https://builds.apache.org/job/PreCommit-HBASE-Build/243//testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/"

Running org.apache.hadoop.hbase.replication.TestReplicationSource
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.897 sec
Running org.apache.hadoop.hbase.replication.TestReplicationPeer
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.538 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSourceManager
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.089 sec
Running org.apache.hadoop.hbase.replication.regionserver.TestReplicationSink
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.178 sec
Running org.apache.hadoop.hbase.replication.TestReplication
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 145.709 sec
Running org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.521 sec
Running org.apache.hadoop.hbase.replication.TestMasterReplication
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 99.819 sec
Running org.apache.hadoop.hbase.TestServerName

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156546#comment-13156546 ] 

Hudson commented on HBASE-4739:
-------------------------------

Integrated in HBase-0.92-security #11 (See [https://builds.apache.org/job/HBase-0.92-security/11/])
    HBASE-4739  Master dying while going to close a region can leave it in transition forever
               (Gao Jinchao)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143830#comment-13143830 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

Can we try giving a close call after some timeout.  
Because currently the below code in HRegionServer
{code}
  protected boolean closeRegion(HRegionInfo region, final boolean abort,
      final boolean zk) {
    if (this.regionsInTransitionInRS.containsKey(region.getEncodedNameAsBytes())) {
      LOG.warn("Received close for region we are already opening or closing; " +
          region.getEncodedName());
      return false;
    }
{code}
will prevent a close call reaching once again.  So in the case that J-D had described after a timeout if we issue a close call he should be able to close the region right?

Correct me if am wrong.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151761#comment-13151761 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504006/4739_trial2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -163 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/274//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/274//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/274//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155050#comment-13155050 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

This patch is not compatible, I added M_ZK_REGION_CLOSING and delete RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question, Can I delete below code block in function unassign(HRegionInfo region, boolean force) ?

 } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getEncodedName());
      // Presume that master has stale data.  Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    }catch (Throwable t)

I think we should use the wrap of RemoteException. 
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155048#comment-13155048 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

+1 for HBASE-4739_trial6.patch.  The state name change is to indicate that the node is created by master.  



                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149660#comment-13149660 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

Why was the check for zk node existence in actOnTimeOut() at line 2531 removed ?

Would sendRegionClose() be a better name for the new method, handleRegionClose() ?
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151197#comment-13151197 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503873/HBASE-4739_trial.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -163 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/264//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/264//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/264//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147458#comment-13147458 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

I think it's issue about RS. when RS is closing region and throws exception because RS has created a closing zk node.
we close again, may not solve the problem. If we have any Rs's logs, it is better.
If you don't want to lose data, we should close the RS and split the hlog to recover data.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_trial.patch

trail version does not test and need improve
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150339#comment-13150339 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

Jinchao:
Do you want to show us your more complicated patch ?
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155166#comment-13155166 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504734/HBASE-4739_trial6.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestInstantSchemaChange
                  org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/331//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/331//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/331//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152585#comment-13152585 ] 

Ted Yu edited comment on HBASE-4739 at 11/21/11 4:09 AM:
---------------------------------------------------------

@J-D
In 0.92 version, uses HBASE-4739_Trunk_V2 in timeout monitor for sending a CLOSING rpc.(I try to modify this patch)
In trunk, uses patch "4739_trialV3".
Hbase is used by thousands of people. If this problem occurred once, it may occur more. So I think we need to solve this issue.

What do you say J-D? 

I will do some more detailed testing about these patches and give my test cases.


                
      was (Author: sunnygao):
    @J-D
In 0.92 version, uses HBASE-4739_Trunk_V2 in timeout monitor for sending a CLOSING rpc.(I try to modify this patch)
In trunk, uses patch "4739_trialV3".
Hbase thousands of people in the use of, If we once, may appear more. So I think we need slove this isse.

What do you say J-D? 

I will do some more detailed testing about these patches and give my test cases.


                  
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151998#comment-13151998 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504045/4739_trialV3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -163 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/280//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/280//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/280//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155309#comment-13155309 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

All the test failures were caused by 'Too many open files'.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150290#comment-13150290 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

In fact, I made another patch that is more complicated and is not compatible, so I didn't commit.
another patch step:
1. Hmaster creater a zk node(M_ZK_REGION_PENDING_CLOSE) and set RIT to pendingclose state
2. send rpc to RS
3. RS change zk node to RS_ZK_REGION_CLOSING
4. Master changes RIT to closing

if above steps, We can distinguish the state of RIT. if M_ZK_REGION_PENDING_CLOSE we can send a rpc. RS_ZK_REGION_CLOSING we can add to RIT.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Jean-Daniel Cryans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156201#comment-13156201 ] 

Jean-Daniel Cryans commented on HBASE-4739:
-------------------------------------------

bq. Do we need make a patch for 0.90.5 ? 

Like you said earlier:

bq. In 0.90 version, I think there is no this scenario, The closing zk node is only created by RS.

So we should be fine without it.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150265#comment-13150265 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Gao
I have 1 thing to say
->Why don't we issue the close call to RS in timeout monitor for CLOSING rather than directly doing it in processRegionInTransition.  This will give us sometime to really see if the RS has not got the call from Master or the RS was really slow in processing the close call?
What do you say Gao? 
@Ted
Pls suggest.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_trail5.patch
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151712#comment-13151712 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

trail2 fixed your comment. if you and Ram make sense, I will test it.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155320#comment-13155320 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

Nice work, Jinchao.

{code}
    //RS_ZK_REGION_CLOSING      (1),   // Master adds this region as closing in ZK
{code}
We should say that this event is replaced by M_ZK_REGION_CLOSING.
{code}
        // RS is already processing this region, only need update the timestamp
        if(t instanceof RegionAlreadyInTransitionException){          
{code}
I think we should place else in front of if above. The comment should read 'need to update'
Also leave space between if and (, between ) and {.

Please also prepare patch for 0.90.5
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_Trunk_V2.patch
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147515#comment-13147515 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

In 0.90 version, I think there is no this scenario, The closing zk node is only created by RS.
look this code:
 public void process() {
      int expectedVersion = FAILED;
      if (this.zk) {
        expectedVersion = setClosingState();
        if (expectedVersion == FAILED) return;
      }

But in 0.92/trunk version, The problem looks like refactoring code,
I think should be:1
1.master create a pending cose state flag
2.RS receives the close call and change zk node to cosing state.
3.When the master restarted , we should handle two state.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: 4739_trial2.patch
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154001#comment-13154001 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

This is similar to the initial version you had given.  
+1 with the change.  As i said any way RS will not allow the transition from happening again if it was already processing it.  

But we need to confirm like if we need to handle RegionAlreadyInTransitionException exception.  It may be needed.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148272#comment-13148272 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

@Jinchao:
Do you want to work out a patch as you outlined @ 10/Nov/11 06:28 ?
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150277#comment-13150277 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Gao
if you see my comment 04/Nov/11 08:40, we have a check any way in RS to avoid the close call reaching it if RS is already processing it.
But i felt some timeout may be needed instead of immediately giving an RPC call.

Also the case that we are addressing is also a rare one, so better to go with timeout monitor.  
Lets see what Ted has to say
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150227#comment-13150227 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503708/HBASE-4739_Trunk_V2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -163 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 51 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.master.TestDistributedLogSplitting
                  org.apache.hadoop.hbase.io.hfile.TestHFileBlock

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/253//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/253//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/253//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156500#comment-13156500 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504961/HBASE-4739_Branch092.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/360//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Jean-Daniel Cryans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143371#comment-13143371 ] 

Jean-Daniel Cryans commented on HBASE-4739:
-------------------------------------------

Shutting down the RS fixes the issue.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155330#comment-13155330 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

We will keep RS_ZK_REGION_CLOSING event for 0.90.x
invokeUnassign() should still be called.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: 4739_trialV3.patch

trialV3 fixed Ram' comment. 
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151455#comment-13151455 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

Trial patch makes sense.

For unassign, the following change to javadoc is inaccurate:
{code}
   * Updates the RegionState and creates a zk node.
{code}
We still send close RPC, right ?

For sendRegionClose():
{code}
   * sends the CLOSE RPC to a region server.
   * @param region server to be unassigned
{code}
Should read "region server which receives CLOSE RPC"

For createNodePendingClose(), please replace CLOSING with PEND_CLOSE (not PEND_CLOSING).
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156628#comment-13156628 ] 

Hudson commented on HBASE-4739:
-------------------------------

Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/])
    HBASE-4739  Master dying while going to close a region can leave it in transition
               forever (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_trial6.patch

Thanks for your review. Fixed all comments

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4739:
--------------------------

    Fix Version/s:     (was: 0.90.5)
     Hadoop Flags: Reviewed
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150325#comment-13150325 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

I made a issue for 2: 
https://issues.apache.org/jira/browse/HBASE-4790
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147495#comment-13147495 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

In this scenario what JD has told is there was no close call at all that reached the RS.
What you say is valid Gao.  But currently if we see closing state we just leave it to the system to take care and we dont take any action.  Atleast that we can change correct ?

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154010#comment-13154010 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

I think we don't need handle RegionAlreadyInTransitionException exception. We only need update the timestamp of RIT,we have done.
my reason is :
1. The moniter timeout is 30 minutes, There are enough time to close a region.
2. if the RS throws RegionAlreadyInTransitionException exception, we need update the timestamp of RIT and wait next timeout.

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148300#comment-13148300 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

Yes, I am trying to reproduce this issue in trunk and make a patch.

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152021#comment-13152021 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

Not completed the test ,tomorrow continue!
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao reassigned HBASE-4739:
---------------------------------

    Assignee: gaojinchao
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149727#comment-13149727 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503590/HBASE-4739_Trunk.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -163 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 50 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/243//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/243//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/243//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153969#comment-13153969 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

HBASE-4739_trail5 made a few changes, Please review, if it makes sense, I will verify in a real cluster.

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150142#comment-13150142 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

Why was the check for zk node existence in actOnTimeOut() at line 2531 removed ?

This code snippets is used in branch 0.90, because the zk node "RS_ZK_REGION_CLOSING" is created by RS,
But in trunk the zk node "RS_ZK_REGION_CLOSING" is created by master. So this conditin should be impossible.

 

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155689#comment-13155689 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504843/HBASE-4739_V7.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -162 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/343//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/343//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/343//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150188#comment-13150188 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

+1 if tests pass.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150267#comment-13150267 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

@Ram
That is a good suggestion, But timeout monitor needs 30 minutes. should we wait so long time ?
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Jean-Daniel Cryans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152434#comment-13152434 ] 

Jean-Daniel Cryans commented on HBASE-4739:
-------------------------------------------

I'm really uneasy with adding a new znode, that part of the master code is really finicky. At a minimum this should not be included in 0.92 since this situation is pretty rare (IMO).
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Jean-Daniel Cryans (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154749#comment-13154749 ] 

Jean-Daniel Cryans commented on HBASE-4739:
-------------------------------------------

I prefer the latest patch, although it seems the EventHandler.java bit was only needed with the new znode.

Handling RegionAlreadyInTransitionException would be cleaner.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147543#comment-13147543 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

It seems HBASE-3789 refactored the code
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155600#comment-13155600 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

@Ted
It seems 0.90.5 logic is ok. 
1. if RS_ZK_REGION_CLOSING is created, It says that RS has received the RPC
2. When RIT is timeout, There is two case, one RS is slow, in this case we don't need send RPC again.
   another case, closing the region has exception, we send rpc can't solve the problem, it may also fail.
So I think we don't need fix anything.

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4739:
-------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.92.1)
                   0.92.0
           Status: Resolved  (was: Patch Available)

This was applied a while back.  Will be in 0.92.0
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151923#comment-13151923 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

Only sends a RPC, I think state machine lose a state flag. Master doesn't distinguish pending or closing. Code structure is not well. But this patch is not compatible.

Fixed other comments
 

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_V7.patch

Fixed Ted's comment
@Ted
Do we need make a patch for 0.90.5 ? :)
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151831#comment-13151831 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

@Gao
Good work... One suggestion
For the case 
M_ZK_REGION_PENDING_CLOSE under processRegionsInTransition() i think
{code}
if (isOnDeadServer(regionInfo, deadServers) &&
            (data.getOrigin() == null || !serverManager.isServerOnline(data.getOrigin()))) {
          // If was on dead server, its closed now. Force to OFFLINE and this
          // will get it reassigned if appropriate
          forceOffline(regionInfo, data);
{code}
is needed as we do in RS_ZK_CLOSING

Because assume a case where master creates the node in PENDING_CLOSE but before sending the call to RS master goes down.  Before the master comes up RS is also down.  Now as per the current patch in case of processRegionInTransition() for M_ZK_REGION_PENDING_CLOSE we will once again try to unassign() which will not happen. so better to force it Offline.  What do you think Gao? 
I still feel we can give just a RPC call to RS instead of handling a new state because in RS any way close call cannot reach it twice.  
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4739:
--------------------------

    Status: Patch Available  (was: Open)
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4739:
------------------------------

    Attachment: HBASE-4739_Trunk.patch
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151765#comment-13151765 ] 

Ted Yu commented on HBASE-4739:
-------------------------------

The failed tests don't seem to be related to the changes in this patch.

@Jinchao:
Please perform more validation on rela cluster.

Good work.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Ted Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4739:
--------------------------

    Status: Open  (was: Patch Available)
    
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152585#comment-13152585 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

@J-D
In 0.92 version, uses HBASE-4739_Trunk_V2 in timeout monitor for sending a CLOSING rpc.(I try to modify this patch)
In trunk, uses patch "4739_trialV3".
Hbase thousands of people in the use of, If we once, may appear more. So I think we need slove this isse.

What do you say J-D? 

I will do some more detailed testing about these patches and give my test cases.


                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "gaojinchao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150310#comment-13150310 ] 

gaojinchao commented on HBASE-4739:
-----------------------------------

about test failed:
1.org.apache.hadoop.hbase.io.hfile.TestHFileBlock

  Look this comments
  public void testBlockHeapSize() {
    // We have seen multiple possible values for this estimate of the heap size
    // of a ByteBuffer, presumably depending on the JDK version.
    assertTrue(HFileBlock.BYTE_BUFFER_HEAP_SIZE == 64 ||
               HFileBlock.BYTE_BUFFER_HEAP_SIZE == 80); But in https://issues.apache.org/jira/browse/HBASE-4768

We add some code snippets：
      assertEquals(80, HFileBlock.BYTE_BUFFER_HEAP_SIZE);
      long byteBufferExpectedSize =
          ClassSize.align(ClassSize.estimateBase(buf.getClass(), true)
              + HFileBlock.HEADER_SIZE + size);



2.org.apache.hadoop.hbase.master.TestDistributedLogSplitting
Because we choose a rs with 0 regions.

// it said that regions is 0.
2011-11-15 03:53:11,215 INFO  [Thread-2335] master.TestDistributedLogSplitting(211): #regions = 0
2011-11-15 03:53:11,215 DEBUG [RegionServer:0;asf001.sp2.ygridcore.net,36721,1321329179789.logSyncer] wal.HLog$LogSyncer(1192): RegionServer:0;asf001.sp2.ygridcore.net,36721,1321329179789.logSyncer interrupted while waiting for sync requests
2011-11-15 03:53:11,215 INFO  [RegionServer:0;asf001.sp2.ygridcore.net,36721,1321329179789.logSyncer] wal.HLog$LogSyncer(1194): RegionServer:0;asf001.sp2.ygridcore.net,36721,1321329179789.logSyncer exiting
2011-11-15 03:53:11,215 DEBUG [Thread-2335] wal.HLog(967): closing hlog writer in hdfs://localhost:46229/user/jenkins/.logs/asf001.sp2.ygridcore.net,36721,1321329179789
2011-11-15 03:53:11,637 DEBUG [Thread-2335] master.SplitLogManager(233): Scheduling batch of logs to split

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153999#comment-13153999 ] 

Hadoop QA commented on HBASE-4739:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504464/HBASE-4739_trail5.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -166 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 60 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/317//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/317//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/317//console

This message is automatically generated.
                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0, 0.90.5
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156503#comment-13156503 ] 

Hudson commented on HBASE-4739:
-------------------------------

Integrated in HBase-TRUNK #2476 (See [https://builds.apache.org/job/HBase-TRUNK/2476/])
    HBASE-4739  Master dying while going to close a region can leave it in transition
               forever (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java

                
> Master dying while going to close a region can leave it in transition forever
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4739
>                 URL: https://issues.apache.org/jira/browse/HBASE-4739
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.4
>            Reporter: Jean-Daniel Cryans
>            Assignee: gaojinchao
>            Priority: Minor
>             Fix For: 0.92.0, 0.94.0
>
>         Attachments: 4739_trial2.patch, 4739_trialV3.patch, HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when the master died it had just created the RIT znode for a region but didn't tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for too long, this should eventually complete or the server will expire, doing nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira