Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2010/06/08 02:44:12 UTC

[jira] Created: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

LeaseStillHeldException totally ignored by RS, wrongly named
------------------------------------------------------------

                 Key: HBASE-2691
                 URL: https://issues.apache.org/jira/browse/HBASE-2691
             Project: HBase
          Issue Type: Bug
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
             Fix For: 0.20.6, 0.21.0


Currently region servers don't handle org.apache.hadoop.hbase.Leases$LeaseStillHeldException in any useful way. Right now, when a region server tries to report to the master, this happens:

{code}

2010-06-07 17:20:54,368 WARN  [RegionServer:0] regionserver.HRegionServer(553): Attempt=1
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:541)
        at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:173)
        at java.lang.Thread.run(Thread.java:637)
{code}

Then it will retry until the watch is triggered telling it that the session's expired! Instead, we should be a lot more proactive and initiate the abort procedure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2691.
---------------------------------------

     Hadoop Flags: [Reviewed]
    Fix Version/s:     (was: 0.20.6)
       Resolution: Fixed

Committed to trunk but not to branch; I think it was a bit invasive.



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876859#action_12876859 ] 

HBase Review Board commented on HBASE-2691:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/164/#review160
-----------------------------------------------------------

Ship it!


+1 after making the mostly javadoc changes suggested below.


/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
<http://review.hbase.org/r/164/#comment793>

    Javadoc missing here... you need to explain the bit about how you can send an exception back to the RS



/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
<http://review.hbase.org/r/164/#comment794>

    Fatten the message passed back to the regionserver.



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/164/#comment795>

    Not needed (after chatting w/ j-d)



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
<http://review.hbase.org/r/164/#comment796>

    Same here



/trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java
<http://review.hbase.org/r/164/#comment797>

    Remove



/trunk/src/test/java/org/apache/hadoop/hbase/TestYouAreDead.java
<http://review.hbase.org/r/164/#comment799>

    These single-test classes are killing us as regards overall unit test times. Can we not put it into an existing test class?


- stack







[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876737#action_12876737 ] 

Jean-Daniel Cryans commented on HBASE-2691:
-------------------------------------------

This would be the second part of the fix, as referred to in the title of this jira: using Leases.LeaseStillHeldException is wrong. Before ZK, we were using Leases for the region server registrations in the master, but now we only use its exception. Instead, I think we could have finer-grained and more relevant exceptions. What about ServerAlreadyExistingException and ServerAlreadyConsideredDeadException? (I'm not good at naming stuff)



[jira] Updated: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2691:
--------------------------------------

    Attachment: HBASE-2691.patch

Patch that will be committed.



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876851#action_12876851 ] 

HBase Review Board commented on HBASE-2691:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jd...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/164/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

Patch that first adds a new testing facility to send exceptions to region servers when they report in.
Then I added YouAreDeadException and PleaseHoldException, which replace the ambiguous LeaseStillHeldException; the two are treated differently.
Finally, I added a very simple test that makes sure that telling an RS that it's dead actually ends up killing it.
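
A testing facility of the kind described in this summary could look roughly like the sketch below. This is an illustration only and not the code from the patch: the class name ExceptionInjectionSketch and the methods addException and regionServerReport are hypothetical stand-ins. A test queues an exception for a server name, and the next report from that server fails with it.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names are taken from the HBase patch.
class ExceptionInjectionSketch {
  // Exceptions queued by a test, keyed by server name.
  private final Map<String, IOException> pending = new ConcurrentHashMap<>();

  /** Test hook: the next report from serverName will fail with e. */
  void addException(String serverName, IOException e) {
    pending.put(serverName, e);
  }

  /** Stand-in for the master-side handler of a region server report. */
  void regionServerReport(String serverName) throws IOException {
    // remove() makes the injection one-shot: later reports succeed again.
    IOException injected = pending.remove(serverName);
    if (injected != null) {
      throw injected;
    }
    // Normal report processing would happen here.
  }

  public static void main(String[] args) {
    ExceptionInjectionSketch master = new ExceptionInjectionSketch();
    master.addException("rs1", new IOException("injected by test"));
    try {
      master.regionServerReport("rs1");
    } catch (IOException e) {
      System.out.println("rs1 report failed: " + e.getMessage());
    }
  }
}
```

Making the injection one-shot is a design choice of this sketch; a real facility might keep rejecting the server until the test clears the entry.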


This addresses bug HBASE-2691.


Diffs
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/PleaseHoldException.java PRE-CREATION 
  /trunk/src/main/java/org/apache/hadoop/hbase/YouAreDeadException.java PRE-CREATION 
  /trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 952836 
  /trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 952836 
  /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 952836 
  /trunk/src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java 952836 
  /trunk/src/test/java/org/apache/hadoop/hbase/TestYouAreDead.java PRE-CREATION 

Diff: http://review.hbase.org/r/164/diff


Testing
-------


Thanks,

Jean-Daniel






[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876553#action_12876553 ] 

stack commented on HBASE-2691:
------------------------------

How are you going to tell this apart from the LeaseStillHeldException thrown when we're processing the shutdown of an RS that was on the same host and port as this RS? (The scenario is that the RS fails and is restarted so quickly that it checks in at the master before the master even knows it's dead.)



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876538#action_12876538 ] 

stack commented on HBASE-2691:
------------------------------

On a reportForDuty, we have code that will reject HRS with lease still held BUT it'll tickle the expire-of-the-region shutdown processing.  The RS will be continually rejected until soon after the shutdown processing has gotten past its initial steps.  Then the RS is let in.

Where are you when this has happened?  Just started?  What session has expired?  The RS in ZK?



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876738#action_12876738 ] 

stack commented on HBASE-2691:
------------------------------

bq. What about ServerAlreadyExistingException and ServerAlreadyConsideredDeadException? (I'm not good at naming stuff)

Doing above and purging LeaseStillHeldException as you suggest is a good idea.  It solves differentiating the different startup/dead-server circumstances.

Regards naming, they ain't too bad.  The latter could be YouAreDeadException (with its message holding info on why it's considered dead).  The former could be PleaseHoldException (its message would be why the holdup).
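
A minimal sketch of what the two proposed exceptions might look like follows. Only the two exception names come from this thread. The class bodies, the wrapper class NamingSketch, the helper checkReport, its two boolean flags and the message texts are assumptions made for illustration.

```java
import java.io.IOException;

// Hypothetical sketch; only the two exception names come from the discussion.
class NamingSketch {
  /** The master already considers this server dead; the message says why. */
  static class YouAreDeadException extends IOException {
    YouAreDeadException(String why) {
      super(why);
    }
  }

  /** The master cannot accept this server yet; the message says why. */
  static class PleaseHoldException extends IOException {
    PleaseHoldException(String why) {
      super(why);
    }
  }

  /** Assumed master-side check; both flags are illustrative inputs. */
  static void checkReport(String serverName, boolean consideredDead,
      boolean previousInstanceStillRegistered) throws IOException {
    if (consideredDead) {
      throw new YouAreDeadException("Server " + serverName
          + " rejected; currently processing it as a dead server");
    }
    if (previousInstanceStillRegistered) {
      throw new PleaseHoldException("Server " + serverName
          + " rejected; a previous instance on the same host and port"
          + " is still being shut down");
    }
  }

  public static void main(String[] args) {
    try {
      checkReport("rs1,60020,1", true, false);
    } catch (IOException e) {
      System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
    }
  }
}
```

With two distinct types, the region server can tell "abort now" apart from "wait and try again" without parsing the message.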



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876542#action_12876542 ] 

Jean-Daniel Cryans commented on HBASE-2691:
-------------------------------------------

The RS's session's expired, it reports back to the master right after that (it's marked dead in the master) and trips into:

{code}

  private void checkIsDead(final String serverName, final String what)
  throws LeaseStillHeldException {
    if (!isDead(serverName)) return;
    LOG.debug("Server " + what + " rejected; currently processing " +
      serverName + " as dead server");
    throw new Leases.LeaseStillHeldException(serverName);
  }
{code}

Which I see in the log. Then on the HRS side this falls into:

{code}

          } catch (Exception e) { // FindBugs REC_CATCH_EXCEPTION
            if (e instanceof IOException) {
              e = RemoteExceptionHandler.checkIOException((IOException) e);
            }
            tries++;
            if (tries > 0 && (tries % this.numRetries) == 0) {
              // Check filesystem every so often.
              checkFileSystem();
            }
            if (this.stopRequested.get()) {
              LOG.info("Stop requested, clearing toDo despite exception");
              toDo.clear();
              continue;
            }
            LOG.warn("Attempt=" + tries, e);
            // No point retrying immediately; this is probably connection to
            // master issue.  Doing below will cause us to sleep.
            lastMsg = System.currentTimeMillis();
{code}

Which throws the stack trace I pasted in this jira's description. IMO, and taking into account the last comment in that code, we shouldn't retry. Instead, we should catch LeaseStillHeldException separately from this big catch(Exception) and treat it as an emergency shut down.
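
The idea of catching the rejection separately from the generic retry path could be sketched as below. This is a simplified illustration and not HRegionServer code: the Master interface, the Outcome enum and the method reportToMaster are hypothetical, and the nested LeaseStillHeldException is a stand-in for Leases.LeaseStillHeldException that is assumed to be an IOException.

```java
import java.io.IOException;

// Hypothetical sketch of the proposed handling; not HRegionServer code.
class ReportLoopSketch {
  /** Stand-in for Leases.LeaseStillHeldException, assumed to be an IOException. */
  static class LeaseStillHeldException extends IOException {
    LeaseStillHeldException(String serverName) {
      super(serverName);
    }
  }

  /** Minimal, assumed view of the master that the region server reports to. */
  interface Master {
    void report(String serverName) throws IOException;
  }

  enum Outcome { REPORTED, ABORTED, GAVE_UP }

  static Outcome reportToMaster(Master master, String serverName, int maxTries) {
    for (int tries = 1; tries <= maxTries; tries++) {
      try {
        master.report(serverName);
        return Outcome.REPORTED;
      } catch (LeaseStillHeldException e) {
        // The master is processing this server as dead: retrying cannot help.
        System.err.println("Master rejected " + serverName + " as dead, aborting");
        return Outcome.ABORTED;
      } catch (IOException e) {
        // Probably a connection issue; real code would sleep before retrying.
        System.err.println("Attempt=" + tries + ": " + e);
      }
    }
    return Outcome.GAVE_UP;
  }

  public static void main(String[] args) {
    Master rejecting = name -> {
      throw new LeaseStillHeldException(name);
    };
    // Prints ABORTED on stdout after a single attempt.
    System.out.println(reportToMaster(rejecting, "rs1", 10));
  }
}
```

The more specific catch clause has to come before the IOException one, otherwise the Java compiler rejects it as unreachable.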



[jira] Commented: (HBASE-2691) LeaseStillHeldException totally ignored by RS, wrongly named

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876877#action_12876877 ] 

HBase Review Board commented on HBASE-2691:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jd...@apache.org>


bq.  On 2010-06-08 16:18:09, stack wrote:
bq.  > /trunk/src/test/java/org/apache/hadoop/hbase/TestYouAreDead.java, line 27
bq.  > <http://review.hbase.org/r/164/diff/1/?file=1103#file1103line27>
bq.  >
bq.  >     These single-test classes are killing us as regards overall unit test times. Can we not put it into an existing test class?

It will all be merged into TestMasterWrongRS, which I'm renaming to TestKillingServersFromMaster.


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/164/#review160
-----------------------------------------------------------




