Posted to issues@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2010/11/29 19:11:13 UTC

[jira] Created: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster
-----------------------------------------------------------------------------------------------------

                 Key: HBASE-3282
                 URL: https://issues.apache.org/jira/browse/HBASE-3282
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.0
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
             Fix For: 0.90.0, 0.92.0


Currently we clear a server from the deadserver set once we finish processing its shutdown.  However, certain circumstances (network partitions, race conditions) could lead to the RS not doing a check-in until after the shutdown has been processed.  As-is, this RS will now be let back into the cluster rather than rejected with YouAreDeadException.

We should hang on to the dead servers so we always reject them.

One concern is that the set will grow indefinitely.  One recommendation by stack is to use SoftReferences.
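
The race described above can be illustrated with a minimal sketch. It is not the HBase implementation: CheckInSketch, its methods, and DeadServerRejectedException (a stand-in for YouAreDeadException) are hypothetical names, and the sketch assumes each RS instance is identified by a string that includes its start code, so a freshly restarted process has a different identity from the expired one. It ignores the growth concern entirely.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for YouAreDeadException; unchecked only to keep the sketch short. */
class DeadServerRejectedException extends RuntimeException {
  DeadServerRejectedException(String message) {
    super(message);
  }
}

/** Hypothetical, simplified master-side bookkeeping; not the real ServerManager. */
class CheckInSketch {
  private final Set<String> onlineServers = new HashSet<String>();
  private final Set<String> deadServers = new HashSet<String>();

  /** Called when the master decides an RS instance has expired. */
  synchronized void expire(String serverName) {
    onlineServers.remove(serverName);
    deadServers.add(serverName);
  }

  /** Called when shutdown processing for the instance completes. */
  synchronized void finishShutdown(String serverName) {
    // Intentionally does NOT remove serverName from deadServers.
    // Clearing it here is what would let a late check-in slip through.
  }

  /** Accepts a check-in unless this exact instance was previously expired. */
  synchronized void checkIn(String serverName) {
    if (deadServers.contains(serverName)) {
      throw new DeadServerRejectedException(serverName + " was already expired");
    }
    onlineServers.add(serverName);
  }

  synchronized boolean isOnline(String serverName) {
    return onlineServers.contains(serverName);
  }
}
```

With this bookkeeping, a check-in that arrives after finishShutdown is still rejected, while a new process on the same host and port (with a different start code) is accepted.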

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-3282:
---------------------------------

    Attachment: HBASE-3282-v4.patch

Unit tests passed.  Committing this final version of the patch.

[jira] Commented: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964905#action_12964905 ] 

HBase Review Board commented on HBASE-3282:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1259/
-----------------------------------------------------------

(Updated 2010-11-29 11:43:07.682958)


Review request for hbase and stack.


Changes
-------

Makes DeadServers private.  It was still accessed by my TestRollingRestart test so I had to make a small change to how that worked.

Also added tests in TestDeadServers that verify the new boolean check and the max capacity both work as expected.


Summary
-------

We currently let go of dead servers once we finish their shutdown.  We should hang on to them longer to deal with things like network partitions.

I'm not a fan of SoftReferences so I decided on another approach.  DeadServers now has a maximum number of servers to hold on to in the set (default 100).  Once it reaches the max, it evicts the oldest.
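
For illustration only, the capacity-bounded approach might look like the sketch below. It is not the code from the patch: BoundedDeadServerSet and its method names are hypothetical, server names are plain strings, and "oldest" is assumed to mean the entry that was added first. It relies on java.util.LinkedHashSet iterating in insertion order.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration of a capped dead-server set; not the DeadServer class from the patch. */
class BoundedDeadServerSet {
  private final int maxDeadServers;
  // LinkedHashSet iterates in insertion order, so its first element is the oldest entry.
  private final Set<String> deadServers = new LinkedHashSet<String>();

  BoundedDeadServerSet(int maxDeadServers) {
    if (maxDeadServers <= 0) {
      throw new IllegalArgumentException("maxDeadServers must be positive");
    }
    this.maxDeadServers = maxDeadServers;
  }

  /** Records a dead server, evicting the oldest entry first if the set is full. */
  synchronized boolean add(String serverName) {
    if (deadServers.contains(serverName)) {
      return false; // already tracked; keep its original position
    }
    if (deadServers.size() >= maxDeadServers) {
      Iterator<String> it = deadServers.iterator();
      it.next();   // oldest entry
      it.remove(); // evict it to make room
    }
    return deadServers.add(serverName);
  }

  /** Returns true if this server is still remembered as dead. */
  synchronized boolean isDeadServer(String serverName) {
    return deadServers.contains(serverName);
  }

  synchronized int size() {
    return deadServers.size();
  }
}
```

The trade-off of a fixed cap is that an instance evicted from the set could once again be let back in, so the cap needs to be comfortably larger than the number of servers expected to die within the window in which a late check-in is plausible.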

More code than I had hoped but nothing too crazy.


This addresses bug HBASE-3282.
    http://issues.apache.org/jira/browse/HBASE-3282


Diffs (updated)
-----

  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 1040242 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1040242 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1040242 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1040242 
  branches/0.90/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java 1040242 
  branches/0.90/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java 1040242 

Diff: http://review.cloudera.org/r/1259/diff


Testing
-------

Running unit tests now.


Thanks,

Jonathan




[jira] Commented: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964879#action_12964879 ] 

HBase Review Board commented on HBASE-3282:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1259/
-----------------------------------------------------------

Review request for hbase and stack.


Summary
-------

We currently let go of dead servers once we finish their shutdown.  We should hang on to them longer to deal with things like network partitions.

I'm not a fan of SoftReferences so I decided on another approach.  DeadServers now has a maximum number of servers to hold on to in the set (default 100).  Once it reaches the max, it evicts the oldest.

More code than I had hoped but nothing too crazy.


This addresses bug HBASE-3282.
    http://issues.apache.org/jira/browse/HBASE-3282


Diffs
-----

  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 1040221 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1040221 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1040221 
  branches/0.90/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1040221 

Diff: http://review.cloudera.org/r/1259/diff


Testing
-------

Running unit tests now.


Thanks,

Jonathan




[jira] Commented: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964891#action_12964891 ] 

HBase Review Board commented on HBASE-3282:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1259/#review2004
-----------------------------------------------------------

Ship it!



branches/0.90/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
<http://review.cloudera.org/r/1259/#comment6320>

    You can make this private now that it's no longer referenced by Master?


- stack





[jira] Resolved: (HBASE-3282) Need to retain DeadServers to ensure we don't allow previously expired RS instances to rejoin cluster

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-3282.
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk after verifying unit tests pass.  Thanks for the review, stack.
