Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/09/23 22:28:50 UTC

[jira] Created: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

[hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log
--------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1937
                 URL: https://issues.apache.org/jira/browse/HADOOP-1937
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
    Affects Versions: 0.15.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.15.0


When a region server's lease times out, the master immediately begins trying to split the server's log file. There have been cases where a region server was just a little late reporting to the master and the master had already started trying to reclaim the server's log, even though the server was still writing to it. 

There needs to be some kind of "grace period" in which, if the region server reports in, the master reinstates the server. If the "grace period" expires, then the master should start processing the server's log.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529977 ] 

Jim Kellerman commented on HADOOP-1937:
---------------------------------------

Implementation strategy:

When a server's lease expires, remove the server from serversToServerInfo, put its serverInfo into a new Map, serversInJeopardy, and put a PendingServerShutdown into a DelayQueue with a delay of 1/2 of a server lease timeout.

If the server reports in within that period, the PendingServerShutdown is removed from the DelayQueue and the server is "reinstated": it is removed from the serversInJeopardy Map and put back into the serversToServerInfo Map.

If the server does not report in, it is removed from the serversInJeopardy Map and the PendingServerShutdown is processed.
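
The strategy above can be illustrated with a minimal sketch built on java.util.concurrent.DelayQueue. This is not the code from the patch: the class GracePeriodTracker, its method names, the pendingByServer map, the use of String for server info, and the injectable clock are all assumptions made so the example is small and deterministic. Only serversToServerInfo, serversInJeopardy, PendingServerShutdown and the half-lease delay come from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the master's bookkeeping; not the HMaster code. */
class GracePeriodTracker {

  /** A shutdown request that becomes eligible once its deadline has passed. */
  static final class PendingServerShutdown implements Delayed {
    final String serverName;
    final long deadlineMillis;
    private final LongSupplier clock; // assumed: injectable clock, in milliseconds

    PendingServerShutdown(String serverName, long deadlineMillis, LongSupplier clock) {
      this.serverName = serverName;
      this.deadlineMillis = deadlineMillis;
      this.clock = clock;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(deadlineMillis - clock.getAsLong(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  private final Map<String, String> serversToServerInfo = new HashMap<>();
  private final Map<String, String> serversInJeopardy = new HashMap<>();
  private final Map<String, PendingServerShutdown> pendingByServer = new HashMap<>();
  private final DelayQueue<PendingServerShutdown> shutdownQueue = new DelayQueue<>();
  private final long gracePeriodMillis;
  private final LongSupplier clock;

  GracePeriodTracker(long leaseTimeoutMillis, LongSupplier clock) {
    this.gracePeriodMillis = leaseTimeoutMillis / 2; // 1/2 of a server lease timeout
    this.clock = clock;
  }

  synchronized void register(String server, String info) {
    serversToServerInfo.put(server, info);
  }

  /** Lease expired: move the server into jeopardy and schedule its shutdown. */
  synchronized void leaseExpired(String server) {
    String info = serversToServerInfo.remove(server);
    if (info == null) {
      return; // unknown server, or already in jeopardy
    }
    serversInJeopardy.put(server, info);
    PendingServerShutdown pending =
        new PendingServerShutdown(server, clock.getAsLong() + gracePeriodMillis, clock);
    pendingByServer.put(server, pending);
    shutdownQueue.add(pending);
  }

  /** Server reported in: reinstate it if its shutdown is still queued. Returns true if live. */
  synchronized boolean reportIn(String server) {
    PendingServerShutdown pending = pendingByServer.remove(server);
    if (pending != null && shutdownQueue.remove(pending)) {
      serversToServerInfo.put(server, serversInJeopardy.remove(server));
      return true;
    }
    return serversToServerInfo.containsKey(server);
  }

  /** Returns the next server whose grace period has expired, or null if none is due yet. */
  synchronized String nextExpired() {
    PendingServerShutdown pending = shutdownQueue.poll(); // null until a delay has elapsed
    if (pending == null) {
      return null;
    }
    pendingByServer.remove(pending.serverName);
    serversInJeopardy.remove(pending.serverName);
    return pending.serverName; // the caller would now process this server's log
  }
}
```

In this sketch a server that reports in after its deadline, but before the master has polled the queue, is still reinstated; a stricter variant could compare the clock against the deadline in reportIn.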



[jira] Updated: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1937:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

This issue is one for which it is nearly impossible to build a regression test (it is very similar to TestDFSAbort in this regard). Consequently, no regression test has been included with this patch. However, it has been tested extensively in the development environment and introduces no new regressions in the other tests. Committed.



[jira] Commented: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532624 ] 

Hudson commented on HADOOP-1937:
--------------------------------

Integrated in Hadoop-Nightly #261 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/261/])



[jira] Updated: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1937:
----------------------------------

    Attachment: patch.txt

HRegionServer
- Fix restart

HMaster
- Introduce DelayQueue for processing dead region servers




[jira] Commented: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532575 ] 

Hadoop QA commented on HADOOP-1937:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12367125/patch.txt
against trunk revision r582066.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/891/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/891/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/891/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/891/console

This message is automatically generated.



[jira] Updated: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1937:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-1937) [hbase] when the master times out a region server's lease, it is too aggressive in reclaiming the server's log

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531933 ] 

Jim Kellerman commented on HADOOP-1937:
---------------------------------------

Revised strategy:

With HADOOP-1960, if the region server cannot talk to the master
before its lease expires, it shuts itself down. Thus the likelihood of
a region server checking in after its lease has expired is low. In the
event this does happen, however, the master will tell the region
server to restart; that is, close all open regions and flush its log.

However, the master should defer processing the server's log and
reassigning its regions as the server may still be in the process of
shutting down. Consequently, all PendingServerShutdowns will be placed
in a delay queue for 1/2 a lease period to ensure the region server
has shut down.

Finally, we will add the server start code to the log file name, so
that if the region server restarts before the master processes the old
log file, the new log file will not be included.
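
As a rough illustration of the last point, the sketch below shows how a start code in the log file name lets the master pick out only the logs of the server incarnation it is shutting down. The class LogNames, both method names, and the "log_<host>_<port>_<startCode>" pattern are hypothetical; the naming scheme actually used by the patch is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the naming pattern is an assumption, not the patch's format. */
class LogNames {

  /** Builds a log name that embeds the start code of one server incarnation. */
  static String logName(String host, int port, long startCode) {
    return "log_" + host + "_" + port + "_" + startCode;
  }

  /** Keeps only the logs written by the incarnation identified by startCode. */
  static List<String> logsFor(List<String> names, String host, int port, long startCode) {
    String wanted = logName(host, port, startCode);
    List<String> matches = new ArrayList<>();
    for (String name : names) {
      if (name.equals(wanted)) {
        matches.add(name);
      }
    }
    return matches;
  }
}
```

With names like these, a server that restarts on the same host and port gets a new start code, so its fresh log does not match when the master later processes the old incarnation's log.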

