Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/09/27 23:48:50 UTC

[jira] Created: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

[hbase] If a region server cannot talk to the master after several attempts, it should shut itself down
-------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1960
                 URL: https://issues.apache.org/jira/browse/HADOOP-1960
             Project: Hadoop
          Issue Type: Improvement
          Components: contrib/hbase
    Affects Versions: 0.15.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
             Fix For: 0.15.0


If a region server cannot contact the master after a configurable number of tries, it should shut itself down.

When the region server cannot contact the master, there are two cases:
- If the master is alive but the network is partitioned, the master will probably time out the region server's lease, then try to recover the server's log and reassign the regions the server was serving.
- If the master has died and subsequently restarts, it will reassign regions anyway, so the region server should stop serving them.
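
The requested behaviour can be sketched roughly as follows. This is an illustration only, not the code from the attached patch: the class name MasterContactTracker, the maxRetries parameter, and the BooleanSupplier standing in for the call to the master are all hypothetical.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch: counts consecutive failed attempts to reach the
 * master and requests a shutdown once a configurable limit is hit.
 */
final class MasterContactTracker {
    private final int maxRetries;        // assumed to come from configuration
    private int consecutiveFailures;
    private boolean stopRequested;

    MasterContactTracker(int maxRetries) {
        if (maxRetries < 1) {
            throw new IllegalArgumentException("maxRetries must be >= 1");
        }
        this.maxRetries = maxRetries;
    }

    /**
     * Runs one contact attempt. Returns true if the server should keep
     * running, false once it should shut itself down.
     */
    boolean attempt(BooleanSupplier contactMaster) {
        if (stopRequested) {
            return false;
        }
        boolean ok;
        try {
            ok = contactMaster.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false;                  // treat any failure as a missed contact
        }
        if (ok) {
            consecutiveFailures = 0;     // a success resets the count
            return true;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxRetries) {
            stopRequested = true;        // stop serving regions
        }
        return !stopRequested;
    }

    boolean isStopRequested() {
        return stopRequested;
    }
}
```

Counting only consecutive failures means a single successful contact clears the tally, so brief network hiccups would not add up to a shutdown over time.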



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531678 ] 

Hadoop QA commented on HADOOP-1960:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12366871/patch.txt
against trunk revision r581101.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/861/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/861/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/861/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/861/console

This message is automatically generated.



[jira] Work started: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-1960 started by Jim Kellerman.



[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Status: Patch Available  (was: In Progress)

Works locally. Try Hudson.



[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Tests passed. Committed.



[jira] Commented: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532076 ] 

Hudson commented on HADOOP-1960:
--------------------------------

Integrated in Hadoop-Nightly #259 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/259/])



[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Attachment: patch.txt

This patch removes the new unit test for region server shutdown, because the change to HMaster that it requires (adding an abort() method) is too dangerous to leave enabled. Without the test, the changes to HMaster and MiniHBaseCluster are no longer needed.

Changes included in this patch are:

TestRegionServerAbort

- Add check for scanner != null before trying to close it

TestSplit

- Enclose the test body in a try/catch block so that exceptions can be dumped to the console at the point in the test where they occur.

HRegionServer

- If unable to communicate with the master for longer than the lease timeout interval, abort the server.
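
The HRegionServer change can be pictured with a small sketch. It is not taken from the patch: the class MasterLeaseWatchdog, its method names, and the injected clock are hypothetical and chosen only to show a time-based check against the lease timeout.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: reports that the server should abort once the
 * master has been unreachable for longer than the lease timeout.
 */
final class MasterLeaseWatchdog {
    private final long leaseTimeoutMillis;   // assumed to come from configuration
    private final LongSupplier clock;        // injected so the logic can be tested
    private long lastContactMillis;

    MasterLeaseWatchdog(long leaseTimeoutMillis, LongSupplier clock) {
        this.leaseTimeoutMillis = leaseTimeoutMillis;
        this.clock = clock;
        this.lastContactMillis = clock.getAsLong();
    }

    /** Call after every successful report to the master. */
    void recordSuccessfulContact() {
        lastContactMillis = clock.getAsLong();
    }

    /** True once the silence has lasted longer than the lease timeout. */
    boolean shouldAbort() {
        return clock.getAsLong() - lastContactMillis > leaseTimeoutMillis;
    }
}
```

In a real server the clock would presumably be System::currentTimeMillis and shouldAbort() would be checked in the loop that reports to the master. Tying the limit to the lease timeout matches the reasoning in the issue description: once the lease has expired, the master may already be reassigning this server's regions.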





[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Status: Patch Available  (was: Open)

Works locally. Try Hudson.



[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Attachment: patch.txt

TestMasterAbort
- New test

MiniHBaseCluster
- Add getter that returns the HMaster object

TestRegionServerAbort
- Add check for scanner != null before trying to close it

TestSplit
- Enclose the test body in a try/catch block so that exceptions can be
  dumped to the console at the point in the test where they occur.

HRegionServer
- If unable to communicate with the master for longer than the lease
  timeout interval, abort the server.

HMaster
- Add abort method
- If aborting, ignore region server reports for 1.5 times the lease period
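
As a rough illustration of the HMaster item, the sketch below ignores reports for one and a half lease periods after an abort is requested. The class AbortingMasterSketch and its methods are hypothetical, and what happens after the window ends is an assumption, since the message does not say. The thread also notes that this HMaster change was later withdrawn as too dangerous to leave enabled.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: after abort() is called, region server reports
 * are ignored for 1.5 times the lease period.
 */
final class AbortingMasterSketch {
    private final long leasePeriodMillis;
    private final LongSupplier clock;        // injected so the logic can be tested
    private long abortStartedMillis = -1L;   // -1 means no abort in progress

    AbortingMasterSketch(long leasePeriodMillis, LongSupplier clock) {
        this.leasePeriodMillis = leasePeriodMillis;
        this.clock = clock;
    }

    /** Marks the start of the abort window. */
    void abort() {
        abortStartedMillis = clock.getAsLong();
    }

    /** True while reports from region servers should be dropped. */
    boolean shouldIgnoreReport() {
        if (abortStartedMillis < 0) {
            return false;
        }
        long ignoreWindow = leasePeriodMillis + leasePeriodMillis / 2;  // 1.5 x lease
        // Assumption: reports are accepted again once the window has passed.
        return clock.getAsLong() - abortStartedMillis < ignoreWindow;
    }
}
```

A window longer than one lease period gives every region server time to notice the silence and abort on its own, which appears to be what the accompanying test relied on.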




[jira] Commented: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531920 ] 

Hadoop QA commented on HADOOP-1960:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12366945/patch.txt
against trunk revision r581345.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/869/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/869/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/869/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/869/console

This message is automatically generated.



[jira] Updated: (HADOOP-1960) [hbase] If a region server cannot talk to the master after several attempts, it should shut itself down

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1960:
----------------------------------

    Status: Open  (was: Patch Available)

Although the test for the region server shutting down succeeded, another test failed.
