Posted to common-dev@hadoop.apache.org by "stack (JIRA)" <ji...@apache.org> on 2007/06/25 21:53:25 UTC

[jira] Created: (HADOOP-1527) Region server won't start because logdir exists

Region server won't start because logdir exists
-----------------------------------------------

                 Key: HADOOP-1527
                 URL: https://issues.apache.org/jira/browse/HADOOP-1527
             Project: Hadoop
          Issue Type: Bug
          Components: contrib/hbase
            Reporter: stack
            Assignee: stack


Starting and then impolitely stopping a cluster, I came across the following:

2007-06-25 19:43:31,449 ERROR org.apache.hadoop.hbase.HRegionServer: Can not start region server because org.apache.hadoop.hbase.RegionServerRunningException: region server already running at 208.76.44.140:60010 because logdir  exists
        at org.apache.hadoop.hbase.HRegionServer.<init>(HRegionServer.java:447)
        at org.apache.hadoop.hbase.HRegionServer.<init>(HRegionServer.java:372)
        at org.apache.hadoop.hbase.HRegionServer.main(HRegionServer.java:1233)

Region server should recover or offer a recovery path when we run into this condition.
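The failing start-up check can be sketched as follows. The sketch assumes only that the region server refuses to start when its own log directory is already present. It is not the HRegionServer code: java.nio.file stands in for Hadoop's FileSystem, and the class name, method name, and directory-naming scheme are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LogDirGuard {

    /** Local stand-in for org.apache.hadoop.hbase.RegionServerRunningException. */
    static class RegionServerRunningException extends IOException {
        RegionServerRunningException(String message) {
            super(message);
        }
    }

    /**
     * Creates this server's log directory, or fails if it already exists.
     * Files.createDirectory is atomic: it throws FileAlreadyExistsException
     * rather than silently reusing a directory left by an earlier instance.
     */
    static Path claimLogDir(Path rootDir, String serverAddress) throws IOException {
        Files.createDirectories(rootDir);
        // Hypothetical naming scheme: one log directory per host:port pair.
        Path logDir = rootDir.resolve("log_" + serverAddress.replace(':', '_'));
        try {
            return Files.createDirectory(logDir);
        } catch (FileAlreadyExistsException e) {
            throw new RegionServerRunningException(
                "region server already running at " + serverAddress
                    + " because logdir " + logDir + " exists");
        }
    }
}
```

Detecting the directory in the same call that creates it avoids a window between an existence test and the create. A directory left behind by an unclean shutdown trips this guard exactly as in the log above. That is the situation this issue asks to recover from.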

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

    Attachment: HADOOP-1527-patch.txt

Revise for recent commits


[jira] Commented: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521516 ] 

Jim Kellerman commented on HADOOP-1527:
---------------------------------------

This really is an abnormal condition: if a region server dies, the master should split that region server's log (placing the records in the affected regions' directories) and then remove the region server's log.
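A minimal sketch of the split-then-remove step, under stated assumptions. The server log is treated as a single local text file with one edit per line in a hypothetical `regionName<TAB>edit` format. Each region's directory is assumed to be `rootDir/regionName`, and the recovered edits are written to a file named `oldlogfile.log` there (this name is also an assumption). Real region server logs are binary files in HDFS.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogSplitter {

    /**
     * Groups a dead server's logged edits by region, writes each group into
     * that region's directory, and only then deletes the server log.
     * Edits keep their original log order within each region.
     */
    static Map<String, List<String>> splitLog(Path serverLog, Path rootDir) throws IOException {
        Map<String, List<String>> perRegion = new LinkedHashMap<>();
        for (String line : Files.readAllLines(serverLog, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines that carry no region name
            }
            perRegion.computeIfAbsent(line.substring(0, tab), r -> new ArrayList<>())
                     .add(line.substring(tab + 1));
        }
        for (Map.Entry<String, List<String>> e : perRegion.entrySet()) {
            // Hypothetical layout: rootDir/<regionName>/oldlogfile.log
            Path regionDir = Files.createDirectories(rootDir.resolve(e.getKey()));
            Files.write(regionDir.resolve("oldlogfile.log"), e.getValue(), StandardCharsets.UTF_8);
        }
        // Deleting last means a crash midway leaves the original log for a retry;
        // a retry simply overwrites the per-region files it already wrote.
        Files.delete(serverLog);
        return perRegion;
    }
}
```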

If a region server is starting up and discovers that a log directory which should belong exclusively to it already exists, then either:
- the master has not cleaned up the log yet (or perhaps never will, if the master crashed before it could), or
- another region server started and grabbed that port, so the starting region server should shut down.

In the former case, if the master crashed, we should provide a tool that can split the log so we can recover the regions that the previous region server instance was serving.

Otherwise I think that what is happening is the correct behavior.


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

        Fix Version/s: 0.15.0
    Affects Version/s: 0.15.0
               Status: Patch Available  (was: Open)


[jira] Commented: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521525 ] 

Jim Kellerman commented on HADOOP-1527:
---------------------------------------

Even better, if the master discovers a stale entry in the root or meta regions, it should check whether the log file exists and split it before assigning the region to a new server.

This would even handle the case where the region server serving the root region died, because it is highly unlikely that a region server would have been serving only the root region.

So the plan of attack is to add a check in the master upon discovery of a stale entry in the root and meta regions, and to create a separate utility to recover a region server log in the unlikely event that a region server was only serving the root region.
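The master-side check in this plan can be sketched as below. `LogStore`, `onMetaEntry`, and the live-server set are hypothetical stand-ins for the master's real bookkeeping. The sketch shows only the ordering: if the dead server's log is still there, split it before queuing the region for reassignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StaleEntryHandler {

    /** Hypothetical view of the log storage the master can inspect. */
    interface LogStore {
        boolean logExists(String server);
        void splitAndRemoveLog(String server);
    }

    private final Set<String> liveServers;
    private final LogStore logs;
    /** Regions waiting to be assigned to a new server, in discovery order. */
    final List<String> unassigned = new ArrayList<>();

    StaleEntryHandler(Set<String> liveServers, LogStore logs) {
        this.liveServers = liveServers;
        this.logs = logs;
    }

    /**
     * Handles one (region, server) entry read while scanning the root or meta
     * region. The entry is stale if the server it names is not currently live.
     */
    void onMetaEntry(String region, String server) {
        if (liveServers.contains(server)) {
            return; // entry is current; leave the region where it is
        }
        // Split first, so the region's new server can replay the dead server's edits.
        if (logs.logExists(server)) {
            logs.splitAndRemoveLog(server);
        }
        unassigned.add(region);
    }
}
```

Because the log is removed once it has been split, a second stale entry for the same dead server finds no log and goes straight to reassignment.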


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

    Attachment: HADOOP-1527-patch.txt


[jira] Assigned: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-1527:
-------------------------------------

    Assignee: Jim Kellerman  (was: stack)


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

    Status: Open  (was: Patch Available)


[jira] Commented: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521709 ] 

Hadoop QA commented on HADOOP-1527:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12364281/HADOOP-1527-patch.txt applied and successfully tested against trunk revision r568404.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/594/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/594/console


[jira] Updated: (HADOOP-1527) Region server won't start because logdir exists

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HADOOP-1527:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Passes tests. Committed.
