Posted to dev@hbase.apache.org by "Amandeep Khurana (JIRA)" <ji...@apache.org> on 2009/07/08 22:25:14 UTC

[jira] Created: (HBASE-1629) HRS unable to contact master

HRS unable to contact master
----------------------------

                 Key: HBASE-1629
                 URL: https://issues.apache.org/jira/browse/HBASE-1629
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Amandeep Khurana
             Fix For: 0.20.0


HRS unable to contact master for initialization after expiration from ZK. Master thinks HRS is still up whereas HRS went down and now cannot restart. The RS logs have a flurry of the following warning messages:

2009-07-08 12:53:19,547 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to get master for initialization

More logs from the RS and the Master attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1629) HRS unable to contact master

Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amandeep Khurana updated HBASE-1629:
------------------------------------

    Attachment: Master_log
                RS_Log

Attaching logs from the RS and the Master from the time the failure started



[jira] Assigned: (HBASE-1629) HRS unable to contact master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe reassigned HBASE-1629:
----------------------------------

    Assignee: Nitay Joffe



[jira] Updated: (HBASE-1629) HRS unable to contact master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1629:
-------------------------------

    Attachment: hbase-1629.patch

Small patch for a convoluted problem. Amandeep, try this out, see if it fixes it for you.

Here's the problem:

{noformat}
reportForDuty()
    while (!getMaster()) {
      sleeper.sleep();
      LOG.warn("Unable to get master for initialization");
    }

getMaster()
    HServerAddress masterAddress = null;
    while (masterAddress == null) {
      if (stopRequested.get()) {
        return false;
      }
{noformat}

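Once stopRequested is set, getMaster() returns false on every call, so the while loop in reportForDuty() never exits. This infinite loop causes the repeated warnings at the end of the RS Log Amandeep posted.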

The flow of logic that leads to this is the following:
# RS session with ZooKeeper expires.
# Master gets znode expiration, starts cleanup/shutdown of RS.
# RS gets its session expired, begins restart() logic, setting stopRequested.
# Meanwhile, RS run() thread is still talking to master.
# Master gets a message from the RS but no longer recognizes it, because the RS has already been removed. This is the "received server report from unknown server..." stuff. The Master tells the RS to reinitialize by sending MSG_CALL_SERVER_STARTUP.
# RS on getting MSG_CALL_SERVER_STARTUP calls reportForDuty() and is now in a loop. The restart() thread from ZooKeeper is waiting for the RS run() to finish, but it never will.

This simple patch makes reportForDuty() fail fast when stopRequested is set.
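
For illustration only, here is a minimal, self-contained sketch of the loop described above and of one possible fail-fast shape. It is not the attached patch and not the real HRegionServer code: the class ReportForDutySketch, the warnings counter, and the maxRetries cap are hypothetical names introduced so the example terminates, and getMaster() is reduced to the stopRequested check shown in the excerpt.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Simplified model of the retry loop; not the actual HRegionServer code. */
final class ReportForDutySketch {
    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    int warnings = 0; // counts "Unable to get master for initialization" warnings

    /** Stand-in for getMaster(): gives up as soon as a stop has been requested. */
    boolean getMaster() {
        if (stopRequested.get()) {
            return false;
        }
        return true; // assumption: the master is reachable otherwise
    }

    /**
     * Original shape: retry until getMaster() succeeds.
     * maxRetries is a hypothetical cap so this demo terminates;
     * the loop in the excerpt has no such exit.
     */
    boolean reportForDutyOriginal(int maxRetries) {
        while (!getMaster()) {
            warnings++;
            if (warnings >= maxRetries) {
                return false;
            }
        }
        return true;
    }

    /** Fail-fast shape: leave the retry loop once stopRequested is set. */
    boolean reportForDutyFailFast() {
        while (!getMaster()) {
            if (stopRequested.get()) {
                return false; // stop retrying so the caller can finish
            }
            warnings++;
        }
        return true;
    }

    public static void main(String[] args) {
        ReportForDutySketch rs = new ReportForDutySketch();
        rs.stopRequested.set(true); // simulate restart() after session expiry

        rs.reportForDutyOriginal(5);
        System.out.println("original shape, warnings logged: " + rs.warnings);

        rs.warnings = 0;
        rs.reportForDutyFailFast();
        System.out.println("fail-fast shape, warnings logged: " + rs.warnings);
    }
}
```

With stopRequested set, the original shape keeps warning until the artificial cap is hit, while the fail-fast shape returns immediately, which is what would let run() finish and the restart() thread proceed.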



[jira] Commented: (HBASE-1629) HRS unable to contact master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729065#action_12729065 ] 

stack commented on HBASE-1629:
------------------------------

+1

It would be great if AK could try it before commit, but I'm fine with committing if not; it is easy enough to see whether this fixes the issue.



[jira] Resolved: (HBASE-1629) HRS unable to contact master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe resolved HBASE-1629.
--------------------------------

    Resolution: Fixed

Stack said to go ahead and commit this. Amandeep, if you see this again, feel free to reopen the issue.
