Posted to dev@hbase.apache.org by "Amandeep Khurana (JIRA)" <ji...@apache.org> on 2009/07/08 22:25:14 UTC
[jira] Created: (HBASE-1629) HRS unable to contact master
HRS unable to contact master
----------------------------
Key: HBASE-1629
URL: https://issues.apache.org/jira/browse/HBASE-1629
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: Amandeep Khurana
Fix For: 0.20.0
The HRS is unable to contact the master for initialization after its ZK session expires. The master thinks the HRS is still up, whereas the HRS went down and now cannot restart. The RS logs contain a flurry of the following warning message:
2009-07-08 12:53:19,547 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to get master for initialization
More logs from the RS and the Master attached.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1629) HRS unable to contact master
Posted by "Amandeep Khurana (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amandeep Khurana updated HBASE-1629:
------------------------------------
Attachment: Master_log, RS_Log
Attaching logs from the RS and the Master, starting from the time the failure began.
[jira] Assigned: (HBASE-1629) HRS unable to contact master
Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nitay Joffe reassigned HBASE-1629:
----------------------------------
Assignee: Nitay Joffe
[jira] Updated: (HBASE-1629) HRS unable to contact master
Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nitay Joffe updated HBASE-1629:
-------------------------------
Attachment: hbase-1629.patch
Small patch for a convoluted problem. Amandeep, try this out, see if it fixes it for you.
Here's the problem:
{noformat}
reportForDuty()
  while (!getMaster()) {
    sleeper.sleep();
    LOG.warn("Unable to get master for initialization");
  }
getMaster()
  HServerAddress masterAddress = null;
  while (masterAddress == null) {
    if (stopRequested.get()) {
      return false;
    }
{noformat}
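This is an infinite loop, and it causes the messages at the end of the RS log Amandeep posted: once stopRequested is set, getMaster() returns false on every call, so the while loop in reportForDuty() never exits and keeps logging the warning.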
The flow of logic that leads to this is the following:
# RS session with ZooKeeper expires.
# Master gets znode expiration, starts cleanup/shutdown of RS.
# RS gets its session expired, begins restart() logic, setting stopRequested.
# Meanwhile, RS run() thread is still talking to master.
# Master gets a message from the RS but no longer recognizes it, because the RS has already been removed. This is the "received server report from unknown server..." stuff. Master tells the RS to reinitialize by sending MSG_CALL_SERVER_STARTUP.
# RS on getting MSG_CALL_SERVER_STARTUP calls reportForDuty() and is now in a loop. The restart() thread from ZooKeeper is waiting for the RS run() to finish, but it never will.
This simple patch makes reportForDuty() fail fast when stopRequested is set.
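For readers without the attachment, here is a rough sketch of what "fail fast" could look like. The patch itself is not quoted in this thread, so this is a guess at its shape rather than the committed change. The class ReportForDutySketch, the masterLookup supplier and requestStop() are hypothetical stand-ins and not HBase APIs; only the names reportForDuty(), getMaster() and stopRequested come from the excerpt above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical stand-in for the region server handshake; not the HBase source.
final class ReportForDutySketch {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    // Assumed lookup: returns the master address, or null while it is unknown.
    private final Supplier<String> masterLookup;

    ReportForDutySketch(Supplier<String> masterLookup) {
        this.masterLookup = masterLookup;
    }

    // Stands in for the restart() logic setting stopRequested.
    void requestStop() {
        stopRequested.set(true);
    }

    // Follows the getMaster() excerpt: returns false as soon as a stop is requested.
    boolean getMaster() {
        String masterAddress = null;
        while (masterAddress == null) {
            if (stopRequested.get()) {
                return false;
            }
            masterAddress = masterLookup.get();
        }
        return true;
    }

    // Fail-fast variant: give up instead of retrying once a stop is requested.
    boolean reportForDuty() {
        while (!getMaster()) {
            if (stopRequested.get()) {
                return false; // without this check the loop would spin forever
            }
            System.out.println("Unable to get master for initialization");
        }
        return true;
    }

    public static void main(String[] args) {
        ReportForDutySketch rs = new ReportForDutySketch(() -> null);
        rs.requestStop(); // simulate restart() after ZooKeeper session expiry
        System.out.println("reported: " + rs.reportForDuty());
    }
}
```

With the stop flag set before the call, reportForDuty() returns false right away instead of looping, which is what would let the run() thread finish so that the waiting restart() thread can proceed.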
[jira] Commented: (HBASE-1629) HRS unable to contact master
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729065#action_12729065 ]
stack commented on HBASE-1629:
------------------------------
+1
Would be great if AK could try it before commit, but I'm fine w/ commit if not -- easy enough to see if this fixes the issue.
[jira] Resolved: (HBASE-1629) HRS unable to contact master
Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nitay Joffe resolved HBASE-1629.
--------------------------------
Resolution: Fixed
Stack said to go ahead and commit this. Amandeep, if you see this again, feel free to reopen the issue.