Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/07/19 03:05:39 UTC

[jira] [Resolved] (HBASE-3442) Master failing when node disconnects or dies

     [ https://issues.apache.org/jira/browse/HBASE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-3442.
-----------------------------------

    Resolution: Invalid

Issue wasn't actionable

> Master failing when node disconnects or dies
> --------------------------------------------
>
>                 Key: HBASE-3442
>                 URL: https://issues.apache.org/jira/browse/HBASE-3442
>             Project: HBase
>          Issue Type: Bug
>          Components: master, regionserver
>    Affects Versions: 0.90.0
>         Environment: CentOS 5, HBase 0.90 RC3, Amazon EC2
>            Reporter: Justin
>            Priority: Minor
>
> We run our servers on Amazon EC2, and nodes go through some shutdown scripts if/when we want to take them out of the mix.  We shut down one of the nodes, in this case Node98, which caused the master server to crash immediately.  Upon restarting, the master would attempt to contact the missing node and then stop its startup process.  I believe the node removed itself from the DNS server first, then ran a stop on the datanode and regionserver.  The missing node was also removed from any slave/regionserver list on the master server.  I finally put a bogus entry for the missing node in the /etc/hosts file, pointing it back to 127.0.0.1; the master server then marked it as a dead node, ignored it, and finished the startup process.
> I'm going to try to replicate it and save some more logs.  The following log is the only thing I saved from the first occurrence.  It shows the master failing to start up while checking for the missing node:  http://pastebin.com/ZyQMQm91



--
This message was sent by Atlassian JIRA
(v6.2#6252)