Posted to common-user@hadoop.apache.org by S Naik <sn...@attributor.com> on 2013/01/24 02:35:43 UTC

cdh4 HA fencing fails when the other node is down

Hi,

I am trying to set up an HA NameNode using CDH4 and ZKFC.
It works well when I kill -9 the active NameNode.
But if I reboot or shut down the host running the active NameNode,
failover fails.

The ZKFC complains that fencing was not successful; it gets a
"No route to host" exception.

Is this expected?

I looked through the mailing list archives.
It seems that the fix is to move away from ZKFC and use quorum-based
automatic failover.

But this should be a fairly common requirement, and I would expect
there to be a solution for this scenario with ZKFC.

Please guide me or point me to a solution.


-Sagar
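The failure described above can be pictured with a small model of how the ZKFC works through the methods listed in dfs.ha.fencing.methods: the methods are tried in order, and fencing succeeds as soon as one of them reports success. This is a plain Python sketch, not Hadoop code; make_sshfence, shell_true, fence and try_failover are hypothetical names, and "can SSH to the host" is reduced to set membership. It shows why sshfence alone cannot succeed once the old active host is powered off, and what appending an always-succeeding fallback such as shell(/bin/true) changes.

```python
"""Toy model of the ZKFC walking its list of fencing methods.

Assumption: this is NOT Hadoop code. It only mimics the documented
semantics of dfs.ha.fencing.methods (try each method in order, stop at
the first one that reports success).
"""
from typing import Callable, List, Set, Tuple

# A fencing method takes the old active host name and returns True on success.
FenceMethod = Callable[[str], bool]


def make_sshfence(reachable_hosts: Set[str]) -> FenceMethod:
    """Hypothetical stand-in for sshfence.

    The real method must SSH into the old active host. If that machine is
    powered off, the connection fails (e.g. "No route to host"), so here
    fencing succeeds only if the host is in the reachable set.
    """
    def sshfence(host: str) -> bool:
        return host in reachable_hosts
    return sshfence


def shell_true(host: str) -> bool:
    """Hypothetical stand-in for shell(/bin/true): always reports success."""
    return True


def fence(host: str, methods: List[Tuple[str, FenceMethod]]) -> Tuple[bool, str]:
    """Try each (name, method) pair in order; stop at the first success."""
    for name, method in methods:
        if method(host):
            return True, name
    return False, ""


def try_failover(old_active: str, methods: List[Tuple[str, FenceMethod]]) -> str:
    ok, used = fence(old_active, methods)
    if not ok:
        # Without successful fencing the standby is not promoted,
        # to avoid ending up with two active NameNodes.
        return "failover aborted: fencing failed"
    return f"standby promoted (fenced via {used})"


# nn1 was the active NameNode and its machine has just been shut down.
reachable = {"nn2"}
ssh_only = [("sshfence", make_sshfence(reachable))]
with_fallback = ssh_only + [("shell(/bin/true)", shell_true)]

print(try_failover("nn1", ssh_only))       # failover aborted: fencing failed
print(try_failover("nn1", with_fallback))  # standby promoted (fenced via shell(/bin/true))
```

An always-succeeding fallback only makes sense when something else guarantees that the old active can no longer write to the shared edit log, which is what the QJM's own writer fencing provides; with NFS-based shared edits it would remove the protection against split-brain.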

Re: cdh4 HA fencing fails when the other node is down

Posted by Harsh J <ha...@cloudera.com>.
Hi Sagar,

Moving the discussion to cdh-user@cloudera.org, as you're asking a
CDH-specific question here. The list can be subscribed to at
https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user.

Please take a look at the discussion at
https://groups.google.com/a/cloudera.org/d/topic/cdh-user/l21m7gzQYb0/discussion
and Todd's specific reply there on why a simple fencer won't work for
all scenarios. Using QJM only complements the use of ZKFC for automatic
failover; it does not replace it. QJM is a storage method for the shared
edit log, while ZKFC continues to handle the actual failover control.
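The point that QJM is a storage method rather than a failover controller can be illustrated with a toy model of how a Quorum Journal Manager keeps a stale active NameNode from writing: each new writer obtains a strictly higher epoch from a majority of JournalNodes, and JournalNodes reject edits carrying any other epoch. This is a simplified sketch, not the Hadoop implementation; the JournalNode and QuorumWriter classes and their methods are hypothetical, and segment recovery, persistence and RPC are all left out. The ZKFC (not modeled here) still decides when to fail over; the epochs only ensure that an old active that was never fenced cannot corrupt the shared edit log.

```python
"""Toy model of epoch-based writer fencing in a Quorum Journal Manager.

Assumption: heavily simplified and hypothetical. The real QJM protocol
also recovers in-progress log segments, persists state and runs over RPC.
"""
from typing import List


class JournalNode:
    def __init__(self) -> None:
        self.promised_epoch = 0      # highest epoch this node has promised to honour
        self.edits: List[str] = []   # accepted edit log entries

    def new_epoch(self, epoch: int) -> bool:
        # Accept a new writer only if its epoch is strictly higher.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch: int, edit: str) -> bool:
        # Only the writer holding the currently promised epoch may append.
        if epoch != self.promised_epoch:
            return False
        self.edits.append(edit)
        return True


class QuorumWriter:
    """Hypothetical stand-in for a NameNode writing edits to the JournalNodes."""

    def __init__(self, name: str, nodes: List[JournalNode]) -> None:
        self.name = name
        self.nodes = nodes
        self.epoch = 0

    def _majority(self, acks: int) -> bool:
        return acks > len(self.nodes) // 2

    def become_active(self) -> bool:
        # Choose an epoch above any promised so far and ask every node for it.
        self.epoch = max(n.promised_epoch for n in self.nodes) + 1
        acks = sum(n.new_epoch(self.epoch) for n in self.nodes)
        return self._majority(acks)

    def write(self, edit: str) -> bool:
        # An edit is durable only if a majority of JournalNodes accepts it.
        acks = sum(n.journal(self.epoch, edit) for n in self.nodes)
        return self._majority(acks)


nodes = [JournalNode() for _ in range(3)]
nn1 = QuorumWriter("nn1", nodes)
nn2 = QuorumWriter("nn2", nodes)

print(nn1.become_active())    # True: nn1 holds epoch 1
print(nn1.write("mkdir /a"))  # True
print(nn2.become_active())    # True: failover, nn2 holds epoch 2
print(nn1.write("mkdir /b"))  # False: nn1 was not fenced, but its epoch is stale
print(nn2.write("mkdir /c"))  # True
```

The old active can still serve stale reads to clients until it tries to write and discovers it has lost its epoch, which is one reason the Hadoop HA documentation still recommends configuring some fencing method alongside QJM.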

-- 
Harsh J