Posted to dev@hbase.apache.org by Jinal Shah <ji...@gmail.com> on 2014/07/13 23:02:03 UTC

HBase Failover

Hi everyone,

I'm Jinal Shah. I'm fairly new to HBase and I'm trying to find a solution
for an HBase failover situation. Here is the whole picture of what is
happening. We have 3 ZooKeeper nodes, 2 HBase master nodes and some
region servers. When HBase fails over from one master to the other, we have
to recycle our services before they can reach HBase again; otherwise we
get a ConnectionRefused exception. I'm not sure what we are doing wrong or
whether we are missing some configuration. The same thing happens in the
HBase shell: if a master failover happens, it starts throwing the same
error. Can anyone please help me understand why this is happening?
FYI, we are using HBase 0.94.2.

Thanks
Jinal

Re: HBase Failover

Posted by Esteban Gutierrez <es...@cloudera.com>.
Hi Jinal,

I see that the exception occurred while the client was attempting to fetch
the table descriptor via HTable.getTableDescriptor(). Operations that
interact with the HBase Master cannot be retried in the version of HBase
that you are using, so you need to catch the IOException and retry the call
once hbase.rpc.timeout has expired. Since HBase 0.95.2 those operations can
be retried; see https://issues.apache.org/jira/browse/HBASE-8764
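
A minimal sketch of the catch-and-retry approach described above, assuming
plain Java 8 and no HBase classes. MasterCallRetry, IoCall and callWithRetry
are hypothetical names made up for illustration; they are not part of the
HBase API. The simulated call in main stands in for a master-bound call such
as HTable.getTableDescriptor().

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical helper for illustration only; not part of the HBase API. */
final class MasterCallRetry {

    /** A call that may fail with an IOException, e.g. a master-bound HBase call. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the call up to maxAttempts times, sleeping pauseMillis after each
     * failed attempt. Rethrows the last IOException if every attempt fails.
     */
    static <T> T callWithRetry(IoCall<T> op, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IOException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        // Simulated master call: fails twice with ConnectException, then succeeds.
        final int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("Connection refused (simulated)");
            }
            return "descriptor";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real client the lambda would wrap the actual call, for example
() -> table.getTableDescriptor(), and the pause would be chosen to cover
hbase.rpc.timeout; both the attempt count and the pause are assumptions to
tune for your cluster.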

cheers,
esteban.




--
Cloudera, Inc.



On Mon, Jul 14, 2014 at 10:59 AM, Jinal Shah <ji...@gmail.com>
wrote:

> Hi esteban,
>
> I don't have access to HBase master logs but I'll try to get it if I can.
> When the failover occurs only the hbase service goes down. We see the
> standby Master being active.
>
> The clients run on different nodes and have the zookeeper configured
> correctly. Here is the post I have on stackoverflow to give more
> information about the error and the hbase-site.xml configuration.
> http://stackoverflow.com/questions/24726994/hbase-failover-situation
>
> cheers,
> Jinal
>
>
> On Mon, Jul 14, 2014 at 12:10 AM, Esteban Gutierrez <es...@cloudera.com>
> wrote:
>
> > -dev (bcc) +user
> >
> > Hello Jinal,
> >
> > Can you pastebin the logs from both HBase masters? When this failover
> > occurs, was the HBase master process killed or all services in that node
> > killed? When the HBase master dies it takes about 1 min (default RPC
> > timeout)  for the standby HBase master to transition to active and it is
> > expected that clients that use the HBase master can get a connection
> > refused exception until the standby master becomes an active master.
> >
> > However if your run other services in the same node like ZooKeeper and
> you
> > also run clients on the same node make sure that hbase.zookeeper.quorum
> is
> > configured correctly and has the 3 ZooKeeper nodes, otherwise clients
> > running on this node will get a connection refused from localhost.
> >
> > cheers,
> > esteban.
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> >
> > On Sun, Jul 13, 2014 at 2:02 PM, Jinal Shah <ji...@gmail.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I'm Jinal Shah. I'm kind of new to HBase and I'm trying to find the
> > > solution for HBase failover situation. So here is the whole picture of
> > what
> > > is happening. We have 3 zookeeper nodes, 2 Hbase master nodes and some
> > > region servers. When hbase failovers to from 1 master to another we
> have
> > > recycle our service in order to get our services to hit hbase otherwise
> > we
> > > get ConnectionRefused exception. I'm not sure what we are doing wrong
> or
> > if
> > > we are missing any configuration or something. the same thing happens
> > when
> > > we use the hbase shell and if there is a master failover happens then
> it
> > > starts throwing the same error. Can anyone please help me in knowing
> why
> > > this is happening? FYI We are using hbase 0.94.2
> > >
> > > Thanks
> > > Jinal
> > >
> >
>

Re: HBase Failover

Posted by Jinal Shah <ji...@gmail.com>.
Hi esteban,

I don't have access to the HBase master logs, but I'll try to get them if I
can. When the failover occurs, only the HBase service goes down. We see the
standby master become active.

The clients run on different nodes and have ZooKeeper configured
correctly. Here is the post I have on Stack Overflow, which gives more
information about the error and the hbase-site.xml configuration:
http://stackoverflow.com/questions/24726994/hbase-failover-situation

cheers,
Jinal


On Mon, Jul 14, 2014 at 12:10 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> -dev (bcc) +user
>
> Hello Jinal,
>
> Can you pastebin the logs from both HBase masters? When this failover
> occurs, was the HBase master process killed or all services in that node
> killed? When the HBase master dies it takes about 1 min (default RPC
> timeout)  for the standby HBase master to transition to active and it is
> expected that clients that use the HBase master can get a connection
> refused exception until the standby master becomes an active master.
>
> However if your run other services in the same node like ZooKeeper and you
> also run clients on the same node make sure that hbase.zookeeper.quorum is
> configured correctly and has the 3 ZooKeeper nodes, otherwise clients
> running on this node will get a connection refused from localhost.
>
> cheers,
> esteban.
>
>
>
>
>
>
>
>
> --
> Cloudera, Inc.
>
>
>
> On Sun, Jul 13, 2014 at 2:02 PM, Jinal Shah <ji...@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > I'm Jinal Shah. I'm kind of new to HBase and I'm trying to find the
> > solution for HBase failover situation. So here is the whole picture of
> what
> > is happening. We have 3 zookeeper nodes, 2 Hbase master nodes and some
> > region servers. When hbase failovers to from 1 master to another we have
> > recycle our service in order to get our services to hit hbase otherwise
> we
> > get ConnectionRefused exception. I'm not sure what we are doing wrong or
> if
> > we are missing any configuration or something. the same thing happens
> when
> > we use the hbase shell and if there is a master failover happens then it
> > starts throwing the same error. Can anyone please help me in knowing why
> > this is happening? FYI We are using hbase 0.94.2
> >
> > Thanks
> > Jinal
> >
>

Re: HBase Failover

Posted by Esteban Gutierrez <es...@cloudera.com>.
-dev (bcc) +user

Hello Jinal,

Can you pastebin the logs from both HBase masters? When this failover
occurs, was only the HBase master process killed, or were all services on
that node killed? When the HBase master dies, it takes about 1 min (the
default RPC timeout) for the standby HBase master to transition to active,
and it is expected that clients that use the HBase master can get a
connection refused exception until the standby master becomes the active
master.

However, if you run other services such as ZooKeeper on the same node and
you also run clients on that node, make sure that hbase.zookeeper.quorum is
configured correctly and lists all 3 ZooKeeper nodes; otherwise clients
running on this node will get a connection refused from localhost.
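
As a rough sketch of that setting, a client-side hbase-site.xml could list
all three ZooKeeper nodes as shown here. The hostnames are placeholders
invented for illustration, and the client port is assumed to be the usual
ZooKeeper default of 2181; substitute the values for your own cluster.

```xml
<configuration>
  <!-- Comma-separated list of all ZooKeeper hosts (placeholder hostnames). -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <!-- Port the ZooKeeper servers listen on; 2181 is assumed here. -->
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```

If the quorum is left unset, the client falls back to localhost, which
matches the connection refused from localhost described above.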

cheers,
esteban.








--
Cloudera, Inc.



On Sun, Jul 13, 2014 at 2:02 PM, Jinal Shah <ji...@gmail.com> wrote:

> Hi everyone,
>
> I'm Jinal Shah. I'm kind of new to HBase and I'm trying to find the
> solution for HBase failover situation. So here is the whole picture of what
> is happening. We have 3 zookeeper nodes, 2 Hbase master nodes and some
> region servers. When hbase failovers to from 1 master to another we have
> recycle our service in order to get our services to hit hbase otherwise we
> get ConnectionRefused exception. I'm not sure what we are doing wrong or if
> we are missing any configuration or something. the same thing happens when
> we use the hbase shell and if there is a master failover happens then it
> starts throwing the same error. Can anyone please help me in knowing why
> this is happening? FYI We are using hbase 0.94.2
>
> Thanks
> Jinal
>
