Posted to user@hbase.apache.org by Sandeep L <sa...@outlook.com> on 2015/03/03 12:08:28 UTC

Where is HBase failed servers list stored

Hi,
While trying to run the HBase balancer I am getting the error message "This server is in the failed servers list". Because of this, the cluster is not getting balanced.
Even though the regionserver is up and running, the HMaster is unable to connect to it.
The odd thing is that the HMaster is able to start the regionserver and detects it as up and running, but is unable to assign regions to it.
Can someone suggest a solution for this?
Following is the full stack trace:
org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
    at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
    at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
    at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
    at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
    at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
    at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Thanks,
Sandeep.

Re: Where is HBase failed servers list stored

Posted by Nicolas Liochon <nk...@gmail.com>.
As Bryan said.
On 5 March 2015 at 17:55, "Bryan Beaudreault" <bb...@hubspot.com> wrote:

> You should run with a backup master in a production cluster.  The failover
> process works very well and will cause no downtime.  I've done it literally
> hundreds of times across our multiple production hbase clusters.
>
> Even if you don't have a backup master, you should still be fine with
> restarting the master.  It can handle a brief blip without any problems,
> from what I've seen.  The master is really only used for coordination such
> as region moves, RS failovers, etc.  Your clients can still retrieve data
> from your regionservers, as long as no servers die in the brief moment you
> are masterless.

Re: Where is HBase failed servers list stored

Posted by Bryan Beaudreault <bb...@hubspot.com>.
You should run with a backup master in a production cluster.  The failover
process works very well and will cause no downtime.  I've done it literally
hundreds of times across our multiple production hbase clusters.

Even if you don't have a backup master, you should still be fine with
restarting the master.  It can handle a brief blip without any problems,
from what I've seen.  The master is really only used for coordination such
as region moves, RS failovers, etc.  Your clients can still retrieve data
from your regionservers, as long as no servers die in the brief moment you
are masterless.

On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy <sa...@outlook.com>
wrote:

> Since ours is a production cluster, we can't restart the master.
> In our test cluster I tested this scenario, and it was resolved after
> restarting the master.
> Other than restarting the master, I couldn't find any solution.
> Thanks,
> Sandeep.

RE: Where is HBase failed servers list stored

Posted by Sandeep Reddy <sa...@outlook.com>.
Since ours is a production cluster, we can't restart the master.
In our test cluster I tested this scenario, and it was resolved after restarting the master.
Other than restarting the master, I couldn't find any solution.
Thanks,
Sandeep.

> From: nkeywal@gmail.com
> Date: Wed, 4 Mar 2015 14:55:03 +0100
> Subject: Re: Where is HBase failed servers list stored
> To: user@hbase.apache.org
> 
> If I understand the issue correctly, restarting the master should solve the
> problem.

Re: Where is HBase failed servers list stored

Posted by Nicolas Liochon <nk...@gmail.com>.
If I understand the issue correctly, restarting the master should solve the
problem.

On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu <yu...@gmail.com> wrote:

> Please see HBASE-13067 Fix caching of stubs to allow IP address changes of
> restarted remote servers
>
> Cheers

Re: Where is HBase failed servers list stored

Posted by Ted Yu <yu...@gmail.com>.
Please see HBASE-13067, "Fix caching of stubs to allow IP address changes of
restarted remote servers".

Cheers

On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L <sa...@outlook.com> wrote:

> Hi nkeywal,
> While trying to get more details about this issue, I found that the
> HMaster is trying to connect to the wrong IP address.
> Here is the exact issue:
> For unavoidable reasons we had to change the IP address of a
> regionserver, and then updated the new IP address in the /etc/hosts file
> across all HBase servers. I started the RegionServer from the master with
> the start-hbase.sh script, and jps output on the regionserver shows that
> the regionserver process is up and running.
> But when running the HBase balancer, the HMaster tries to connect to the
> old IP address instead of the new one.
> One more thing: when I checked the regionserver status on port 60010,
> it shows as up and running.
> Thanks,
> Sandeep.

RE: Where is HBase failed servers list stored

Posted by Sandeep L <sa...@outlook.com>.
Hi nkeywal,
While trying to get more details about this issue, I found that the HMaster is trying to connect to the wrong IP address.
Here is the exact issue:
For unavoidable reasons we had to change the IP address of a regionserver, and then updated the new IP address in the /etc/hosts file across all HBase servers. I started the RegionServer from the master with the start-hbase.sh script, and jps output on the regionserver shows that the regionserver process is up and running.
But when running the HBase balancer, the HMaster tries to connect to the old IP address instead of the new one.
One more thing: when I checked the regionserver status on port 60010, it shows as up and running.
Thanks,Sandeep.
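
A rough way to check, from the master host, which IP address a freshly started JVM resolves for the regionserver and whether its RPC port accepts connections is sketched below. This is only an illustration under assumptions: the class RegionServerReachability and its methods are hypothetical helpers, host1 and port 60020 are taken from the stack trace in this thread, and the 3000 ms timeout is arbitrary. It shows what a new JVM sees; it cannot show an address that a long-running HMaster process may still be holding, which appears to be the situation HBASE-13067 addresses.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Hypothetical helper, not part of HBase. */
final class RegionServerReachability {

    /** Returns the IP address this JVM resolves for the host, or null if resolution fails. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** True if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the stack trace in this thread; override on the command line.
        String host = args.length > 0 ? args[0] : "host1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + " resolves to " + resolve(host));
        System.out.println("TCP connect to " + host + ":" + port + " -> "
                + canConnect(host, port, 3000));
    }
}
```

If the address printed here is the new one while the HMaster log still shows the old one, the stale address is more likely held inside the running master than in /etc/hosts.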

> From: nkeywal@gmail.com
> Date: Tue, 3 Mar 2015 19:01:01 +0100
> Subject: Re: Where is HBase failed servers list stored
> To: user@hbase.apache.org
> 
> It's in local memory. When HBase cannot connect to a server, it puts it
> into the "failedServerList" for 2 seconds. This is to avoid having all the
> threads going into a potentially long socket timeout. Are you sure that you
> can connect from the master to this machine/port?
> 
> You can change the time it stays in the list with
> hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not
> help.
> 
> You should have another exception before this one in the logs (the one that
> initially put this region server in this failedServerList).
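
The behaviour described above can be pictured as a small in-memory map in which each entry has a time limit. The following is only a sketch under assumptions, not HBase's actual implementation: the class name ExpiringFailedServers, its methods, and the injected clock are hypothetical, and the 2000 ms value simply mirrors the 2 seconds mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Illustrative only: a time-limited "failed servers" list kept in local memory. */
final class ExpiringFailedServers {
    private final Map<String, Long> expiryByAddress = new HashMap<>();
    private final long expiryMillis;
    private final LongSupplier clock; // injected so the example is deterministic

    ExpiringFailedServers(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    /** Record a failed connection attempt to the given "host:port" address. */
    synchronized void addFailure(String address) {
        expiryByAddress.put(address, clock.getAsLong() + expiryMillis);
    }

    /** True while the address is still inside its expiry window. */
    synchronized boolean isFailed(String address) {
        Long expiresAt = expiryByAddress.get(address);
        if (expiresAt == null) {
            return false;
        }
        if (clock.getAsLong() >= expiresAt) {
            expiryByAddress.remove(address); // entry has aged out
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        ExpiringFailedServers failed = new ExpiringFailedServers(2000, () -> now[0]);
        failed.addFailure("host1:60020");
        System.out.println(failed.isFailed("host1:60020")); // still inside the window
        now[0] = 2500;
        System.out.println(failed.isFailed("host1:60020")); // window has passed
    }
}
```

In a design like this, callers that find the address in the list fail fast instead of each waiting on a socket timeout, and an entry that keeps reappearing points to a connection attempt that keeps failing underneath.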
> 
> On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L <sa...@outlook.com>
> wrote:
> 
> > Hi,
> > While trying to run hbase balancer I am getting error message as "This
> > server is in the failed servers list".Due to this cluster is not getting
> > balanced.
> > Even though regionserver is up and running hmaster is unable to connect to
> > it.
> > The odd thing here is hmaster is able to start regionserver and it is
> > detected as up and running but unable to assign regions.
> > Can some one suggest any solution for this.
> > Following is full stack trace:
> > org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
> >     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
> >     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
> >     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> >     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
> >     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
> >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
> >     at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
> >     at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
> >     at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Thanks,Sandeep.

Re: Where is HBase failed servers list stored

Posted by Nicolas Liochon <nk...@gmail.com>.
It's in local memory. When HBase cannot connect to a server, it puts it
into the "failedServerList" for 2 seconds. This is to avoid having all the
threads going into a potentially long socket timeout. Are you sure that you
can connect from the master to this machine/port?
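
One way to answer that question from the master host is a plain TCP connect with a short timeout. The following is a minimal sketch, not an HBase tool: the class and method names are illustrative, and host1 and port 60020 are simply the values shown in the stack trace.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative helper (not part of HBase): checks whether a TCP connection can be opened.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and host names that do not resolve.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the stack trace in this thread; override with <host> <port>.
        String host = args.length > 0 ? args[0] : "host1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Running it on the master host against the regionserver's host name and RPC port shows whether the basic network path works, independently of HBase.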

You can change the time it stays in the list with
hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should not
help.
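
To make the mechanism concrete, here is a small conceptual sketch of a failed-servers list with an expiry. It is not HBase's actual code: the class and method names are invented for illustration, and the 2000 ms used in the usage note stands in for the 2 seconds mentioned above (the value that hbase.ipc.client.failed.servers.expiry would control).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual sketch only (hypothetical class, not the HBase implementation).
final class ExpiringFailedServers {
    private final long expiryMillis;
    // Maps "host:port" to the time (in ms) at which the entry stops counting as failed.
    private final Map<String, Long> expiresAt = new ConcurrentHashMap<>();

    ExpiringFailedServers(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Records a failed connection attempt observed at nowMillis. */
    void markFailed(String address, long nowMillis) {
        expiresAt.put(address, nowMillis + expiryMillis);
    }

    /** Returns true while the address is still within its expiry window. */
    boolean isFailed(String address, long nowMillis) {
        Long deadline = expiresAt.get(address);
        if (deadline == null) {
            return false;
        }
        if (nowMillis >= deadline) {
            // Entry has expired; drop it so the next caller tries a real connection again.
            expiresAt.remove(address, deadline);
            return false;
        }
        return true;
    }
}
```

With new ExpiringFailedServers(2000), a server marked as failed at time 0 is rejected immediately by callers until time 2000, after which a real connection is attempted again. This is why a longer expiry only changes how long callers fail fast; it does not fix the underlying connection problem.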

You should have another exception before this one in the logs (the one that
initially put this region server in this failedServerList).

On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L <sa...@outlook.com>
wrote:

> Hi,
> While trying to run hbase balancer I am getting error message as "This
> server is in the failed servers list".Due to this cluster is not getting
> balanced.
> Even though regionserver is up and running hmaster is unable to connect to
> it.
> The odd thing here is hmaster is able to start regionserver and it is
> detected as up and running but unable to assign regions.
> Can some one suggest any solution for this.
> Following is full stack trace:
> org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
>     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
>     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
>     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
>     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
>     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
>     at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
>     at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
>     at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Thanks,Sandeep.