Posted to user@hbase.apache.org by Sandy Pratt <pr...@adobe.com> on 2011/04/11 20:03:43 UTC

Catching ZK ConnectionLoss with HTable

Hi all,

I had an issue recently where a scan job I frequently run caught ConnectionLoss and subsequently failed to recover.

The stack trace looks like this:

11/04/08 12:20:04 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d8 closed
11/04/08 12:20:04 WARN client.HConnectionManager$ClientZKWatcher: No longer connected to ZooKeeper, current state: Disconnected
11/04/08 12:20:05 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
11/04/08 12:20:05 INFO zookeeper.ZooKeeper: Session: 0x12f2497b00d03d9 closed
11/04/08 12:20:06 INFO zookeeper.ZooKeeperWrapper: Reconnecting to zookeeper
11/04/08 12:20:06 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:21811 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@51127a
11/04/08 12:20:06 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
11/04/08 12:20:06 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/04/08 12:20:06 WARN zookeeper.ZooKeeperWrapper: Problem getting stats for /hbase/rs
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
        at org.apache.hadoop.hbase.client.HTable.getCurrentNrHRS(HTable.java:173)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:147)
        at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:102)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.prefetchRegionCache(HConnectionManager.java:732)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:783)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:677)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:650)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:470)
        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1145)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:503)
        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.getHBaseTimestamp(EtsAfsBuilder.java:215)
        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.syncHour(EtsAfsBuilder.java:310)
        at com.adobe.hs.ets.dozer.afs.EtsAfsBuilder.go(EtsAfsBuilder.java:130)
        at BuildAfs.main(BuildAfs.java:43)
11/04/08 12:20:07 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
11/04/08 12:20:07 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/04/08 12:20:09 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:21811
11/04/08 12:20:09 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

It then goes on to retry endlessly.  Killing the spinning job and running it again worked fine, so I would prefer that it crash rather than retry endlessly.

I'm not especially concerned about what went wrong to cause the ConnectionLoss in the first place, but I am interested in being able to set some behavior for handling the ZK exceptions gracefully.  For example, the call site in my code that leads to the exception is this:

Get get = new Get(Bytes.toBytes(level.rowKeyDateFormat.format(dts)));
Result result = timestampsTable.get(get);

I suppose this means that if I want to catch ConnectionLoss in my code, I have to wrap all my gets and puts with that catch block.  Or maybe just the first one?  It seems like HTable and friends might be able to catch this exception in a more central location, maybe somewhere in here:

at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.getRSDirectoryCount(ZooKeeperWrapper.java:754)
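
For what it's worth, if I do end up wrapping each call myself, I imagine it would look roughly like the sketch below.  The retry limit and the helper name are made up for illustration, not anything in my real code, and it assumes the ZK trouble reaches the caller as an IOException.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class BoundedRetryGet {

    // Made-up limit: try a few times, then give up and let the job die
    // instead of spinning forever.
    private static final int MAX_ATTEMPTS = 3;

    public static Result getWithBoundedRetries(HTable table, Get get) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return table.get(get);
            } catch (IOException e) {
                last = e;   // remember the failure and try again
            }
        }
        throw last;   // crash the job rather than retry endlessly
    }
}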

I'm running HBase 0.89.20100924+28.  Will this issue go away if I upgrade to a newer version?

Thanks,
Sandy

Re: Catching ZK ConnectionLoss with HTable

Posted by Jean-Daniel Cryans <jd...@apache.org>.
No worries.

Regarding stopping the ZK client from trying to connect: closing the
connections via HConnectionManager will stop it.
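
Roughly, when your tool is done (or has decided to give up), something
like the sketch below should shut it down.  This is only a sketch against
the 0.90-era API, so double-check the exact signature in the version
you're running:

import org.apache.hadoop.hbase.client.HConnectionManager;

public class Cleanup {
    public static void shutdownHBaseClient() {
        // Drop the cached connections; this also stops the underlying
        // ZooKeeper client so it stops trying to reconnect.  The boolean
        // asks it to stop the RPC proxies as well.
        HConnectionManager.deleteAllConnections(true);
    }
}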

J-D

On Thu, Apr 14, 2011 at 12:23 PM, Sandy Pratt <pr...@adobe.com> wrote:
> Actually, upon looking at it further, I think this one has more to do with SSH tunnels than with ZK per se.  Let me explain.
>
> This tool runs as a cron job.  It locks locally to prevent overruns.  It establishes an SSH dynamic proxy (-D portnum) for Hadoop and HBase clients to use, as well as a direct tunnel to ZK on 21811.  Here's the sequence of events:
>
> 1) Last run is finishing up
> 2) New run starts, initializes clients before looking for the lock that last run holds
> 3) New clients find all the network access they need using old run's SSH process
> 4) Last run finishes, closing SSH client and releasing lock
> 5) New run checks lock, acquires, proceeds
> 6) HBase client fails a get as it can't find a region server (the SSH tunnel it found during init is gone, and the new one couldn't be established because ports were in use)
> 7) The reconnect would likely have succeeded if the SSH tunnel were in place (or just not needed)
>
> To sum up, I'm pretty certain that this is not an HBase or ZK problem, except inasmuch as I'd like to be able to tell ZK to stop trying at some point, and I'm not sure how to do that.  But I certainly need to change the bounds of my locks and, longer term, get off cron jobs and SSH tunnels.  Thanks for taking a look though, J-D, and sorry if it wasted too much of your time.
>
> Sandy
>
>> -----Original Message-----
>> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-
>> Daniel Cryans
>> Sent: Monday, April 11, 2011 17:34
>> To: user@hbase.apache.org
>> Subject: Re: Catching ZK ConnectionLoss with HTable
>>
>> I thought a lot more about this issue, and it could be a bigger undertaking
>> than I thought: basically, any HTable operation can throw ZK-related errors,
>> and I think they should be considered fatal.
>>
>> In the meantime, HBase could improve the situation a bit. You say it was
>> spinning; do you know where exactly? Looking at the 0.90 code, if there's a
>> ConnectionLoss it will be eaten by HCM.prefetchRegionCache, and then the
>> normal .META. querying will take place, so I don't see where it could be
>> spinning.
>>
>> J-D
>>
>> On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <pr...@adobe.com> wrote:
>> > Thanks J-D.  I'll keep an eye on the Jira.
>> >
>> >> -----Original Message-----
>> >> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
>> >> Jean- Daniel Cryans
>> >> Sent: Monday, April 11, 2011 11:52
>> >> To: user@hbase.apache.org
>> >> Subject: Re: Catching ZK ConnectionLoss with HTable
>> >>
>> >> I'm cleaning this up in this jira
>> >> https://issues.apache.org/jira/browse/HBASE-3755
>> >>
>> >> But it's a failure case I haven't seen before; really interesting.
>> >> There's an HTable that's created in the guts of HCM that will throw a
>> >> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll
>> >> try to address this too in 3755.
>> >>
>> >> J-D
>> >>
>

RE: Catching ZK ConnectionLoss with HTable

Posted by Sandy Pratt <pr...@adobe.com>.
Actually, upon looking at it further, I think this one has more to do with SSH tunnels than with ZK per se.  Let me explain.

This tool runs as a cron job.  It locks locally to prevent overruns.  It establishes an SSH dynamic proxy (-D portnum) for Hadoop and HBase clients to use, as well as a direct tunnel to ZK on 21811.  Here's the sequence of events:

1) Last run is finishing up
2) New run starts, initializes clients before looking for the lock that last run holds
3) New clients find all the network access they need using old run's SSH process
4) Last run finishes, closing SSH client and releasing lock
5) New run checks lock, acquires, proceeds
6) HBase client fails a get as it can't find a region server (the SSH tunnel it found during init is gone, and the new one couldn't be established because ports were in use)
7) The reconnect would likely have succeeded if the SSH tunnel were in place (or just not needed)

To sum up, I'm pretty certain that this is not an HBase or ZK problem, except inasmuch as I'd like to be able to tell ZK to stop trying at some point, and I'm not sure how to do that.  But I certainly need to change the bounds of my locks and, longer term, get off cron jobs and SSH tunnels.  Thanks for taking a look though, J-D, and sorry if it wasted too much of your time.
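
Concretely, I think the fix on my side is just to grab the local lock before bringing up the tunnels and the clients, something like the sketch below.  The class name and lock file path are made up, and the actual work is elided:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class BuildAfsLauncher {

    public static void main(String[] args) throws Exception {
        // Take the local lock *first*, so a new run can't borrow the previous
        // run's SSH tunnels and then have them torn down mid-flight.
        RandomAccessFile lockFile = new RandomAccessFile(new File("/tmp/buildafs.lock"), "rw");
        FileLock lock = lockFile.getChannel().tryLock();
        if (lock == null) {
            System.err.println("Previous run still in progress; exiting.");
            return;
        }
        try {
            // Only now establish the SSH dynamic proxy and the ZK tunnel,
            // initialize the Hadoop/HBase clients, and do the actual work.
        } finally {
            lock.release();
            lockFile.close();
        }
    }
}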

Sandy

> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-
> Daniel Cryans
> Sent: Monday, April 11, 2011 17:34
> To: user@hbase.apache.org
> Subject: Re: Catching ZK ConnectionLoss with HTable
> 
> I thought a lot more about this issue, and it could be a bigger undertaking
> than I thought: basically, any HTable operation can throw ZK-related errors,
> and I think they should be considered fatal.
>
> In the meantime, HBase could improve the situation a bit. You say it was
> spinning; do you know where exactly? Looking at the 0.90 code, if there's a
> ConnectionLoss it will be eaten by HCM.prefetchRegionCache, and then the
> normal .META. querying will take place, so I don't see where it could be
> spinning.
> 
> J-D
> 
> On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <pr...@adobe.com> wrote:
> > Thanks J-D.  I'll keep an eye on the Jira.
> >
> >> -----Original Message-----
> >> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of
> >> Jean- Daniel Cryans
> >> Sent: Monday, April 11, 2011 11:52
> >> To: user@hbase.apache.org
> >> Subject: Re: Catching ZK ConnectionLoss with HTable
> >>
> >> I'm cleaning this up in this jira
> >> https://issues.apache.org/jira/browse/HBASE-3755
> >>
> >> But it's a failure case I haven't seen before; really interesting.
> >> There's an HTable that's created in the guts of HCM that will throw a
> >> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll
> >> try to address this too in 3755.
> >>
> >> J-D
> >>

Re: Catching ZK ConnectionLoss with HTable

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I thought a lot more about this issue, and it could be a bigger
undertaking than I thought: basically, any HTable operation can throw
ZK-related errors, and I think they should be considered fatal.

In the meantime, HBase could improve the situation a bit. You say it
was spinning; do you know where exactly? Looking at the 0.90 code, if
there's a ConnectionLoss it will be eaten by HCM.prefetchRegionCache,
and then the normal .META. querying will take place, so I don't see
where it could be spinning.

J-D

On Mon, Apr 11, 2011 at 2:13 PM, Sandy Pratt <pr...@adobe.com> wrote:
> Thanks J-D.  I'll keep an eye on the Jira.
>
>> -----Original Message-----
>> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-
>> Daniel Cryans
>> Sent: Monday, April 11, 2011 11:52
>> To: user@hbase.apache.org
>> Subject: Re: Catching ZK ConnectionLoss with HTable
>>
>> I'm cleaning this up in this jira
>> https://issues.apache.org/jira/browse/HBASE-3755
>>
>> But it's a failure case I haven't seen before; really interesting.
>> There's an HTable that's created in the guts of HCM that will throw a
>> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll try to
>> address this too in 3755.
>>
>> J-D
>>

RE: Catching ZK ConnectionLoss with HTable

Posted by Sandy Pratt <pr...@adobe.com>.
Thanks J-D.  I'll keep an eye on the Jira.

> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-
> Daniel Cryans
> Sent: Monday, April 11, 2011 11:52
> To: user@hbase.apache.org
> Subject: Re: Catching ZK ConnectionLoss with HTable
> 
> I'm cleaning this up in this jira
> https://issues.apache.org/jira/browse/HBASE-3755
> 
> But it's a failure case I haven't seen before; really interesting.
> There's an HTable that's created in the guts of HCM that will throw a
> ZooKeeperConnectionException, but it will bubble up as an IOE. I'll try to
> address this too in 3755.
> 
> J-D
> 
> On Mon, Apr 11, 2011 at 11:03 AM, Sandy Pratt <pr...@adobe.com> wrote:
> > [snip: original message and stack trace, quoted in full at the top of this thread]

Re: Catching ZK ConnectionLoss with HTable

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I'm cleaning this up in this jira
https://issues.apache.org/jira/browse/HBASE-3755

But it's a failure case I haven't seen before; really interesting.
There's an HTable that's created in the guts of HCM that will throw a
ZooKeeperConnectionException, but it will bubble up as an IOE. I'll try
to address this too in 3755.
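
Until that's fixed, about the best a client can do is probably something
like the sketch below when it creates the table: catch the
ZooKeeperConnectionException (as far as I remember it's an IOException
subclass, so it has to be caught ahead of the generic IOE) and treat it
as fatal.  The class and table names here are just placeholders.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HTable;

public class FailFastTable {
    public static HTable open(String tableName) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // A ZK connection failure is handled distinctly here; any other
        // IOException still propagates to the caller as usual.
        try {
            return new HTable(conf, tableName);
        } catch (ZooKeeperConnectionException zke) {
            // ZK is unreachable; log it and treat it as fatal rather than
            // letting it blur into a generic IOE and spin on retries.
            System.err.println("Fatal: lost ZooKeeper: " + zke.getMessage());
            throw zke;
        }
    }
}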

J-D

On Mon, Apr 11, 2011 at 11:03 AM, Sandy Pratt <pr...@adobe.com> wrote:
> [snip: original message and stack trace, quoted in full at the top of this thread]