Posted to dev@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2009/04/05 21:47:12 UTC

[jira] Created: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
--------------------------------------------------------------

                 Key: HBASE-1311
                 URL: https://issues.apache.org/jira/browse/HBASE-1311
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Andrew Purtell


After about 12 hours of operation, this repeats over and over in the regionserver log:

2009-04-05 19:44:38,445 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:343)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:339)
	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
2009-04-05 19:44:38,445 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by Ryan Rawson <ry...@gmail.com>.
I'll try it on my dev cluster soon.

On May 10, 2009 4:07 PM, "Nitay Joffe (JIRA)" <ji...@apache.org> wrote:


    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707853#action_12707853 ]

Nitay Joffe commented on HBASE-1311:
------------------------------------

Andrew, can you try the patch I posted? The test fails, because of
HBASE-1362, as I mentioned. It should work otherwise though. It'd be nice to
have some real cluster testing of it.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------...
>            Priority: Blocker
>             Fix For: 0.20.0
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311.patch
>
> After about 12 hours of operation, this repeats over and over in the
> regionserver log:
> 2009-...

[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695903#action_12695903 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

OK, the dreaded session expired event on the regionserver. Seen this before on client side. Will look into it. Thanks.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>


[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1311:
-------------------------------

    Attachment: hbase-1311.patch

Here's what I've got so far.

In this patch:

ZooKeeper is the ground state of truth, so if we lose our connection to it,
then everyone thinks we're gone. So, we should act as such, which means
aborting and restarting.

I moved all of the state that has to be reinitialized into a new reinitialize()
method that is called by the constructor and my restart() method. It's rather
unfortunate that most of the things ended up moving into here (you can't call
start() on threads twice), so a lot of stuff is not final anymore.

I was seeing a problem with shutting down HDFS and starting it back up again,
so I added an AtomicBoolean to prevent the HDFS shutdown hook from running when
I restart.
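
For readers following along, the restart flow described above can be sketched
roughly as follows. This is only an illustration under stated assumptions, not
the code in hbase-1311.patch: the class RestartableServer, its fields, and
runShutdownHook() are hypothetical stand-ins, and the filesystem is reduced to
a counter. It shows why the worker thread cannot be final (a Thread cannot be
started twice) and how an AtomicBoolean can keep the shutdown hook from closing
the filesystem during a restart.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a server that rebuilds its state after losing its session. */
final class RestartableServer {
    // Not final: a Thread cannot be started twice, so each restart needs a fresh instance.
    private Thread worker;
    private int generation = 0;
    // Stand-in for "the filesystem was closed"; counts how often the hook really ran.
    private int filesystemCloses = 0;
    // True while a restart is in progress; tells the shutdown hook to leave the filesystem alone.
    private final AtomicBoolean restarting = new AtomicBoolean(false);

    RestartableServer() {
        reinitialize();
    }

    /** Builds everything that has to be recreated on each (re)start. */
    private void reinitialize() {
        generation++;
        worker = new Thread(() -> { /* serve requests */ }, "worker-" + generation);
        worker.start();
    }

    /** Stand-in for the filesystem shutdown hook. */
    void runShutdownHook() {
        if (restarting.get()) {
            return; // restarting, not exiting: keep the filesystem open
        }
        filesystemCloses++;
    }

    /** Aborts the current incarnation and comes back up with fresh state. */
    void restart() {
        restarting.set(true);
        try {
            stopWorker();
            runShutdownHook(); // no-op because the flag is set
            reinitialize();
        } finally {
            restarting.set(false);
        }
    }

    /** A real shutdown: the hook runs and closes the filesystem. */
    void shutdown() {
        stopWorker();
        runShutdownHook();
    }

    private void stopWorker() {
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int generation() { return generation; }
    int filesystemCloses() { return filesystemCloses; }
}
```

In this sketch, calling restart() bumps the generation without touching the
filesystem counter, while shutdown() increments it once.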


I think I am now seeing the problem reported in HBASE-1362 when running the test in this patch.



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708164#action_12708164 ] 

stack commented on HBASE-1311:
------------------------------

Current patch looks good.



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696784#action_12696784 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

Fair enough, I'll wait for a Submit Patch or something next time :). Thanks anyways, it always helps to look at how others think about problems.



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707012#action_12707012 ] 

stack commented on HBASE-1311:
------------------------------

Here is what I saw on master side:

{code}
2009-05-07 07:40:05,597 [RegionManager.rootScanner-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-05-07 07:40:07,555 [RegionManager.rootScanner] INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {regionname: -ROOT-,,0, startKey: <>, server: 208.76.44.142:60021} complete
2009-05-07 07:40:19,983 [IPC Server handler 6 on 60000] DEBUG org.apache.hadoop.hbase.master.ServerManager: Total Load: 1399, Num Servers: 3, Avg Load: 467.0
2009-05-07 07:40:27,089 [main-EventThread] INFO org.apache.hadoop.hbase.master.ServerManager: aa0-000-15.u.powerset.com_1241637114572_60021 znode expired
2009-05-07 07:40:27,120 [HMaster] DEBUG org.apache.hadoop.hbase.master.HMaster: Processing todo: ProcessServerShutdown of aa0-000-15.u.powerset.com_1241637114572_60021
2009-05-07 07:40:27,120 [HMaster] INFO org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of server aa0-000-15.u.powerset.com_1241637114572_60021: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
2009-05-07 07:40:27,125 [HMaster] INFO org.apache.hadoop.hbase.regionserver.HLog: Splitting 3 log(s) in hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/@LOGS@/aa0-000-15.u.powerset.com_1241637114572_60021
2009-05-07 07:40:27,126 [HMaster] DEBUG org.apache.hadoop.hbase.regionserver.HLog: Splitting 1 of 3: hdfs://aa0-000-12.u.powerset.com:9000/hbasetrunk2/@LOGS@/aa0-000-15.u.powerset.com_1241637114572_60021/hlog.dat.1241676102482
2009-05-07 07:40:30,022 [main-EventThread] INFO org.apache.hadoop.hbase.master.ServerManager: aa0-000-14.u.powerset.com_1241637114470_60021 znode expired
2009-05-07 07:40:34,779 [IPC Server handler 7 on 60000] INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000, call regionServerReport(address: 208.76.44.141:60021, startcode: 1241637114470, load: (requests=0, regions=466, usedHeap=861, maxHeap=1255), [Lorg.apache.hadoop.hbase.HMsg;@191ea7e5, [Lorg.apache.hadoop.hbase.HRegionInfo;@2ba0b845) from 208.76.44.141:50094: error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:201)
        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:604)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:642)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:911)
2009-05-07 07:40:34,792 [IPC Server handler 4 on 60000] INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000, call regionServerReport(address: 208.76.44.141:60021, startcode: 1241637114470, load: (requests=0, regions=466, usedHeap=861, maxHeap=1255), [Lorg.apache.hadoop.hbase.HMsg;@5b7836c8, [Lorg.apache.hadoop.hbase.HRegionInfo;@3154b362) from 208.76.44.141:50197: error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:201)
        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:604)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:642)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:911)
2009-05-07 07:40:34,801 [IPC Server handler 9 on 60000] INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000, call regionServerReport(address: 208.76.44.141:60021, startcode: 1241637114470, load: (requests=0, regions=466, usedHeap=862, maxHeap=1255), [Lorg.apache.hadoop.hbase.HMsg;@2fcd003b, [Lorg.apache.hadoop.hbase.HRegionInfo;@3bca3a01) from 208.76.44.141:50197: error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:201)
        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:604)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:642)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:911)
{code}

Above repeats over and over.



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707103#action_12707103 ] 

ryan rawson commented on HBASE-1311:
------------------------------------

I see this as well. Once my cluster gets under load, this can happen on a regular basis.





[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1311:
-------------------------------

    Attachment: hbase-1311-v2.patch

Updated patch with working test. I think this should fix HBASE-1362 as well.

Here is a snippet from Stack describing the proposed fix to the problem:

{quote}
It looks like RootScanner notices that server is null on the .META. row because it says that .META. region is not valid.  This should mean this is called:

        this.master.regionManager.setUnassigned(info, true);

Line 391 of BaseScanner (run by RootScanner)

That should get it reassigned.... sometime.... in fact from your log, it does get assigned later...

 2009-05-09 15:22:19,261 INFO  [IPC Server handler 1 on 60000] master.RegionManager(319): Assigning region .META.,,1 to localhost_1241907737069_52172


So, I think you need to get the metaTableAvailable thing to trigger.  Perhaps do the compare of the numbers but also check if the region is in transition... regionIsInTransition in RegionManager.  You can get the param to pass by doing Bytes.toString on the MetaRegion name. 
{quote}

In my new RegionManager.metaRegionsInTransition method:
- Does onlineMetaRegions need to be synchronized?
- Does regionIsInTransition need to be synchronized?

I was following what I saw in other places that use those data structures, but I'm not clear on why one is synchronized yet not the other.
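
One possible reason for the difference, sketched here under assumptions: in
Java, iterating over a map wrapped with Collections.synchronizedMap needs the
caller to hold the map's lock for the whole loop, while a single lookup on a
concurrent set does not need an extra lock. The class and field types below are
hypothetical (MetaRegionTracker is not an HBase class, and the real
RegionManager may use different data structures and a different check); only
the names onlineMetaRegions, regionIsInTransition, and metaRegionsInTransition
are borrowed from the discussion above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of the two structures discussed above; not the HBase RegionManager. */
final class MetaRegionTracker {
    // Assumed: a synchronized sorted map of meta region name -> hosting server.
    private final Map<String, String> onlineMetaRegions =
        Collections.synchronizedMap(new TreeMap<>());
    // Assumed: a concurrent set of region names currently being (re)assigned.
    private final Set<String> regionsInTransition = ConcurrentHashMap.newKeySet();

    void putOnline(String regionName, String server) {
        onlineMetaRegions.put(regionName, server);
    }

    void markInTransition(String regionName) {
        regionsInTransition.add(regionName);
    }

    /** Single lookup on a concurrent set: no extra locking needed. */
    boolean regionIsInTransition(String regionName) {
        return regionsInTransition.contains(regionName);
    }

    /** Iteration over a synchronizedMap: the caller must hold the map's lock for the whole loop. */
    boolean metaRegionsInTransition() {
        synchronized (onlineMetaRegions) {
            for (String regionName : onlineMetaRegions.keySet()) {
                if (regionIsInTransition(regionName)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Whether this matches the actual fields in RegionManager would need to be
checked against the source; the sketch only illustrates the general rule.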



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708200#action_12708200 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

See HBASE-1406 for refactoring HRS issue.



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696763#action_12696763 ] 

Andrew Purtell commented on HBASE-1311:
---------------------------------------

Hi Nitay. I did try to indicate my patch was a dumb hack. :-) Just for illustration to show what worked for me for 1311 while tinkering around....



[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1311:
----------------------------------

    Priority: Blocker  (was: Major)

This and HBASE-1315 are the only evident causes of instability on my test cluster. Raising priority to blocker.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311.patch
>
>
> After about 12 hours of operation, this repeats over and over in the regionserver log:
> 2009-04-05 19:44:38,445 WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
> 	at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:343)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:339)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
> 2009-04-05 19:44:38,445 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696730#action_12696730 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

Hi Andrew,

I like the idea of the reinit() with the ephemeral node map.

However, this patch fundamentally changes a lot of things in the design of ZooKeeperWrapper. The initial idea was not to have any retries in ZooKeeperWrapper so that each user of it can handle it differently. Each ZooKeeper operation was supposed to either succeed or fail simply and the code calling it would do what it needs.

Take a look at how I handled SessionExpired in the client, TableServers. I think it is much cleaner to have each ZooKeeper user (TableServers, HRegionServer, and HMaster) register itself as a watcher and handle SessionExpired for itself. SessionExpired is not something that is particular to a single operation, so littering every ZooKeeper call with it seems a bit much?

Thoughts?
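
To make the proposal concrete, here is a rough sketch of "each user registers
as a watcher and handles expiry itself". Everything in it is a simplified,
hypothetical stand-in so that it runs without the ZooKeeper jar: SessionState,
SessionWatcher, SessionEvents, ClientConnection, and RegionServerNode are
invented for illustration. In the real client these notifications arrive
through org.apache.zookeeper.Watcher.process(WatchedEvent), and the reactions
shown (reconnect for a client, restart for a region server) are assumptions
about what each user might choose to do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical session states, loosely modeled on what a ZooKeeper client reports. */
enum SessionState { CONNECTED, DISCONNECTED, EXPIRED }

/** Hypothetical callback; stands in for a ZooKeeper Watcher. */
interface SessionWatcher {
    void process(SessionState state);
}

/** Delivers session state changes to every registered user. */
final class SessionEvents {
    private final List<SessionWatcher> watchers = new ArrayList<>();

    void register(SessionWatcher watcher) {
        watchers.add(watcher);
    }

    void fire(SessionState state) {
        for (SessionWatcher watcher : watchers) {
            watcher.process(state);
        }
    }
}

/** A client-side user: on expiry it simply opens a new session. */
final class ClientConnection implements SessionWatcher {
    int reconnects = 0;

    @Override
    public void process(SessionState state) {
        if (state == SessionState.EXPIRED) {
            reconnects++; // stand-in for creating a fresh session
        }
    }
}

/** A server-side user: on expiry everyone thinks it is gone, so it asks for a restart. */
final class RegionServerNode implements SessionWatcher {
    boolean restartRequested = false;

    @Override
    public void process(SessionState state) {
        if (state == SessionState.EXPIRED) {
            restartRequested = true; // stand-in for abort-and-restart
        }
    }
}
```

The point of the design is that expiry is handled once per user, in one place,
rather than in a retry wrapper around every individual ZooKeeper call.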



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707859#action_12707859 ] 

Andrew Purtell commented on HBASE-1311:
---------------------------------------

Testing now.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311.patch
>
>


[jira] Work started: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HBASE-1311 started by Nitay Joffe.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707002#action_12707002 ] 

stack commented on HBASE-1311:
------------------------------

Had the same thing happen to me last night:

{code}
2009-05-07 07:39:29,348 [main-EventThread] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:39:29,368 [regionserver/0:0:0:0:0:0:0:0:60021.worker-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:39:33,249 [regionserver/0:0:0:0:0:0:0:0:60021.worker-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-05-07 07:39:33,250 [main-EventThread] DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2009-05-07 07:39:33,250 [main-EventThread] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-05-07 07:39:33,251 [main-EventThread] DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2009-05-07 07:39:49,858 [regionserver/0:0:0:0:0:0:0:0:60021.worker-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:39:50,048 [main-EventThread] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:39:55,318 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:350)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:346)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
2009-05-07 07:39:55,319 [main-EventThread] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
2009-05-07 07:40:00,479 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:350)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:346)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
2009-05-07 07:40:00,480 [main-EventThread] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
2009-05-07 07:40:05,643 [main-EventThread] DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2009-05-07 07:40:05,643 [main-EventThread] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-05-07 07:40:05,644 [main-EventThread] DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
2009-05-07 07:40:05,721 [regionserver/0:0:0:0:0:0:0:0:60021.worker-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-05-07 07:40:19,108 [main-EventThread] INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:40:19,188 [regionserver/0:0:0:0:0:0:0:0:60021.worker-EventThread] DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-05-07 07:40:24,218 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:350)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:346)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
2009-05-07 07:40:24,218 [main-EventThread] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
2009-05-07 07:40:29,328 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:350)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:346)
....
{code}

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695917#action_12695917 ] 

Andrew Purtell commented on HBASE-1311:
---------------------------------------

Thanks Nitay. I can run long duration tests when you have something. 

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>


[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1311:
-------------------------------

    Attachment: dump.txt

Here's a dump of running the test.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311.patch
>
>


[jira] Issue Comment Edited: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696631#action_12696631 ] 

Andrew Purtell edited comment on HBASE-1311 at 4/7/09 10:23 AM:
----------------------------------------------------------------

See the attached hack of ZooKeeperWrapper to reinitialize the zookeeper handle and recreate ephemeral nodes. This works to address HBASE-1311, but the actions that run from watchers when they see the ephemeral znodes go away because of session expiration (as opposed to real process/node failures) lead to problems like HBASE-1314 and HBASE-1315.
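
For readers without the attachment, the following is a rough sketch of the reinitialize-and-recreate idea, not the contents of dumb-wrapper-hack.patch. FakeSession is a hypothetical in-memory stand-in for a ZooKeeper session, and the Wrapper class, its method names and the path used in the usage note are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class ReinitDemo {
    /** Hypothetical in-memory stand-in for one ZooKeeper session. */
    static final class FakeSession {
        private final Map<String, String> ephemerals = new HashMap<>();
        private boolean expired = false;

        void createEphemeral(String path, String data) {
            checkAlive();
            ephemerals.put(path, data);
        }

        boolean exists(String path) {
            checkAlive();
            return ephemerals.containsKey(path);
        }

        /** Expiry invalidates the handle and removes every ephemeral node it owned. */
        void expire() {
            expired = true;
            ephemerals.clear();
        }

        private void checkAlive() {
            if (expired) {
                throw new IllegalStateException("Session expired");
            }
        }
    }

    /** Wrapper that remembers its ephemeral nodes so reinit() can recreate them. */
    static final class Wrapper {
        private final Map<String, String> ephemeralNodes = new LinkedHashMap<>();
        private FakeSession session = new FakeSession();

        void createEphemeral(String path, String data) {
            session.createEphemeral(path, data); // throws if the session is gone
            ephemeralNodes.put(path, data);      // record only after success
        }

        boolean exists(String path) {
            return session.exists(path);
        }

        /** Replaces the dead handle and replays the recorded ephemeral nodes. */
        void reinit() {
            session = new FakeSession();
            for (Map.Entry<String, String> node : ephemeralNodes.entrySet()) {
                session.createEphemeral(node.getKey(), node.getValue());
            }
        }

        FakeSession currentSession() {
            return session;
        }
    }
}
```

In this sketch, a node created at a made-up path such as "/demo/ephemeral" disappears when the session expires and reappears after reinit(). The sketch does nothing about other processes whose watchers fire in the gap between expiry and recreation, which is the follow-on problem described above.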

      was (Author: apurtell):
    See attached hack of ZooKeeperWrapper to reinitialize the zookeeper handle and recreate ephemeral nodes. This works to address HBASE-1311 but the actions that run from watchers when they see the ephemeral znodes go away lead to problems like HBASE-1314 and HBASE-1315. 
  
> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: dumb-wrapper-hack.patch
>
>


[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1311:
-------------------------------

    Fix Version/s: 0.20.0

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch
>
>


[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1311:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed.  Nitay is opening a separate issue to refactor HRS.  Thanks for the patch, Nitay.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311-v2.patch, hbase-1311.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708118#action_12708118 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

Yes, that's probably a good idea as we can then keep all the final modifiers. I'll work on this in a third version of the patch. Folks are welcome to test this patch out on their clusters though -- it should have all the logic necessary for the fix.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311-v2.patch, hbase-1311.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707853#action_12707853 ] 

Nitay Joffe commented on HBASE-1311:
------------------------------------

Andrew, can you try the patch I posted? The test fails, because of HBASE-1362, as I mentioned. It should work otherwise though. It'd be nice to have some real cluster testing of it.

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707909#action_12707909 ] 

stack commented on HBASE-1311:
------------------------------

Nitay, should HRS host a thread that does all the work?  Then when it dies or when you need to restart, we just wait on the old one to die, then create a new one?
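
One way to picture the suggestion, as a sketch only: a long-lived host owns a worker thread, and a restart means asking the old worker to stop, joining it, and then building a fresh one. RestartableHost, Worker and the generation counter are hypothetical names, and the worker body is a placeholder loop rather than anything the region server does.

```java
final class RestartableHost {
    /** All per-run state sits in the worker; a restart constructs a new instance. */
    static final class Worker implements Runnable {
        final int generation;
        private volatile boolean stopRequested = false;

        Worker(int generation) {
            this.generation = generation;
        }

        void requestStop() {
            stopRequested = true;
        }

        @Override
        public void run() {
            while (!stopRequested) {
                try {
                    Thread.sleep(5); // placeholder for the real work loop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    private Worker worker;
    private Thread thread;
    private int generation = 0;

    synchronized void start() {
        worker = new Worker(++generation);
        thread = new Thread(worker, "worker-" + generation);
        thread.start();
    }

    /** Waits for the old worker to die, then creates a new one. */
    synchronized void restart() {
        stop();
        start();
    }

    synchronized void stop() {
        worker.requestStop();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
    }

    synchronized int currentGeneration() {
        return worker.generation;
    }

    synchronized boolean isAlive() {
        return thread.isAlive();
    }
}
```

Because each restart builds a new Worker, the worker's own fields can be assigned once in its constructor instead of being reset in place.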

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311-v2.patch, hbase-1311.patch
>
>


[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707004#action_12707004 ] 

stack commented on HBASE-1311:
------------------------------

Then it started to do this:

{code}
2009-05-07 07:40:57,771 [regionserver/0:0:0:0:0:0:0:0:60021.compactor] DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 1 file(s), hasReferences=true, into /hbasetrunk2/TestTable/compaction.dir/807729785793552806
2009-05-07 07:40:59,738 [main-EventThread] WARN org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:709)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.watchMasterAddress(ZooKeeperWrapper.java:235)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:350)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.process(HRegionServer.java:346)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:366)
2009-05-07 07:40:59,739 [main-EventThread] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to set watcher on ZooKeeper master address. Retrying.
2009-05-07 07:41:00,553 [regionserver/0:0:0:0:0:0:0:0:60021] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry: 0)
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:500)
        at java.lang.Thread.run(Thread.java:619)
2009-05-07 07:41:00,569 [regionserver/0:0:0:0:0:0:0:0:60021] WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message (Retry: 1)
org.apache.hadoop.hbase.Leases$LeaseStillHeldException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:500)
....
{code}

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch



[jira] Assigned: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe reassigned HBASE-1311:
----------------------------------

    Assignee: Nitay Joffe

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe



[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated HBASE-1311:
-------------------------------

    Status: Patch Available  (was: In Progress)

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311-v2.patch, hbase-1311.patch



[jira] Updated: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-1311:
----------------------------------

    Attachment: dumb-wrapper-hack.patch

See the attached hack of ZooKeeperWrapper, which reinitializes the ZooKeeper handle and recreates the ephemeral nodes. This addresses HBASE-1311, but the actions that watchers run when they see the ephemeral znodes go away lead to problems like HBASE-1314 and HBASE-1315.
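
For readers following along, the general idea can be sketched roughly as below. This is not the contents of dumb-wrapper-hack.patch; it is a minimal illustration that assumes an expired session cannot be reused, so a new handle has to be opened and the ephemeral nodes recreated on it. Handle, HandleFactory, FakeHandle and the example path are hypothetical stand-ins and not the ZooKeeper or HBase API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: recover from an expired session by opening a new handle
// and recreating the ephemeral nodes that were registered on the old one.
class SessionRecoverySketch {

    /** Hypothetical stand-in for a ZooKeeper client handle (not the real API). */
    interface Handle {
        boolean isExpired();
        void createEphemeral(String path);
    }

    /** Hypothetical factory that opens a brand-new session. */
    interface HandleFactory {
        Handle connect();
    }

    /** In-memory fake used only to exercise the sketch. */
    static class FakeHandle implements Handle {
        boolean expired = false;
        final List<String> created = new ArrayList<>();

        @Override public boolean isExpired() { return expired; }
        @Override public void createEphemeral(String path) { created.add(path); }
    }

    private final HandleFactory factory;
    private final List<String> ephemeralPaths = new ArrayList<>();
    private Handle handle;

    SessionRecoverySketch(HandleFactory factory) {
        this.factory = factory;
        this.handle = factory.connect();
    }

    /** Creates an ephemeral node and remembers its path for later recovery. */
    void registerEphemeral(String path) {
        ephemeralPaths.add(path);
        handle.createEphemeral(path);
    }

    /**
     * If the current session has expired, opens a new handle and recreates
     * every remembered ephemeral node on it. Returns true if recovery ran.
     */
    boolean recoverIfExpired() {
        if (!handle.isExpired()) {
            return false; // session still valid, nothing to do
        }
        handle = factory.connect();
        for (String path : ephemeralPaths) {
            handle.createEphemeral(path);
        }
        return true;
    }
}
```

Note that the sketch deliberately ignores the watcher side effects described above: other processes still see the old ephemeral znodes disappear before the new ones are created.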

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>         Attachments: dumb-wrapper-hack.patch



[jira] Commented: (HBASE-1311) ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708161#action_12708161 ] 

stack commented on HBASE-1311:
------------------------------

See RegionServerThread in LocalHBaseCluster; it might be of help. (It wraps the HRS in a Thread, whereas you want to put the thread inside the HRS, but it might still help some.)

> ZooKeeperWrapper: Failed to set watcher on ZNode /hbase/master
> --------------------------------------------------------------
>
>                 Key: HBASE-1311
>                 URL: https://issues.apache.org/jira/browse/HBASE-1311
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Andrew Purtell
>            Assignee: Nitay Joffe
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: dumb-wrapper-hack.patch, dump.txt, hbase-1311-v2.patch, hbase-1311.patch
