Posted to dev@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2009/10/20 20:07:59 UTC

[jira] Updated: (HBASE-1921) When the Master's session times out and there's only one, cluster is wedged

     [ https://issues.apache.org/jira/browse/HBASE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-1921:
--------------------------------------

    Attachment: HBASE-1921.patch

Attached a patch that does what I described. Here's what you will see in the Master log when it happens:

{code}2009-10-20 10:53:38,708 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:39,997 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56099 remote=/10.10.1.58:2181]
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,000 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12472fd41f10004 to sun.nio.ch.SelectionKeyImpl@2afb6c5f
java.io.IOException: Session Expired
	at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
	at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2009-10-20 10:53:40,000 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,000 INFO org.apache.hadoop.hbase.master.HMaster: Master lost its znode, trying to get a new one
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x12472fd41f10004
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12472fd41f10004 closed
2009-10-20 10:53:40,001 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.10.1.58:2181 sessionTimeout=60000 watcher=Thread[HMaster,5,main]
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:40,005 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56100 remote=/10.10.1.58:2181]
2009-10-20 10:53:40,006 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,009 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,012 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Wrote master address 10.10.1.58:60000 to ZooKeeper
2009-10-20 10:53:40,016 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.10.1.58:60000
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state...
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.10.1.58:60020
2009-10-20 10:53:40,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/rs/1256061062528 got 10.10.1.58:60020
2009-10-20 10:53:40,019 INFO org.apache.hadoop.hbase.master.HMaster: This is a failover, ZK inspection begins...
2009-10-20 10:53:40,020 DEBUG org.apache.hadoop.hbase.master.HMaster: Inspection found server 10.10.1.58
2009-10-20 10:53:40,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1256061062528 with data 10.10.1.58:60020
2009-10-20 10:53:40,028 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: SetData of ZNode /hbase/root-region-server with 10.10.1.58:60020
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Inspection found 3 regions, with -ROOT-
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Found log folder : 10.10.1.58,60020,1256061062528
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Log folder belongs to an existing region server
2009-10-20 10:53:40,029 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-10-20 10:54:38,601 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 3.0
2009-10-20 10:54:38,602 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>}
2009-10-20 10:54:38,607 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>}
2009-10-20 10:54:38,611 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
{code}
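
The recovery sequence visible in the log above (session expired, old session closed, new connection opened, master address written again) can be sketched roughly as follows. This is a minimal illustration, not the code in HBASE-1921.patch and not the ZooKeeper client API: FakeCoordinator and MasterSketch are hypothetical in-memory stand-ins, and the sketch assumes the master znode is ephemeral, so it disappears together with the expired session.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a coordination service.
// It only models sessions and ephemeral znodes; it is NOT the ZooKeeper API.
class FakeCoordinator {
    private final Map<String, String> znodes = new HashMap<>();
    private final Map<String, Long> owners = new HashMap<>(); // znode path -> owning session
    private long nextSessionId = 1;

    long openSession() {
        return nextSessionId++;
    }

    // Expiring a session deletes every ephemeral znode that session owns.
    void expireSession(long sessionId) {
        owners.entrySet().removeIf(entry -> {
            if (entry.getValue() == sessionId) {
                znodes.remove(entry.getKey());
                return true;
            }
            return false;
        });
    }

    // Creates an ephemeral znode; returns false if the path is already taken.
    boolean createEphemeral(long sessionId, String path, String data) {
        if (znodes.containsKey(path)) {
            return false;
        }
        znodes.put(path, data);
        owners.put(path, sessionId);
        return true;
    }

    String read(String path) {
        return znodes.get(path);
    }
}

// Hypothetical master that re-acquires its znode instead of giving up.
class MasterSketch {
    static final String MASTER_ZNODE = "/hbase/master";

    private final FakeCoordinator coordinator;
    private final String address;
    private long sessionId;

    MasterSketch(FakeCoordinator coordinator, String address) {
        this.coordinator = coordinator;
        this.address = address;
        this.sessionId = coordinator.openSession();
        coordinator.createEphemeral(sessionId, MASTER_ZNODE, address);
    }

    // Called when the session-expired event arrives: the old session is
    // unusable, so open a new one and try to write the master address again.
    // Returns false if another master grabbed the znode in the meantime.
    boolean onSessionExpired() {
        sessionId = coordinator.openSession();
        return coordinator.createEphemeral(sessionId, MASTER_ZNODE, address);
    }

    long sessionId() {
        return sessionId;
    }
}
```

With a single master, the re-create step is expected to succeed because nobody else competes for the znode. If it returns false, another master has taken over and the caller would have to decide what to do next; that case is outside this sketch.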

> When the Master's session times out and there's only one, cluster is wedged
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-1921
>                 URL: https://issues.apache.org/jira/browse/HBASE-1921
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.2, 0.21.0
>
>         Attachments: HBASE-1921.patch
>
>
> On IRC, some fella had a session expiration on his Master, and it was the only Master he had. Maybe in this case the Master should first try to re-get the znode?
