Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/06/30 00:55:28 UTC

[jira] [Created] (HBASE-4045) [replication] NPE in ReplicationSource when ZK is gone

[replication] NPE in ReplicationSource when ZK is gone
------------------------------------------------------

                 Key: HBASE-4045
                 URL: https://issues.apache.org/jira/browse/HBASE-4045
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.3
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
            Priority: Minor
             Fix For: 0.90.4


We hit this in production. It killed the replication thread, but the server itself was fine and the master kept the logs:

{quote}
2011-06-26 16:02:56,092 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26667ms for sessionid 0x22f9dcb30ab01b8, closing socket connection and attempting reconnect
2011-06-26 16:02:56,213 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster: 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received ZooKeeper Event, type=None, state=Disconnected, path=null
2011-06-26 16:02:56,213 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: connection to cluster: 5-0x22f9dcb30ab01b8-0x22f9dcb30ab01b8 Received Disconnected from ZooKeeper, ignoring
2011-06-26 16:02:56,213 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Cannot get peer's region server addresses
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/rs
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:389)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndGetAsAddresses(ZKUtil.java:355)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.fetchSlavesAddresses(ReplicationZookeeper.java:228)
        at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getSlavesAddresses(ReplicationZookeeper.java:216)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:205)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)
2011-06-26 16:02:56,222 ERROR org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing source 5 because an error occurred: Uncaught exception during runtime
java.lang.Exception: java.lang.NullPointerException
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$1.uncaughtException(ReplicationSource.java:628)
        at java.lang.Thread.dispatchUncaughtException(Thread.java:1874)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.chooseSinks(ReplicationSource.java:208)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:588)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:341)

{quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-4045) [replication] NPE in ReplicationSource when ZK is gone

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-4045.
---------------------------------------

    Resolution: Fixed

Committed the small fix to branch and trunk.

        

[jira] [Updated] (HBASE-4045) [replication] NPE in ReplicationSource when ZK is gone

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-4045:
--------------------------------------

    Attachment: HBASE-4045.patch

Easy fix: instead of returning null in fetchSlavesAddresses, I'll return an empty list.
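
A minimal sketch of that idea, not the actual patch: when the lookup fails, the method hands back an empty list so the caller never dereferences null. The PeerLookup interface, the String address type, and the simplified chooseSinks below are hypothetical stand-ins for illustration; only the method names fetchSlavesAddresses and chooseSinks come from the stack trace.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FetchAddressesSketch {

    /** Hypothetical stand-in for the ZooKeeper lookup of the peer's region servers. */
    interface PeerLookup {
        List<String> listRegionServers() throws IOException;
    }

    /**
     * Returns the peer's region server addresses, or an empty list when the
     * lookup fails. It never returns null, so callers can iterate or check
     * isEmpty() without a null check.
     */
    static List<String> fetchSlavesAddresses(PeerLookup lookup) {
        try {
            List<String> addresses = lookup.listRegionServers();
            return addresses == null ? Collections.<String>emptyList() : addresses;
        } catch (IOException e) {
            // Assumption: log and carry on, as the WARN line in the report suggests.
            System.err.println("Cannot get peer's region server addresses: " + e.getMessage());
            return Collections.<String>emptyList();
        }
    }

    /** Simplified caller: picks a sink only if any addresses are known. */
    static List<String> chooseSinks(PeerLookup lookup) {
        List<String> addresses = fetchSlavesAddresses(lookup);
        List<String> sinks = new ArrayList<String>();
        if (!addresses.isEmpty()) {
            sinks.add(addresses.get(0));
        }
        return sinks;
    }

    public static void main(String[] args) {
        // Simulate the connection loss: the lookup throws instead of answering.
        List<String> sinks = chooseSinks(() -> {
            throw new IOException("ConnectionLoss for /hbase/rs");
        });
        System.out.println("sinks chosen: " + sinks.size());
    }
}
```

With an empty list the caller simply finds no sinks on that attempt instead of dying on a NullPointerException, which leaves it free to try again once the connection is back.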

        

[jira] [Commented] (HBASE-4045) [replication] NPE in ReplicationSource when ZK is gone

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057585#comment-13057585 ] 

Hudson commented on HBASE-4045:
-------------------------------

Integrated in HBase-TRUNK #1998 (See [https://builds.apache.org/job/HBase-TRUNK/1998/])
    HBASE-4045  [replication] NPE in ReplicationSource when ZK is gone

jdcryans : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/trunk/CHANGES.txt

