Posted to dev@zookeeper.apache.org by "Thomas Vinod Johnson (JIRA)" <ji...@apache.org> on 2008/12/08 22:30:44 UTC
[jira] Created: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
NullPointerException stopping and starting Zookeeper servers
------------------------------------------------------------
Key: ZOOKEEPER-251
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.0.1, 3.0.0
Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
Reporter: Thomas Vinod Johnson
See the following thread for the original report:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200812.mbox/browser
Steps to reproduce:
1) Start a replicated ZooKeeper service consisting of 3 ZooKeeper (3.0.1) servers, all running on the same host (each using its own ports and log directories).
2) Create one znode in this ensemble (using the ZooKeeper client console, I issued 'create /node1 node1data').
3) Stop, then restart a single ZooKeeper server, moving on to the next one a few seconds later.
4) Go back to step 3. After 4-5 iterations, the following should occur, with the failing server exiting:
java.lang.NullPointerException
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:447)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:358)
    at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:333)
    at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:250)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:102)
    at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:183)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:245)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:421)
2008-12-08 14:14:24,880 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Leader@336] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Forcing shutdown
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:336)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
Exception in thread "QuorumPeer:/0:0:0:0:0:0:0:0:2183" java.lang.NullPointerException
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:339)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
The inputStream field is null, apparently because next() is being called
at line 358 even after next() has returned false. Having very little
knowledge of the implementation, I don't know whether the existence of
hdr.getZxid() >= zxid is supposed to be an invariant across all
invocations of the server; however, the following change to
FileTxnLog.java seems to make the problem go away.
diff FileTxnLog.java /tmp/FileTxnLog.java
358c358,359
< next();
---
> if (!next())
> return;
447c448,450
< inputStream.close();
---
> if (inputStream != null) {
> inputStream.close();
> }
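As a rough illustration of the two guards in the diff above, here is a minimal, self-contained Java sketch. It is not the FileTxnLog code: the class GuardedLogIterator, the helper openNextFile, and the one-byte "records" held in in-memory byte arrays are all hypothetical stand-ins, chosen only to show the pattern (check the result of next() during initialisation, and close the stream only when it is non-null).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified log iterator; NOT the real FileTxnLog implementation. */
final class GuardedLogIterator {
    private final Deque<byte[]> pendingFiles; // in-memory stand-ins for log files
    private InputStream inputStream;          // null whenever no file is open
    private int current = -1;                 // last record read, or -1 if none

    GuardedLogIterator(List<byte[]> files, int startAt) throws IOException {
        pendingFiles = new ArrayDeque<>(files);
        openNextFile();
        // Guard 1: stop initialising as soon as next() reports exhaustion.
        if (!next()) {
            return;
        }
        // Skip records below the requested starting point.
        while (current < startAt) {
            if (!next()) {
                return;
            }
        }
    }

    /** Opens the next pending file, or returns false if none remain. */
    private boolean openNextFile() {
        byte[] data = pendingFiles.poll();
        if (data == null) {
            return false;
        }
        inputStream = new ByteArrayInputStream(data);
        return true;
    }

    /** Advances to the next record; returns false once every file is exhausted. */
    boolean next() throws IOException {
        int b = (inputStream == null) ? -1 : inputStream.read();
        if (b >= 0) {
            current = b;
            return true;
        }
        // Guard 2: only close a stream that actually exists.
        if (inputStream != null) {
            inputStream.close();
            inputStream = null;
        }
        current = -1;
        if (!openNextFile()) {
            return false;
        }
        return next();
    }

    int current() {
        return current;
    }

    public static void main(String[] args) throws IOException {
        GuardedLogIterator it = new GuardedLogIterator(
                Arrays.asList(new byte[] {1, 2}, new byte[] {3, 4}), 3);
        System.out.println("first record at or after 3: " + it.current());
        System.out.println("has another record: " + it.next());
        System.out.println("has another record: " + it.next());
        // Calling next() again after exhaustion is safe because of guard 2.
        System.out.println("has another record: " + it.next());

        GuardedLogIterator empty =
                new GuardedLogIterator(Collections.<byte[]>emptyList(), 3);
        System.out.println("empty log has a record: " + empty.next());
    }
}
```

In this sketch, either guard alone would not be enough: without the first, initialisation keeps calling next() on an exhausted iterator, and without the second, a repeated call to next() dereferences a null stream.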
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Fix Version/s: 3.1.0
Affects Version/s: (was: 3.1.0)
3.0.0
3.0.1
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Fix For: 3.1.0
>
>
[jira] Issue Comment Edited: (ZOOKEEPER-251) NullPointerException
stopping and starting Zookeeper servers
Posted by "Thomas Vinod Johnson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654811#action_12654811 ]
vinodjohnson edited comment on ZOOKEEPER-251 at 12/9/08 6:47 AM:
-------------------------------------------------------------------------
Thank you. Post patch, I don't see the exceptions. To clear up confusion on my part, when you say "the situation arises after switching from quorum to standalone", you are referring to the single server that is stopped and started, and not the ensemble as a whole? I believe that throughout the test, I was maintaining quorum by having at least 2 out of 3 servers running and communicating to each other.
was (Author: vinodjohnson):
Thank you. Post patch, I don't see the exceptions. To clear up confusion on my part, when you say "the situation arises after switching from quorum to standalone", you are referring to the single server that is stopped and started, correct and not the ensemble as a whole? I believe that throughout the test, I was maintaining quorum by having at least 2 out of 3 servers running and communicating to each other.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Status: Patch Available (was: Open)
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.1, 3.0.0
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch, ZOOKEEPER-251.patch
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Attachment: ZOOKEEPER-251.patch
Can you try this patch out, Thomas?
Adding a test might be a little hard, since the situation arises after switching from quorum to standalone. Let me try and see if I can reproduce this as a JUnit test...
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Hunt updated ZOOKEEPER-251:
-----------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
+1 - looks good, tests pass.
Committed revision 725454.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch, ZOOKEEPER-251.patch
>
>
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655603#action_12655603 ]
Hudson commented on ZOOKEEPER-251:
----------------------------------
Integrated in ZooKeeper-trunk #169 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/169/])
. NullPointerException stopping and starting Zookeeper servers
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch, ZOOKEEPER-251.patch
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Assignee: Mahadev konar
Affects Version/s: (was: 3.0.1)
(was: 3.0.0)
3.1.0
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.1.0
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
>
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654618#action_12654618 ]
Mahadev konar commented on ZOOKEEPER-251:
-----------------------------------------
I'll file a patch shortly with a test case. Thanks, Thomas.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Attachment: ZOOKEEPER-251.patch
This patch adds a test. It was difficult to create a test that can recreate the problem. The attached test sometimes fails without the patch but passes with the patch.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch, ZOOKEEPER-251.patch
>
>
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654627#action_12654627 ]
Mahadev konar commented on ZOOKEEPER-251:
-----------------------------------------
Also, this does not seem to happen in a quorum. I think I know the reason why, but will update the JIRA with my findings.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
>
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Thomas Vinod Johnson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654811#action_12654811 ]
Thomas Vinod Johnson commented on ZOOKEEPER-251:
------------------------------------------------
Thank you. Post patch, I don't see the exceptions. To clear up confusion on my part: when you say "the situation arises after switching from quorum to standalone", you are referring to the single server that is stopped and started, and not the ensemble as a whole, correct? I believe that throughout the test I was maintaining quorum by having at least 2 out of 3 servers running and communicating with each other.
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch
>
>
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654583#action_12654583 ]
Patrick Hunt commented on ZOOKEEPER-251:
----------------------------------------
Looks to me like the issue is in the while loop:
goToNextLog();
if (!next())
return;
while (hdr.getZxid() < zxid) {
next(); // ISSUE IS HERE
}
We aren't checking the return value of next(), so we call next() again, which fails if it previously returned false (ia will not be null, but inputStream might be).
The question in my mind is what Thomas asked: why is the hdr zxid < zxid while next() is returning false? Is this OK, or is it signalling a problem elsewhere?
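The loop issue described in this comment can be illustrated with a minimal, self-contained sketch. It is not the ZooKeeper code: GuardedSeekSketch, seek() and the in-memory list of zxids are hypothetical stand-ins for FileTxnIterator, init() and the on-disk log. It only shows the general pattern of stopping the loop as soon as next() reports that nothing is left.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for FileTxnIterator; not the real ZooKeeper class.
final class GuardedSeekSketch {
    private final Iterator<Long> zxids; // stands in for the on-disk log records
    private Long current;               // stands in for hdr.getZxid()

    GuardedSeekSketch(List<Long> logZxids) {
        this.zxids = logZxids.iterator();
    }

    // Advances to the next record; returns false once the log is exhausted.
    boolean next() {
        if (!zxids.hasNext()) {
            current = null;
            return false;
        }
        current = zxids.next();
        return true;
    }

    // Positions the iterator at the first record with zxid >= target.
    // Returns false if the log holds no such record.
    boolean seek(long target) {
        if (!next()) {
            return false;
        }
        while (current < target) {
            if (!next()) { // the check that the quoted loop does not make
                return false;
            }
        }
        return true;
    }

    Long current() {
        return current;
    }

    public static void main(String[] args) {
        GuardedSeekSketch found = new GuardedSeekSketch(Arrays.asList(1L, 2L, 3L));
        System.out.println(found.seek(2L));

        // Target is ahead of every logged zxid, as when a snapshot is ahead of the logs.
        GuardedSeekSketch behind = new GuardedSeekSketch(Arrays.asList(1L, 2L, 3L));
        System.out.println(behind.seek(10L));
    }
}
```

With the guard in place, a target beyond the last logged zxid ends the loop cleanly instead of calling next() on an exhausted iterator.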
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Fix For: 3.1.0
>
>
[jira] Updated: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mahadev konar updated ZOOKEEPER-251:
------------------------------------
Priority: Blocker (was: Major)
I found the problem. It occurs when the snapshots are well ahead of the logs. That is the case when the server is brought up and down and there are no transactions on it, so there are no new logs to be applied to the snapshot. This is due to the bug that
{noformat}
while (hdr.getZxid() < zxid) {
    next();
}
{noformat}
does not check the return value of next(),
and also
next() itself does not keep ia and inputStream in sync.
{noformat}
} catch (EOFException e) {
    LOG.info("EOF exception ", e);
    inputStream.close();
    inputStream = null;
    // this means that the file has ended
    // we should go to the next file
{noformat}
should be
{noformat}
} catch (EOFException e) {
    LOG.info("EOF exception ", e);
    inputStream.close();
    inputStream = null;
    ia = null;
    // this means that the file has ended
    // we should go to the next file
{noformat}
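As a rough illustration of why the two fields should be reset together, here is a small self-contained sketch. It is not the actual FileTxnLog code: SyncedReaderSketch and encode() are hypothetical, DataInputStream stands in for the real input archive, and the sketch assumes next() guards on ia being null, as the earlier comment in this thread suggests.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader; the field names mirror the thread, the rest is illustrative.
final class SyncedReaderSketch {
    private InputStream inputStream; // raw stream for the current log file
    private DataInputStream ia;      // decoder layered on top of inputStream
    private long zxid;

    SyncedReaderSketch(byte[] logBytes) {
        inputStream = new ByteArrayInputStream(logBytes);
        ia = new DataInputStream(inputStream);
    }

    // Reads the next zxid; returns false at end of data and on every later call.
    boolean next() throws IOException {
        if (ia == null) {
            return false; // already exhausted, so nothing is dereferenced
        }
        try {
            zxid = ia.readLong();
            return true;
        } catch (EOFException e) {
            inputStream.close();
            inputStream = null;
            ia = null; // reset together so the guard above stays valid
            return false;
        }
    }

    long zxid() {
        return zxid;
    }

    // Helper that serializes zxids the way this sketch expects to read them.
    static byte[] encode(long... zxids) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long z : zxids) {
            out.writeLong(z);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        SyncedReaderSketch reader = new SyncedReaderSketch(encode(5L, 6L));
        while (reader.next()) {
            System.out.println("zxid " + reader.zxid());
        }
        // A further call after exhaustion returns false instead of throwing.
        System.out.println(reader.next());
    }
}
```

If only inputStream were cleared, a second call after EOF would pass the ia check and then call close() on a null inputStream, which matches the failure mode described above.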
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
>
> See the following thread for the original report:
> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/200812.mbox/browser
> Steps to reproduce:
> 1) Start a replicated zookeeper service consisting of 3 zookeeper (3.0.1) servers all running on the same host (of course, all using their own ports and log directories)
> 2) Create one znode in this ensemble (using the zookeeper client console, I issued 'create /node1 node1data').
> 3) Stop, then restart a single zookeeper server; moving onto the next one a few seconds later.
> 4) Go back to 3. After 4-5 iterations, the following should occur, with the failing server exiting:
> java.lang.NullPointerException
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:447)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:358)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:333)
> at
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:250)
> at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:102)
> at
> org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:183)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:245)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:421)
> 2008-12-08 14:14:24,880 - INFO
> [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Leader@336] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:336)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> Exception in thread "QuorumPeer:/0:0:0:0:0:0:0:0:2183"
> java.lang.NullPointerException
> at
> org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:339)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:427)
> The inputStream field is null, apparently because next is being called
> at line 358 even after next returns false. Having very little knowledge
> about the implementation, I don't know if the existence of hdr.getZxid()
> >= zxid is supposed to be an invariant across all invocations of the
> server; however the following change to FileTxnLog.java seems to make
> the problem go away.
> diff FileTxnLog.java /tmp/FileTxnLog.java
> 358c358,359
> < next();
> ---
> > if (!next())
> > return;
> 447c448,450
> < inputStream.close();
> ---
> > if (inputStream != null) {
> > inputStream.close();
> > }
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-251) NullPointerException stopping and
starting Zookeeper servers
Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654870#action_12654870 ]
Mahadev konar commented on ZOOKEEPER-251:
-----------------------------------------
Sorry, I must have been confused. I thought the servers were started as a quorum and then one of them was pulled out of the quorum and started as a standalone server. This will actually create the problem quite easily. Running the 3 quorum servers and killing one at a time is another way to do it (now that I realize it).
> NullPointerException stopping and starting Zookeeper servers
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-251
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-251
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.0.0, 3.0.1
> Environment: Tested with JDK 1.5, Solaris, but I suspect it is not relevant in this case.
> Reporter: Thomas Vinod Johnson
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: ZOOKEEPER-251.patch
>
>