Posted to dev@zookeeper.apache.org by "Jeremy Stribling (JIRA)" <ji...@apache.org> on 2011/04/13 00:26:05 UTC

[jira] [Created] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Creating a new sequential node results in a ZNODEEXISTS error
-------------------------------------------------------------

                 Key: ZOOKEEPER-1046
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.2, 3.3.3
         Environment: A 3-node cluster running Debian squeeze.
            Reporter: Jeremy Stribling


On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, which I don't think should ever be possible.  In past runs, I've been able to closely inspect the state of the system with the command-line client, and saw that the parent znode's cversion was smaller than the sequence numbers of existing child znodes under that parent.  In one example:

{noformat}
[zk:<ip:port>(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}

However, the znode /zkrsm/000000000000002d_record0000120804 existed on disk.
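For context on why a stale cversion produces this collision: the server builds a sequential znode name by appending a zero-padded counter taken from the parent's cversion. The sketch below is modeled on PrepRequestProcessor in the 3.3.x line; treat the exact formatting as an assumption rather than a quote of the server code.

{noformat}
// Minimal sketch (assumed 3.3.x behavior, not the actual server code): the
// sequential suffix is the parent's cversion, zero-padded to 10 digits. If
// the cversion regresses below an existing child's suffix, the next create
// reproduces a name that is already taken and fails with NodeExists.
static String sequentialName(String clientPath, int parentCversion) {
    return clientPath + String.format("%010d", parentCversion);
}
{noformat}

Under that model, a cversion of 120710 makes the next sequential create under /zkrsm produce the suffix 0000120710, and it keeps regenerating names that may already be taken until the cversion climbs past the existing ..._record0000120804 child.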

In a recent run, I was able to capture the ZooKeeper logs, and I will attach them to this JIRA.  The logs are named nodeX.<zxid_prefixes>.log, and each new log represents an application process restart.

Here's the scenario:

# There's a cluster with nodes 1, 2, and 3 running with epoch 0x3.
# All three nodes restart, forming a cluster with epoch 0x4.
# Node 3 restarts, leading to a cluster with epoch 0x5.

At this point, it seems like node 1 is the leader of the 0x5 epoch.  In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:

{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00000000000000b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00000000000000b2_record0001761440
{noformat}

This then repeats forever, as my application never expects to get this error on a sequential node create and just retries continually.  The message even carries over to node3.0x5-0x6.log once the 0x6 epoch comes into play.

I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc.  Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system.

Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the ZooKeeper IDs of the nodes changed due to a bug in our application code.  (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process).  This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner  - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed.  I'm not sure how this would be related though, as no transactions are done during this period.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019657#comment-13019657 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

Thanks for looking into it.  I will grab all the data I can the next time I see this happen, but it's difficult, since I've never had it happen on my own setup, only when others report it to me, and by then the data may have been rotated away.

I haven't found a test to reproduce this yet, but I will try to come up with one when I get some time.




[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020490#comment-13020490 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

No, it'll need to be fixed in the DataTree deleteNode method somehow; I'm not really sure of the best way to do it yet. But there's an easy manual test to run with the log and snapshot files attached to this ticket: if you start up a ZK server with a version-2 directory containing just log.1, the cversion of /zkrsm is 84340. If you start it up with log.1 and snapshot.1460d, the cversion of /zkrsm is 84245. Once this bug is fixed, both should show a cversion of 84340.
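If it helps to script that check, here is a hedged sketch of a tiny client that prints the cversion (the connect string is a placeholder for wherever the test server is listening, and it assumes the session connects promptly):

{noformat}
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CversionCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder address: point this at the server started from the
        // attached version-2 directory.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, null);
        Stat stat = zk.exists("/zkrsm", false);  // no watch set
        System.out.println("cversion of /zkrsm = " + stat.getCversion());
        zk.close();
    }
}
{noformat}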


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023368#comment-13023368 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

Hi Jeremy, Camille,

The patch that Jeremy attached will fix the problem, but it will result in incrementing the cversion for every delete operation that fails with a NoNodeException. That is, an attempt to delete a node that does not exist will also increment the cversion. Am I correct?

If so, do you think it would be better to change {{public void deleteNode(String path, long zxid)}} to {{public void deleteNode(String path, long zxid, boolean forceIncrementCversion)}}?

If forceIncrementCversion is set, the code will work similarly to Jeremy's patch: ZKDatabase.processTxn() will always set the flag to false, and FileTxnSnapLog.processTransaction() will always set it to true. We will also need to add a third argument to DataTree.processTxn().
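A minimal standalone sketch of those semantics (a toy map-based tree, not DataTree; everything here besides the forceIncrementCversion idea is illustrative):

{noformat}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the proposal: a delete that misses still bumps the parent's
// cversion when forceIncrementCversion is set, so replaying the txn log over
// a fuzzy snapshot converges on the same cversion as the live run did.
class ToyDataTree {
    static final class Node {
        final Set<String> children = new HashSet<String>();
        int cversion;
    }

    private final Map<String, Node> nodes = new HashMap<String, Node>();

    ToyDataTree() {
        nodes.put("/zkrsm", new Node());
    }

    void createChild(String parent, String name) {
        Node p = nodes.get(parent);
        p.children.add(name);
        p.cversion++;
        nodes.put(parent + "/" + name, new Node());
    }

    void deleteNode(String path, boolean forceIncrementCversion) {
        String parent = path.substring(0, path.lastIndexOf('/'));
        String name = path.substring(path.lastIndexOf('/') + 1);
        Node p = nodes.get(parent);
        if (nodes.remove(path) == null) {
            // The node never made it into the fuzzy snapshot; during replay,
            // still advance the parent's cversion to match the live history.
            if (forceIncrementCversion) {
                p.cversion++;
            }
            return;
        }
        p.children.remove(name);
        p.cversion++;
    }
}
{noformat}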

Questions for Camille:
- Why does this have to be in DataTree.deleteNode(), and can it not be part of the exception handling at a higher layer? Isn't this a special case of txn exception handling rather than of deleting the node from the DataTree?
- You mentioned that your test resulted in overwriting the nodes instead of causing NODEEXISTS. Is this because of the same bug?

Thanks.



[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053412#comment-13053412 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Ah, here is the error:

{noformat}
zookeeperbuildcontrib.compile:
     [echo] contrib: loggraph
    [javac] Compiling 34 source files to /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/contrib/loggraph/classes
    [javac] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/TxnLogSource.java:185: deserializeTxn(byte[],org.apache.zookeeper.txn.TxnHeader) in org.apache.zookeeper.server.util.SerializeUtils cannot be applied to (org.apache.jute.InputArchive,org.apache.zookeeper.txn.TxnHeader)
    [javac] 		Record r = SerializeUtils.deserializeTxn(iab, hdr);
    [javac] 		                         ^
    [javac] /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/TxnLogSource.java:333: deserializeTxn(byte[],org.apache.zookeeper.txn.TxnHeader) in org.apache.zookeeper.server.util.SerializeUtils cannot be applied to (org.apache.jute.InputArchive,org.apache.zookeeper.txn.TxnHeader)
    [javac] 		Record r = SerializeUtils.deserializeTxn(iab, hdr);
    [javac] 		                         ^
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 2 errors
{noformat}




[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020503#comment-13020503 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

Thanks Camille.  The patch I just attached seems to solve the problem for the log/snap combo you point out above.  I haven't tested it extensively yet, and it's not very elegant, so you'll probably want to rewrite it; I'm not going to put this into "patch available" yet.  But if you have a chance, please let me know if there's anything obviously wrong with this patch.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044331#comment-13044331 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

I think I got the code into a 3.3 patch. If someone could take a look and OK it, I will push it back into the branch.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020441#comment-13020441 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

This one is a bear. I believe what is happening is the following:

Your code is creating and deleting large numbers of sequential nodes. At time T, it is in the process of deleting a bunch of nodes when ZK decides to take a snapshot of the state.

When we take a snapshot, we spawn a separate thread and serialize the nodes of the tree in that thread. We get into your /zkrsm node in DataTree.serializeNode, get that node from the tree, synchronize on it, and write out the record of that node including its current cversion (used to generate sequential node information) and the list of children. However, we then release the sync on that node, and attempt to iterate through the children to serialize them out. In the meantime, the other thread is merrily deleting children of this node, increasing the cversion of /zkrsm all the while. So the list of children that we got while serializing the parent is defunct. When we try to serialize these now-deleted children, we see that they are null and continue on.
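A self-contained toy sketch of that window (not DataTree.serializeNode; the names here are illustrative):

{noformat}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the fuzzy-snapshot race: the parent record (with its
// cversion and child list) is captured while holding the node's monitor, but
// the monitor is released before the children themselves are written, so a
// concurrent delete can remove children whose deletion txns still sit after
// the snapshot zxid in the log.
class FuzzySnapshotSketch {
    static final class Node {
        final List<String> children = new ArrayList<String>();
        int cversion;
    }

    static void serializeNode(Node node, StringBuilder out) {
        List<String> childrenCopy;
        synchronized (node) {
            out.append("cversion=").append(node.cversion).append('\n');
            childrenCopy = new ArrayList<String>(node.children);
        }
        // Monitor released here: another thread may now delete children and
        // bump the parent's cversion; the copy above no longer matches the tree.
        for (String child : childrenCopy) {
            out.append("child=").append(child).append('\n');
        }
    }
}
{noformat}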

Now, you finish this snapshot, delete some more nodes under /zkrsm, create some more sequential nodes under /zkrsm, and crash. When you start back up again, you read that snapshot and start playing through the log transactions after the snapshot zxid. Unfortunately, the first N transactions in your log after the snapshot zxid are deletions of nodes that didn't make it into the snapshot, because you deleted them before they could be serialized out. We try to process each delete transaction, get a NoNodeException, and ignore it, because we know that can happen due to what I wrote above. But what we don't do is increment the cversion of the parent node after this failed deletion. So the parent's cversion ends up less than it would be if you had played the transaction log through on its own, and less than it was in the live system before the crash. Now you want to continue creating sequential nodes where you left off, but your cversion is wrong, so you try to create a node that already exists. Whoops.

So, now we just need to fix it. Should we be incrementing the cversion of the parent even on a NoNode exception during txn log replay? I suspect that is the right thing to do. Thoughts?


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046893#comment-13046893 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Do we want to try to do this as part of 3.4? Are you guys addressing this in some of your other patches?


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: ZOOKEEPER-1046-for333

3.3 branch patch


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1046:
-------------------------------------

    Fix Version/s: 3.4.0


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-1046:
--------------------------------

    Attachment: ZOOKEEPER-1046-6.patch


Attaching a patch for trunk. For some reason, some of the core tests are failing for me (and also on Hudson). The test that is relevant to this patch passes.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-6.patch, ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch
>

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Stribling updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: ZOOKEEPER-1046.patch

A quick and dirty patch that seems to fix the problem.  Applies from src/ with `patch -p1`.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020505#comment-13020505 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

(By the way, the patch is against the 3.3.3 release, not the trunk.)

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020447#comment-13020447 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

Wow, nice sleuthing.  Sounds very plausible -- we are doing a lot of deletions under that same node, and it could definitely be concurrent with the node creation.  But not knowing the code very well, I'll defer to others for confirmation.

I will try to implement this change locally for our app and see if we run into the problem again.  I assume you're talking about the OpCode.delete case in PrepRequestProcessor::pRequest?  And copying the cversion increment into a catch clause after the getRecordForPath(path) call?
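
To check my understanding before I try it, here's a toy model of the invariant I think is at stake -- plain Java, not ZooKeeper server code, and certainly not a patch. The parent's cversion supplies the suffix for sequential children, so every logged delete has to advance it even when the child is already gone; otherwise a later create can be handed an already-used sequence number:

{noformat}
import java.util.HashSet;
import java.util.Set;

// Toy model only: "cversion" stands in for the parent stat's child version.
class SequentialParent {
    int cversion;
    final Set<String> children = new HashSet<String>();

    String createSequential(String prefix) {
        String name = String.format("%s%010d", prefix, cversion);
        cversion++;
        if (!children.add(name)) {
            // The symptom from this issue: NODEEXISTS on a sequential create.
            throw new IllegalStateException("NodeExists: " + name);
        }
        return name;
    }

    void delete(String name) {
        children.remove(name);
        cversion++; // the contested increment: must happen even if name was absent
    }

    public static void main(String[] args) {
        SequentialParent p = new SequentialParent();
        System.out.println(p.createSequential("n")); // n0000000000
        p.delete("n0000000000");
        System.out.println(p.createSequential("n")); // n0000000002 -- no reuse
    }
}
{noformat}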

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050070#comment-13050070 ] 

Patrick Hunt commented on ZOOKEEPER-1046:
-----------------------------------------

I strongly oppose changing the semantics of cversion. Certainly not in 3.3; as part of a minor version upgrade, perhaps. While we (the core code) only use it for sequence numbering, there is no telling what users are using it for, and switching it out from under them is a very bad idea IMO. It may lead to hard-to-track-down issues for users -- users who appreciate our rock-steady API.

Can we use Flavio's suggestion to:
1) do what Ben is suggesting, and fix cversion in StatPersisted to track only creates
2) update the cversion in Stat, as Flavio is suggesting, before passing it to the user

This would maintain the semantics on the client side. The downside is that it's a bit of a hack (I'm not sure it would even work, given we have to translate in both directions, to/from the user). How bad, I don't know -- thoughts?
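
To make (2) concrete, here is a rough sketch of the translation I have in mind -- illustrative only, and it assumes the counter started from an empty node. If the persisted cversion counted only creates, the number of deletes could be derived as (creates - numChildren), and the old user-visible value rebuilt as creates + deletes:

{noformat}
// Illustrative only, not a patch: rebuild the old creates-plus-deletes
// cversion from a create-only counter, assuming counting started from an
// empty node.
class CversionTranslation {
    static int userVisibleCversion(int createOnlyCversion, int numChildren) {
        int deletes = createOnlyCversion - numChildren; // created minus still present
        return createOnlyCversion + deletes;            // old semantics: creates + deletes
    }
}
{noformat}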

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
>

[jira] [Reopened] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K reopened ZOOKEEPER-1046:
---------------------------------


Reopening this bug based on my previous comment. 


> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049774#comment-13049774 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

It's ok with me to make the change and ignore deletes.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060760#comment-13060760 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

Hi Ben, Camille,

Sorry for the delayed response. I don't think this patch fixes the bug entirely.

{quote}
--- src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java	(revision 1138571)
+++ src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java	(working copy)
[...]
+        if ((hdr.getType() == OpCode.create &&
+                rc.err == Code.NODEEXISTS.intValue()) &&
+                ((CreateTxn)txn).getParentCVersion() == -1) { 
{quote}

The if condition above applies only when we are replaying an older version of CreateTxn. Don't we also need the part that sets the parent's cversion when ((CreateTxn)txn).getParentCVersion() is > 0? It looks like we need an else branch here that sets the parent's cversion to the cversion recorded in the CreateTxn (see the sketch below). Am I missing something?
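
Something along these lines is what I mean -- a minimal sketch only, not the committed patch, and setCversionPzxid() is just my name for the DataTree helper this would need (pin a node's cversion and pzxid); the actual patch may spell it differently:

{noformat}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.OpCode;
import org.apache.zookeeper.server.DataTree;
import org.apache.zookeeper.txn.CreateTxn;
import org.apache.zookeeper.txn.TxnHeader;

// Sketch of the proposed else branch for replaying a create that hits
// NODEEXISTS; illustrative only.
class CreateReplaySketch {
    static void onCreateNodeExists(DataTree dt, TxnHeader hdr, CreateTxn txn,
                                   int rcErr) throws KeeperException {
        if (hdr.getType() == OpCode.create
                && rcErr == KeeperException.Code.NODEEXISTS.intValue()) {
            if (txn.getParentCVersion() == -1) {
                // Older CreateTxn format: keep the existing recovery path
                // (delete the stale node, then re-apply the create).
            } else {
                // Newer format: pin the parent's cversion to the value the
                // leader recorded when the create was proposed, so replay is
                // deterministic regardless of what the fuzzy snapshot held.
                String path = txn.getPath();
                String parent = path.substring(0, path.lastIndexOf('/'));
                dt.setCversionPzxid(parent, txn.getParentCVersion(),
                        hdr.getZxid());
            }
        }
    }
}
{noformat}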

-Vishal

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046720#comment-13046720 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

yes, it would require a transaction log and protocol upgrade.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031711#comment-13031711 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Hey Vishal,

Any chance you could get some sort of test for this soon? It looks like it will conflict with ZOOKEEPER-965, so it would be good to have this in first: it is a rather nasty bug, and I'd like to make sure we don't make the same mistake in the multi-txn space.

Thanks.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039388#comment-13039388 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

So I'm trying to get this patch working in 3.3, and I notice the following:

One of the tests in the patch is a modification of LoadFromLogTest, which was introduced to demonstrate the fixes for ZOOKEEPER-882. ZOOKEEPER-882 claims to have been fixed in 3.3.3 and 3.4, but the test is not there in the 3.3.3 branch. When we port fixes from trunk to branches, are we not also porting the tests?

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049814#comment-13049814 ] 

Flavio Junqueira commented on ZOOKEEPER-1046:
---------------------------------------------

If cversion counts the number of created children, we can always learn the number of deleted children by subtracting the number of current children from cversion, no? I was also wondering whether there is any use case you're aware of in which both need to be counted.

So far the proposal of counting only creations seems good to me.
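
As a worked example, taking the /zkrsm stat quoted in this issue and pretending its cversion had been counting only creations: cversion = 120710 with numChildren = 2955 would imply 120710 - 2955 = 117755 deleted children.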

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz
>

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053345#comment-13053345 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Looked at the patch. Besides a couple extraneous printlns it looks ok to me. Shall I remove these myself and commit this to trunk?

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053371#comment-13053371 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Oops, I missed those. Yeah, if you could remove them and commit, that would be great. BTW, does the 3.3 patch still need to go in?


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019092#comment-13019092 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

The logs are too big to attach, but you can download them from [http://pdos.csail.mit.edu/~strib/zk_node_exists.tgz] (24 MB compressed, 670 MB unzipped).


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037302#comment-13037302 ] 

Hudson commented on ZOOKEEPER-1046:
-----------------------------------

Integrated in ZooKeeper-trunk #1189 (See [https://builds.apache.org/hudson/job/ZooKeeper-trunk/1189/])
    ZOOKEEPER-1046: Creating a new sequential node results in a ZNODEEXISTS error

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1125581
Files : 
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/DataTreeTest.java
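
For readers skimming the archive, here is a minimal sketch of the idea behind the committed change, using hypothetical stand-in types rather than ZooKeeper's actual DataTree/FileTxnSnapLog APIs: when a create transaction replayed from the log finds the child already present, the parent's cversion and pZxid must still advance, or a later sequential create can be handed a suffix that is already taken.

{noformat}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch only; Node is a hypothetical stand-in, not
 * ZooKeeper's real DataTree. The point is the already-exists branch.
 */
class ReplaySketch {
    static class Node {
        int cversion;   // bumped on every child create/delete
        long pzxid;     // zxid of the last change to this node's children
        Map<String, Node> children = new HashMap<>();
    }

    /**
     * Replay one create txn during startup. Before the fix, a create whose
     * child already existed (e.g. captured by a snapshot taken while the
     * txn was in flight) was skipped entirely, leaving the parent's
     * cversion below the sequence numbers of children on disk.
     */
    static void replayCreate(Node parent, String childName, long zxid) {
        if (!parent.children.containsKey(childName)) {
            parent.children.put(childName, new Node());
        }
        // Fix: advance the parent's metadata whether or not the child
        // already existed, so sequential suffixes cannot be handed out twice.
        parent.cversion++;
        parent.pzxid = zxid;
    }
}
{noformat}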



[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048693#comment-13048693 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

OK. So who wants to do this? Should I pull this patch out of trunk and only apply it to 3.3? I'd like to get this cleared up and finalized so we know the bug and the related bugs are all fixed.


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-1046:
--------------------------------

    Attachment: ZOOKEEPER-1046.patch1

+ tests


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053425#comment-13053425 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483488/zookeeper-1046-4.patch
  against trunk revision 1138213.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 30 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/348//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020145#comment-13020145 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Thanks, Jeremy.
Today I ran the single-node test with your log above and got even stranger results: instead of giving me NODEEXISTS errors, I was simply overwriting nodes. This was through a Java ZooKeeper client instead of zkCli. Going to look into it further now.
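
For anyone trying the same thing, a hedged reproduction sketch against the standard Java client (the connection string and path are placeholders, not taken from the reporter's setup): keep issuing sequential creates and flag any returned path that was seen before, since a correct server should never hand out the same sequence number twice.

{noformat}
import java.util.HashSet;
import java.util.Set;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SequentialCreateCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; point it at the restored server.
        // Assumes the parent znode /zkrsm already exists.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            // create() returns the actual path, including the 10-digit suffix.
            String path = zk.create("/zkrsm/record", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            if (!seen.add(path)) {
                // A repeat here means the server reused a sequence number
                // (or silently overwrote an existing node, as observed above).
                System.err.println("Reused sequential path: " + path);
            }
        }
        zk.close();
    }
}
{noformat}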


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053457#comment-13053457 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483497/zookeeper-1046-5.patch
  against trunk revision 1138595.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 35 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/349//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050558#comment-13050558 ] 

Patrick Hunt commented on ZOOKEEPER-1046:
-----------------------------------------

@breed yet more debt for those looking for things to complain about. ;-) Still, I'm willing to take the heat if it's going to benefit users. My suggestion would be to document it in multiple places: in the jute definition file and in the method you mention. Reference this JIRA in those comments.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049197#comment-13049197 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Is this going to force me to write some kind of upgrade script for the ZK servers I have running 3.3.3?


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050090#comment-13050090 ] 

Patrick Hunt commented on ZOOKEEPER-1046:
-----------------------------------------

Ok, thanks for the clarification. In that case what do you think about this for 3.4+ ? Is it going to be possible to do this right, but also w/o too much overhead? (ie simply) vs the gains of changing the API? 3.4 I'm less worried about the semantic change, but I'd still like to avoid it if reasonably possible...


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049817#comment-13049817 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Nice observation, Flavio! I haven't seen anyone using cversion outside of the sequence number on sequence znodes.
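
To make the cversion/sequence-number relationship concrete, here is an illustrative helper (not the actual server code, though it mirrors what PrepRequestProcessor does): the 10-digit suffix of a sequential znode is formatted from the parent's cversion at create time, so a cversion that lags the children already on disk walks straight into NODEEXISTS.

{noformat}
import java.util.Locale;

class SequenceNameSketch {
    // The suffix appended to a sequential create is the parent's cversion,
    // zero-padded to 10 digits.
    static String sequentialPath(String requestedPath, int parentCversion) {
        return requestedPath + String.format(Locale.ENGLISH, "%010d", parentCversion);
    }

    public static void main(String[] args) {
        // With the stat from this report (cversion = 120710), the next
        // sequential create would target suffix 0000120710, even though a
        // child with suffix 0000120804 already existed on disk.
        System.out.println(sequentialPath("/zkrsm/record", 120710));
    }
}
{noformat}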


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037626#comment-13037626 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Patrick, I missed your comment in my rush on Friday. Do we want to push this into the 3.3 branch as well?

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053005#comment-13053005 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483386/ZOOKEEPER-1046_2.patch
  against trunk revision 1138213.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 30 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/345//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/345//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/345//console

This message is automatically generated.

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: zookeeper-1046-4.patch

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036998#comment-13036998 ] 

Patrick Hunt commented on ZOOKEEPER-1046:
-----------------------------------------

It's not clear to me at this point: is this being fixed on 3.3 and trunk? (The "Fix For" field indicates trunk only.) Which patch is which? (Typically we indicate in the patch file name if it's for a branch, vs. the "simple" name if it's for trunk.)


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064002#comment-13064002 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

Hi Camille,

Thanks. I am OK with that change. I am hoping it won't be too much of an effort for you to merge this into 3.3, as it was last time.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053379#comment-13053379 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Checked into 3.3 last Thursday (1136440).

Should we be worried about the -1 javac from Jenkins? I don't know what that error means, since this thing clearly compiles if it can pass the tests.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037683#comment-13037683 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Yes, we should get this into the 3.3 branch.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046704#comment-13046704 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

Does this imply a transaction log version upgrade?

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019661#comment-13019661 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

I was wrong -- I was able to find the log/snapshot files from a run with ZooKeeper 3.3.2 (single node). If you start up ZooKeeper 3.3.2 with this as your data dir, you can reproduce the problem with the following command in zkCli:

{noformat}
# create -s /zkrsm/000000000000002d_record test
Node already exists: /zkrsm/000000000000002d_record
{noformat}

Unfortunately, I do not have logs for the run that led to this state; I only have logs from after the restart, which shouldn't give you any more info than just running with these snapshots yourself.
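
For completeness, the same collision can be driven through the Java client API rather than zkCli. A minimal sketch, assuming a local 3.3.2 server started on the attached data dir (the connect string, session timeout, and payload below are placeholders):

{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SequentialCreateRepro {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string/timeout; point it at the server that
        // was started on the attached snapshot.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
        try {
            // Equivalent to: create -s /zkrsm/000000000000002d_record test
            String actual = zk.create("/zkrsm/000000000000002d_record",
                    "test".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("created " + actual);
        } catch (KeeperException.NodeExistsException e) {
            // Should be impossible for a sequential create; with the bad
            // cversion it fires every time, which is why a client that
            // blindly retries on it loops forever.
            System.err.println("unexpected NODEEXISTS: " + e.getPath());
        } finally {
            zk.close();
        }
    }
}
{noformat}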

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064829#comment-13064829 ] 

Mahadev konar commented on ZOOKEEPER-1046:
------------------------------------------

Camille/Vishal,
  Are we waiting for this to be ported to the 3.3 branch before resolving this?

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-1046:
-------------------------------------

    Priority: Blocker  (was: Major)

Making this a blocker to ensure that it gets into 3.4.0.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049181#comment-13049181 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Yes, I think we should stick with this patch for 3.3. I think it might be better to do a patch based on what is already in trunk rather than reverting. We should definitely get the protocol change into 3.4.0. I'll try to get to it today.

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: zookeeper-1046-5.patch

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060846#comment-13060846 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

I think you are right, Vishal, good catch. The perils of making changes without reproducible tests! Do you want to submit a fix for it?

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032777#comment-13032777 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

I will upload a test soon.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064839#comment-13064839 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

I'm waiting on the build to run successfully before I resolve it.

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal K updated ZOOKEEPER-1046:
--------------------------------

    Attachment: ZOOKEEPER-1046.patch

Hi Camille, Jeremy,

I have attached a patch that I think should fix the problem that Camille
pointed out.

There are subtle effects of this patch that need to be carefully
reviewed:

1. Previously, DataTree.processTxn() did not return an error if a
create/delete failed with a NODEEXISTS/NONODE exception (the if
condition was also a bit strange). I think we should not do this. Per
my understanding, such a failure can occur only when we are replaying
transactions; it should not occur while the server is processing live
transactions (unless there is a bug). So processTxn should return an
error even in this case, and the caller should decide whether to
ignore it. In FileTxnSnapLog we handle the error by incrementing the
cversion of the parent. In FinalRequestProcessor we do nothing
special, but we do not ignore the error either.

2. We could increment the cversion in deleteNode() itself (as I
mentioned in my previous comment), but I think the changes in the
attached patch are a better approach. I would like to confirm that not
incrementing the cversion in deleteNode() will not introduce a
synchronization issue (since the lock on the parent will be released
and then reacquired). I don't think there is any problem here, but it
would be good if you could confirm this.

I noticed that the same bug can occur while creating a node as well;
the patch handles both create and delete.
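A minimal, self-contained sketch of the replay-time handling described
in point 1, using toy stand-ins rather than ZooKeeper's actual
DataTree/FileTxnSnapLog classes (all names below are illustrative):

{noformat}
// Toy model of the replay handling described above -- illustrative
// names only, not ZooKeeper's real classes.
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    static class Node {
        int cversion;                            // bumped on each child change
        final Map<String, Node> children = new HashMap<>();
    }

    // Replays one create txn; returns false if it hit "NODEEXISTS".
    static boolean replayCreate(Node parent, String child) {
        boolean existed = parent.children.containsKey(child);
        if (!existed) {
            parent.children.put(child, new Node());
        }
        // The key point: bump the parent's cversion even on NODEEXISTS
        // (i.e. when the create was already applied before the snapshot
        // was taken), so cversion never lags behind the creates applied
        // under the parent -- that lag is what makes later sequential
        // creates collide with existing children.
        parent.cversion++;
        return !existed;
    }

    public static void main(String[] args) {
        Node zkrsm = new Node();
        replayCreate(zkrsm, "record0000000000"); // applied before the snapshot
        replayCreate(zkrsm, "record0000000000"); // same txn replayed from the log
        System.out.println("cversion = " + zkrsm.cversion); // 2, not stuck at 1
    }
}
{noformat}

Over-counting on a replayed duplicate is safe for sequence numbering,
which only needs the cversion to stay ahead of existing children;
under-counting is what produces the ZNODEEXISTS collisions reported
in this issue.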

I was able to test this patch using the snapshots and logs that Jeremy
had attached. I had to revert the logs to the state where this bug
would first be seen, so I ended up doing: rm \*.1d8\* \*.1d9\*

Then I can see that "create -s /zkrsm/000000000000002d_record test"
succeeds with this patch (and fails without it).

I have not thought about writing a test for this yet (it sounds
complex to me).

Thanks.
-Vishal


[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-1046:
-------------------------------------

    Attachment: ZOOKEEPER-1046_2.patch

This patch makes the CreateTxn idempotent and uses Camille's workaround with older txns.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053942#comment-13053942 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

committed to trunk: 
New Revision: 1138957
New Revision: 1138958 (jute file)

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063952#comment-13063952 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Looks great, Vishal, thanks for seeing this through. The only question is whether we want to raise the log level from debug to info in the case where we see the NODEEXISTS failure in FileTxnSnapLog. I suspect we want to leave it at debug; if you agree, I'll just correct that and check it in.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019398#comment-13019398 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

Correct.  Sorry, forgot to mention that.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019357#comment-13019357 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

You saw this problem in the past in a single-node setup, correct?

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046596#comment-13046596 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

I was reviewing this code again, and I think we still have a problem. The txns are supposed to be idempotent, but the way we maintain the cversion is not idempotent. I think it is possible for different replicas to have different cversions for the same znode.

I think the ultimate fix is to encode the cversion of the parent into the create and delete txns.
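
A hedged sketch of that suggestion, again with toy types rather than
ZooKeeper's real CreateTxn (the field name parentCVersion below is
illustrative): the txn records the parent's cversion as of the create,
and apply() sets it instead of incrementing it, so any number of
replays converges to the same state:

{noformat}
// Toy model of the idempotent scheme suggested above -- illustrative
// names only.
import java.util.HashMap;
import java.util.Map;

public class IdempotentCreateSketch {
    static class Node {
        int cversion;
        final Map<String, Node> children = new HashMap<>();
    }

    // The create txn carries the parent's cversion as of this create.
    static class CreateTxn {
        final String child;
        final int parentCVersion;
        CreateTxn(String child, int parentCVersion) {
            this.child = child;
            this.parentCVersion = parentCVersion;
        }
    }

    static void apply(Node parent, CreateTxn txn) {
        parent.children.putIfAbsent(txn.child, new Node());
        // Set, don't increment: applying the txn once or many times
        // leaves the parent in the same state, so every replica ends
        // up with the same cversion.
        parent.cversion = txn.parentCVersion;
    }

    public static void main(String[] args) {
        Node parent = new Node();
        CreateTxn txn = new CreateTxn("record0000000000", 1);
        apply(parent, txn);
        apply(parent, txn); // replayed duplicate: no drift
        System.out.println("cversion = " + parent.cversion); // 1 either way
    }
}
{noformat}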

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050078#comment-13050078 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Two clarifying points:

* This is not for 3.3; it would be a 3.4 change. We will stick with Camille's fix for 3.3.

* We never get the cversion from the user: you can't do conditional ops with it or pass it in any of the calls.

[jira] [Assigned] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-1046:
---------------------------------------

    Assignee: Vishal K

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063609#comment-13063609 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12486118/ZOOKEEPER-1046-6.patch
  against trunk revision 1144087.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 16 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/389//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/389//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/389//console

This message is automatically generated.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-6.patch, ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033249#comment-13033249 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Looks great Vishal! +1 from me

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053399#comment-13053399 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483478/zookeeper-1046-3
  against trunk revision 1138213.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 30 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/347//console

This message is automatically generated.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030050#comment-13030050 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Hey Vishal,

Looking at the patch at a glance, it looks good to me. It would be nice to have a flat unit test against the DataTree incrementCversion as well (see the sketch below). Is there any way to take the logs Jeremy provided and turn them into a test for this fix? It would be somewhat heavy, I'll admit, but this is a significant bug and I think it warrants a good test.
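
For reference, such a flat unit test could be quite small. The sketch below is only illustrative: it assumes the patch exposes a DataTree method along the lines of setCversionPzxid(path, cversion, zxid) for bumping a parent's cversion/pzxid during txn replay; the actual method name and the exact createNode signature may differ between branches.

{noformat}
// Lives in the server package so the test can poke at DataNode.stat directly.
package org.apache.zookeeper.server;

import org.junit.Assert;
import org.junit.Test;

public class DataTreeCversionTest {
    @Test
    public void testIncrementCversion() throws Exception {
        DataTree dt = new DataTree();
        // Create a parent znode; its cversion starts at 0.
        // (createNode's signature varies across branches -- adjust as needed.)
        dt.createNode("/test", new byte[0], null, 0, 1, 1);
        DataNode parent = dt.getNode("/test");
        int prevCversion = parent.stat.getCversion();
        long prevPzxid = parent.stat.getPzxid();

        // Simulate replaying a create txn for a child that already exists in
        // a fuzzy snapshot: the parent's cversion and pzxid must still move
        // forward, or later sequential creates will reuse an old name and
        // fail with NodeExists.
        dt.setCversionPzxid("/test", prevCversion + 1, prevPzxid + 1); // assumed API from the patch

        Assert.assertEquals(prevCversion + 1, parent.stat.getCversion());
        Assert.assertEquals(prevPzxid + 1, parent.stat.getPzxid());
    }
}
{noformat}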

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.tgz

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064025#comment-13064025 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Nah, since it only applies to the 3.4 release (ben's change is just for 3.4), it is no effort at all. Thanks!

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-6.patch, ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Camille Fournier updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: zookeeper-1046-3

removed printlns

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044332#comment-13044332 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12481458/ZOOKEEPER-1046-for333
  against trunk revision 1125581.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 14 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/305//console

This message is automatically generated.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033049#comment-13033049 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12479099/ZOOKEEPER-1046.patch1
  against trunk revision 1099329.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 16 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/263//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/263//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/263//console

This message is automatically generated.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064844#comment-13064844 ] 

Mahadev konar commented on ZOOKEEPER-1046:
------------------------------------------

great, thanks!

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-6.patch, ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053656#comment-13053656 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

+1 good find. sorry i missed the contrib.

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Vishal K (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061339#comment-13061339 ] 

Vishal K commented on ZOOKEEPER-1046:
-------------------------------------

I can work on the patch. I will see if we can put together a test that fails without the fix; a rough outline is sketched below.
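
In case it helps frame that, here is a rough client-side outline of such a test. It is only a sketch: restartServer() is a hypothetical stand-in for whatever harness restarts the standalone server so that it restores from a fuzzy snapshot and replays the txn log (that is the hard part to automate), and without the fix the final create would be expected to throw NodeExistsException.

{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class SequentialCreateAfterRestart {
    private static final Watcher NOOP = new Watcher() {
        public void process(WatchedEvent event) { }
    };

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, NOOP);
        if (zk.exists("/zkrsm", false) == null) {
            zk.create("/zkrsm", new byte[0], Ids.OPEN_ACL_UNSAFE,
                      CreateMode.PERSISTENT);
        }
        // Issue enough sequential creates that the server snapshots somewhere
        // in the middle, so the restart below replays txns over a fuzzy
        // snapshot (the condition under which the parent cversion lags).
        for (int i = 0; i < 200000; i++) {
            zk.create("/zkrsm/record", new byte[0], Ids.OPEN_ACL_UNSAFE,
                      CreateMode.PERSISTENT_SEQUENTIAL);
        }
        zk.close();

        // restartServer();  // hypothetical: restart the standalone server,
        //                   // forcing snapshot restore + txn log replay

        zk = new ZooKeeper("127.0.0.1:2181", 30000, NOOP);
        try {
            String path = zk.create("/zkrsm/record", new byte[0],
                    Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("fix holds, created " + path);
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("bug reproduced: sequential create hit NodeExists");
        } finally {
            zk.close();
        }
    }
}
{noformat}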

> Creating a new sequential node results in a ZNODEEXISTS error
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1046
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1046
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.2, 3.3.3
>         Environment: A 3 node-cluster running Debian squeeze.
>            Reporter: Jeremy Stribling
>            Assignee: Vishal K
>            Priority: Blocker
>              Labels: sequence
>             Fix For: 3.3.4, 3.4.0
>
>         Attachments: ZOOKEEPER-1046-for333, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch, ZOOKEEPER-1046.patch1, ZOOKEEPER-1046.tgz, ZOOKEEPER-1046_2.patch, zookeeper-1046-3, zookeeper-1046-4.patch, zookeeper-1046-5.patch

[jira] [Updated] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Stribling updated ZOOKEEPER-1046:
----------------------------------------

    Attachment: ZOOKEEPER-1046.tgz

The log and snapshot files for an occurrence of this with a single node using ZK 3.3.2.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050222#comment-13050222 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

Actually, Flavio's suggestion can go in quite well. We track the cversion in the persisted Stat, and there is one function that fills in Stat, so we can do the fixup there: cversion*2 - numChildren. It will work, and it isn't really that messy. The only thing I don't like is that it might confuse core developers to see a different cversion on disk than the one coming back from requests.
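
A minimal sketch of what that fixup might look like, assuming one central helper that copies the persisted stat into the client-facing Stat (the names fillClientStat/StatPersisted here are illustrative, not the committed patch):

{noformat}
// Illustrative only: if the persisted cversion counted creates alone,
// the old client-visible semantics (creates + deletes) could be
// reconstructed on the way out as cversion*2 - numChildren, since
// deletes = creates - numChildren for any parent.
static void fillClientStat(StatPersisted persisted, Stat stat, int numChildren) {
    stat.setCzxid(persisted.getCzxid());
    stat.setMzxid(persisted.getMzxid());
    stat.setCtime(persisted.getCtime());
    stat.setMtime(persisted.getMtime());
    stat.setVersion(persisted.getVersion());
    stat.setAversion(persisted.getAversion());
    stat.setEphemeralOwner(persisted.getEphemeralOwner());
    stat.setPzxid(persisted.getPzxid());
    // the one fixup: hide the creates-only on-disk counter from clients
    stat.setCversion(persisted.getCversion() * 2 - numChildren);
    stat.setNumChildren(numChildren);  // dataLength omitted for brevity
}
{noformat}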


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023466#comment-13023466 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

I think it could, and possibly should, be in the exception handling at the higher layer. I haven't had much of a chance to look through it in detail yet. There's definitely still some work to be done here; if you have a better idea, it would be cool to see it.

Which test are you referring to? I realized after posting that one of my tests was writing to a different node name entirely instead of overwriting a node name, so that was a human operator problem.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050431#comment-13050431 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

I checked in the fix for the 3.3 branch: svn commit r1136440.

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065177#comment-13065177 ] 

Hudson commented on ZOOKEEPER-1046:
-----------------------------------

Integrated in ZooKeeper-trunk #1243 (See [https://builds.apache.org/job/ZooKeeper-trunk/1243/])
    ZOOKEEPER-1046: Creating a new sequential node results in a ZNODEEXISTS error

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1146025
Files : 
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/DataTreeTest.java



[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037047#comment-13037047 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

+1. Great job figuring out the problem and fixing it! Looks good to me.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046899#comment-13046899 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

I think 3.4 would be a good time to do it, since we are doing a small protocol change in the initial handshake.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019656#comment-13019656 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

So, from a first pass through the code, I'm not sure how this is happening. It looks like the parent cversion is always updated when you add a child node, and that is reflected both in the data tree that would be serialized and in the transaction log replay (which itself would update the parent node's cversion). It would be nice if we could get the log.* and snapshot.* files from the ZK data directories the next time you see this. Can you force it to happen more often by updating your cluster more frequently or by bouncing nodes?
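
The invariant being described is roughly the following; this is a simplified sketch of a DataTree-style create path with illustrative names, not the actual server code:

{noformat}
// Sketch: adding a child bumps the parent's cversion inside the same
// operation, and the same transaction is what gets replayed from the
// log on restart, so the serialized tree and the replayed tree should
// agree. Names here are illustrative.
void createNode(String path, byte[] data, long zxid) {
    int lastSlash = path.lastIndexOf('/');
    String parentPath = path.substring(0, lastSlash);
    DataNode parent = nodes.get(parentPath);
    synchronized (parent) {
        parent.stat.setCversion(parent.stat.getCversion() + 1);
        parent.stat.setPzxid(zxid);
        parent.addChild(path.substring(lastSlash + 1));
    }
    nodes.put(path, new DataNode(data, zxid));
}
{noformat}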


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053007#comment-13053007 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

To be fair, I only found the bug; Vishal K wrote the fix. I will take a look at this tomorrow morning.


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064555#comment-13064555 ] 

Camille Fournier commented on ZOOKEEPER-1046:
---------------------------------------------

Checked in Vishal's fix: r1146025 - in /zookeeper/trunk: ./ src/java/main/org/apache/zookeeper/server/ src/java/main/org/apache/zookeeper/server/persistence/ src/java/test/org/apache/zookeeper/test/


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Andrei Savu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020237#comment-13020237 ] 

Andrei Savu commented on ZOOKEEPER-1046:
----------------------------------------

I've seen a similar behavior while writing a strictly increasing sequence of numbers as sequential nodes. You can find the code at the following link [1]. The repository also contains some basic tools for fault injection.

[1] https://github.com/andreisavu/zookeeper-mq
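For reference, the pattern in question is the plain sequential-create loop, along these lines (connection string and paths are illustrative, and the parent znode is assumed to already exist):

{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SequenceWriter {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        for (int i = 0; i < 1000; i++) {
            // Each create() with the SEQUENTIAL flag should come back with
            // a fresh, strictly increasing zero-padded suffix, so a
            // NODEEXISTS error here should never be possible.
            String actual = zk.create("/queue/item-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            System.out.println("created " + actual); // e.g. /queue/item-0000000042
        }
        zk.close();
    }
}
{noformat}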


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Jeremy Stribling (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020139#comment-13020139 ] 

Jeremy Stribling commented on ZOOKEEPER-1046:
---------------------------------------------

For the record, I have another example of this happening, this time without a lot of complicated cluster dynamics.  The scenario is:

# There is a 3-node cluster
# A sequential node is created with seq. num. X
# The leader dies very shortly after that (within a couple of transactions)
# A new leader is elected (who has seen all of the transactions committed by the previous leader)
# The new leader immediately hits the problem, returning ZNODEEXISTS when a sequential node is created

I have logs from before and after the problem, as well as Zookeeper snapshots that contain the problem, but unfortunately the transaction log containing the actual problematic transaction was rotated away.  If anyone wants to see what I do have, let me know and I'll upload them somewhere.

I will also try to write a simple test that creates lots of sequential nodes, kills the leader, and sees if the problem occurs.
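
Such a test might look roughly like the following sketch; the leader kill itself would have to come from an outside harness, and the connection string, path, and counts are assumptions:

{noformat}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SequentialCreateStressTest {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 30000, null);
        for (int i = 0; i < 100000; i++) {
            try {
                zk.create("/zkrsm/record-", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
            } catch (KeeperException.NodeExistsException e) {
                // This is the bug: a sequential create must never collide.
                throw new AssertionError("sequential create hit NODEEXISTS: " + e);
            } catch (KeeperException.ConnectionLossException e) {
                // Expected while the leader is being killed; just keep going.
            }
        }
        zk.close();
    }
}
{noformat}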



[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049607#comment-13049607 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

it turns out that the fix is small and easy to make backwards compatible, except for one killer problem: closeSession. a closeSession may do a massive number of deletes, so potentially we would need to go through, figure out everything to delete, and send a cversion for each.

i've been looking and i think we can simplify things. cversion is only used to pick suffixes for sequential nodes, so we really only need to increment it on creates. if we did this, the fix would become trivial, and it would also make the numbering of creates sequential.

of course, that is only how we use cversion; others may rely on it for something else, so it would be an api semantic break.
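
To make the suffix mechanics concrete, here is a minimal, self-contained sketch of the 3.3.x-style behavior under discussion; the class and method names are illustrative stand-ins, not the actual server code:
{noformat}
// Illustrative sketch only, not ZooKeeper's real request processor.
class SequentialSuffixSketch {
    // A sequential create appends the parent's cversion, zero-padded
    // to 10 digits, to the requested path.
    static String toSequentialPath(String path, int parentCVersion) {
        return path + String.format("%010d", parentCVersion);
    }

    public static void main(String[] args) {
        // With the stat from the bug report (cversion = 120710) but a
        // child ending in 0000120804 already on disk, the computed name
        // can collide, and the create fails with NODEEXISTS.
        System.out.println(toSequentialPath("/zkrsm/x_record", 120710));
        // -> /zkrsm/x_record0000120710
    }
}
{noformat}
Incrementing cversion only on creates would keep this suffix monotonic without having to ship a cversion for every delete in a closeSession.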


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054361#comment-13054361 ] 

Hudson commented on ZOOKEEPER-1046:
-----------------------------------

Integrated in ZooKeeper-trunk #1223 (See [https://builds.apache.org/job/ZooKeeper-trunk/1223/])
    ZOOKEEPER-1046: Creating a new sequential node results in a ZNODEEXISTS error

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1138958
Files : 
* /zookeeper/trunk/src/zookeeper.jute

camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1138957
Files : 
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/SerializationPerfTest.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/DeserializationPerfTest.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/util/SerializeUtils.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/DataTreeTest.java
* /zookeeper/trunk/src/contrib/loggraph/src/java/org/apache/zookeeper/graph/TxnLogSource.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ReadOnlyModeTest.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataNode.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/LogFormatter.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Follower.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/DataTreeUnitTest.java
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/upgrade/UpgradeSnapShotV1.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Observer.java



[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049288#comment-13049288 ] 

Benjamin Reed commented on ZOOKEEPER-1046:
------------------------------------------

no, i think we need to make it backwards compatible with 3.3.x, so we need to be able to process txns that don't carry the parent cversion. so, no upgrade needed.
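
A rough sketch of what that replay fallback could look like; the stand-in types and the -1 sentinel are assumptions for illustration, not the committed patch:
{noformat}
// Self-contained sketch with stand-in types, not the real ZooKeeper classes.
class CversionReplaySketch {
    static class Stat { int cversion; }

    // txnParentCVersion == -1 models an old 3.3.x txn that lacks the field.
    static void applyCreate(Stat parent, int txnParentCVersion) {
        if (txnParentCVersion == -1) {
            parent.cversion++;                   // legacy: increment on replay
        } else {
            parent.cversion = txnParentCVersion; // new: idempotent absolute set
        }
    }

    public static void main(String[] args) {
        Stat parent = new Stat();
        applyCreate(parent, -1);  // old-format txn still applies cleanly
        applyCreate(parent, 42);  // new-format txn carries the exact value
        applyCreate(parent, 42);  // replaying it again changes nothing
        System.out.println(parent.cversion);  // prints 42
    }
}
{noformat}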


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139007#comment-13139007 ] 

Ted Dunning commented on ZOOKEEPER-1046:
----------------------------------------

Camille, 

I think you are right.  The code for multi needs to do whatever is necessary to guarantee idempotency.
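
To spell out the idempotency requirement with a toy example (purely illustrative, not the multi code): a delta-style update corrupts state when the same txn is applied twice, while an absolute-value update does not.
{noformat}
// Toy illustration of why replayed txns should carry absolute values.
class IdempotencySketch {
    public static void main(String[] args) {
        // Delta-style replay: applying the same "increment" twice
        // leaves the counter at 7 instead of 6.
        int delta = 5;
        delta = delta + 1;
        delta = delta + 1;

        // Absolute-style replay: applying "set to 6" twice is harmless.
        int absolute = 5;
        absolute = 6;
        absolute = 6;

        System.out.println(delta + " vs " + absolute);  // 7 vs 6
    }
}
{noformat}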

                

[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055028#comment-13055028 ] 

Mahadev konar commented on ZOOKEEPER-1046:
------------------------------------------

Camille/Ben, did you forget to resolve the jira? Is this already committed to the 3.4 release?


[jira] [Commented] (ZOOKEEPER-1046) Creating a new sequential node results in a ZNODEEXISTS error

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030009#comment-13030009 ] 

Hadoop QA commented on ZOOKEEPER-1046:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12478444/ZOOKEEPER-1046.patch
  against trunk revision 1099329.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/251//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/251//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/251//console

This message is automatically generated.
