You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@qpid.apache.org by Pavel Moravec <pm...@redhat.com> on 2014/03/23 12:52:31 UTC

Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/
-----------------------------------------------------------

Review request for qpid, Gordon Sim and Kim van der Riet.


Bugs: QPID-5642
    https://issues.apache.org/jira/browse/QPID-5642


Repository: qpid


Description
-------

Elegant (but not performance optimal) way of patch:

1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.

Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.

However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.

I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.

Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?

Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?


Diffs
-----

  /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1577475 
  /trunk/qpid/cpp/src/qpid/broker/MessageStore.h 1577475 
  /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.h 1577475 
  /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.cpp 1577475 
  /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.h 1577475 
  /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp 1577475 
  /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1577475 
  /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1577475 
  /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1577475 
  /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1577475 
  /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.h 1577475 
  /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.cpp 1577475 

Diff: https://reviews.apache.org/r/19566/diff/


Testing
-------

- reproducer from JIRA verified
- automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)


Thanks,

Pavel Moravec

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.


> On March 26, 2014, 1:59 p.m., Chug Rolke wrote:
> > The broker currently has a concept of a boot sequence number, which is incremented each time that qpidd restarts. This number is not the boot sequence number of the host computer.
> > 
> >  $ ./qpidd --log-enable debug+
> >      ... debug ManagementAgent boot sequence: 44
> > 
> >  $ ./qpidd --log-enable debug+
> >      ... debug ManagementAgent boot sequence: 45
> > 
> > The exchange Message sequence number gets reset to zero at each qpidd restart.
> > 
> > Trying to track each message sequence number through nonvolatile storage is a huge load. Also, it is likely that there are reliability problems tracking the sequence numbers through a broker crash or restart, just when the sequence numbers are critical.
> > 
> > What about extending the concept of a "sequence number" to include "<broker boot sequence>:<exchange Message sequence number>"?
> > 
> > These numbers exist today so there's no change to generate them. The new scheme would be a way to pack both numbers into a message instead of just the Message sequence number.

I like that idea.


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38601
-----------------------------------------------------------


On March 23, 2014, 11:52 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 23, 2014, 11:52 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.h 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.cpp 1577475 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Chug Rolke <cr...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38601
-----------------------------------------------------------


The broker currently has a concept of a boot sequence number, which is incremented each time that qpidd restarts. This number is not the boot sequence number of the host computer.

 $ ./qpidd --log-enable debug+
     ... debug ManagementAgent boot sequence: 44

 $ ./qpidd --log-enable debug+
     ... debug ManagementAgent boot sequence: 45

The exchange Message sequence number gets reset to zero at each qpidd restart.

Trying to track each message sequence number through nonvolatile storage is a huge load. Also, it is likely that there are reliability problems tracking the sequence numbers through a broker crash or restart, just when the sequence numbers are critical.

What about extending the concept of a "sequence number" to include "<broker boot sequence>:<exchange Message sequence number>"?

These numbers exist today so there's no change to generate them. The new scheme would be a way to pack both numbers into a message instead of just the Message sequence number.

- Chug Rolke


On March 23, 2014, 11:52 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 23, 2014, 11:52 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.h 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.cpp 1577475 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38533
-----------------------------------------------------------


Ultimately the question as to the acceptability of the performance impact is something only users can answer. However the penalty would be high as the update is synchronous and all bdb transactions will be serialised.

For a per-message operation such as this, a more performant solution would be to use the mechanism used for recording enqueues. Perhaps for example a special queue could be used onto which messages representing the changes to the sequence number could be enqueued (and the message representing the old value then dequeued, like with an LVQ). On recovery the last sequence would be recovered from the special queue and used to update the exchange.

This is somewhat half-baked, but might be an interesting angle to explore some more as a simple(ish) solution.

- Gordon Sim


On March 23, 2014, 11:52 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 23, 2014, 11:52 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.h 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.cpp 1577475 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Alan Conway <ac...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >
> 
> Pavel Moravec wrote:
>     This seems to be sound to me. To sum it up:
>     - exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
>       - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
>     - when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
>       - increment qpid.next_sequenceNo by 64k
>       - update store bdb for the exchange
>       - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
>     - when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
>     - when promoting a node to active:
>       - set sequenceNo = args(qpid.next_sequenceNo)
>       - execute the same 3 steps from "when incrementing sequenceNo"
>     
>     Two questions:
>     1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
>     2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?
>     
>     
>     Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.
>     
>     
>     Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".
>
> 
> Alan Conway wrote:
>     Notifying backups of changes is done via QMF events, so a broker can always emit a "nextSequenceNo changed" event and if there are backups listening they will pick it up. 
>     
>     So to your questions:
>     
>     1) A node being promoted can emit QMF events but there is not a guaranteed ordering that the Exchange response will be seen before any QMF events for that exchange. This is a real race condition, I have made fixes for other special cases of this problem. Either we can make another special case, or I have an idea for a more general fix for the problem. Let me look at this, for your work assume that it will be OK to send a nextSequenceNo event during update. I'll update this review when I have a patch for the race condition.
>     
>     2) The reason that I propose updating the nextSequenceNo when we get within 32k of the current is to ensure we have time to write the disk & do the updates before the actual sequence number is used. E.g. say nextSeqNo=128, actualSeqNo=96k. At this point we will update nextSeqNo to 140k. I am assuming that we will not receive 32k new messages before we can write this to disk, so even if we crash before the write, with nextSeqNo=128 still on disk, it will still be higher than any actual sequence number we have sent. I am also assuming that the nextSeqNo update QMF event will be processed by backups before we receive 32k new messages, to ensure that if we fail the new primary will have a sufficient nextSeqNo. We could increase the safety margin at the cost of decreasing our jumps-to-overflow.
>     
>     The alternative to a "safety margin" would be to effectively lock the Exchange while we are updating the nextSeqNo. on disk and to the cluster. That might be ok for performance since we don't do it often. However it would be hard to implement without deadlocks, especially the cluster update part which would also require a new feedback mechanism in the cluster. I prefer the safety margin approach as being simpler provided the margin is generous enough.
>     
>     
>
> 
> Pavel Moravec wrote:
>     Re 2) I meant scenario when a broker is being promoted. Then it sets actualSeqNo=nextSeqNo; nextSeqNo+=64k; (and writes it to the disk and notifies backups). Until the broker writes the update to the disk, there is in fact no safety margin available. But to really hit duplicate sequence number being set to a message, _all_ brokers in the cluster would have to be killed before writing the updated nextSequenceNo to the disk - low probable event we can live with, in my opinion.
> 
> Alan Conway wrote:
>     We can solve this by making sure a newly promoted primary does not go READY until all nextSeqNo have been written to disk and all exchanges have been replicated with the updated nextSeqNo. That's not the current behavior, we only wait for queues to be ready, but I'll fix that.
> 
> Pavel Moravec wrote:
>     Hi Alan,
>     is the above fix implemented?

No, fell way down my priority list. I have some notes if it is time to revisit it. Requires some changes to the replication sequence for newcomers but they would be healthy changes in general, not just a hack for this case.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >

This seems to be sound to me. To sum it up:
- exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
  - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
- when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
  - increment qpid.next_sequenceNo by 64k
  - update store bdb for the exchange
  - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
- when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
- when promoting a node to active:
  - set sequenceNo = args(qpid.next_sequenceNo)
  - execute the same 3 steps from "when incrementing sequenceNo"

Two questions:
1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?


Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.


Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >
> 
> Pavel Moravec wrote:
>     This seems to be sound to me. To sum it up:
>     - exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
>       - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
>     - when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
>       - increment qpid.next_sequenceNo by 64k
>       - update store bdb for the exchange
>       - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
>     - when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
>     - when promoting a node to active:
>       - set sequenceNo = args(qpid.next_sequenceNo)
>       - execute the same 3 steps from "when incrementing sequenceNo"
>     
>     Two questions:
>     1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
>     2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?
>     
>     
>     Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.
>     
>     
>     Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".
>
> 
> Alan Conway wrote:
>     Notifying backups of changes is done via QMF events, so a broker can always emit a "nextSequenceNo changed" event and if there are backups listening they will pick it up. 
>     
>     So to your questions:
>     
>     1) A node being promoted can emit QMF events but there is not a guaranteed ordering that the Exchange response will be seen before any QMF events for that exchange. This is a real race condition, I have made fixes for other special cases of this problem. Either we can make another special case, or I have an idea for a more general fix for the problem. Let me look at this, for your work assume that it will be OK to send a nextSequenceNo event during update. I'll update this review when I have a patch for the race condition.
>     
>     2) The reason that I propose updating the nextSequenceNo when we get within 32k of the current is to ensure we have time to write the disk & do the updates before the actual sequence number is used. E.g. say nextSeqNo=128, actualSeqNo=96k. At this point we will update nextSeqNo to 140k. I am assuming that we will not receive 32k new messages before we can write this to disk, so even if we crash before the write, with nextSeqNo=128 still on disk, it will still be higher than any actual sequence number we have sent. I am also assuming that the nextSeqNo update QMF event will be processed by backups before we receive 32k new messages, to ensure that if we fail the new primary will have a sufficient nextSeqNo. We could increase the safety margin at the cost of decreasing our jumps-to-overflow.
>     
>     The alternative to a "safety margin" would be to effectively lock the Exchange while we are updating the nextSeqNo. on disk and to the cluster. That might be ok for performance since we don't do it often. However it would be hard to implement without deadlocks, especially the cluster update part which would also require a new feedback mechanism in the cluster. I prefer the safety margin approach as being simpler provided the margin is generous enough.
>     
>     
>
> 
> Pavel Moravec wrote:
>     Re 2) I meant scenario when a broker is being promoted. Then it sets actualSeqNo=nextSeqNo; nextSeqNo+=64k; (and writes it to the disk and notifies backups). Until the broker writes the update to the disk, there is in fact no safety margin available. But to really hit duplicate sequence number being set to a message, _all_ brokers in the cluster would have to be killed before writing the updated nextSequenceNo to the disk - low probable event we can live with, in my opinion.
> 
> Alan Conway wrote:
>     We can solve this by making sure a newly promoted primary does not go READY until all nextSeqNo have been written to disk and all exchanges have been replicated with the updated nextSeqNo. That's not the current behavior, we only wait for queues to be ready, but I'll fix that.

Hi Alan,
is the above fix implemented?


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Alan Conway <ac...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >
> 
> Pavel Moravec wrote:
>     This seems to be sound to me. To sum it up:
>     - exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
>       - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
>     - when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
>       - increment qpid.next_sequenceNo by 64k
>       - update store bdb for the exchange
>       - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
>     - when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
>     - when promoting a node to active:
>       - set sequenceNo = args(qpid.next_sequenceNo)
>       - execute the same 3 steps from "when incrementing sequenceNo"
>     
>     Two questions:
>     1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
>     2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?
>     
>     
>     Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.
>     
>     
>     Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".
>

Notifying backups of changes is done via QMF events, so a broker can always emit a "nextSequenceNo changed" event and if there are backups listening they will pick it up. 

So to your questions:

1) A node being promoted can emit QMF events but there is not a guaranteed ordering that the Exchange response will be seen before any QMF events for that exchange. This is a real race condition, I have made fixes for other special cases of this problem. Either we can make another special case, or I have an idea for a more general fix for the problem. Let me look at this, for your work assume that it will be OK to send a nextSequenceNo event during update. I'll update this review when I have a patch for the race condition.

2) The reason that I propose updating the nextSequenceNo when we get within 32k of the current is to ensure we have time to write the disk & do the updates before the actual sequence number is used. E.g. say nextSeqNo=128, actualSeqNo=96k. At this point we will update nextSeqNo to 140k. I am assuming that we will not receive 32k new messages before we can write this to disk, so even if we crash before the write, with nextSeqNo=128 still on disk, it will still be higher than any actual sequence number we have sent. I am also assuming that the nextSeqNo update QMF event will be processed by backups before we receive 32k new messages, to ensure that if we fail the new primary will have a sufficient nextSeqNo. We could increase the safety margin at the cost of decreasing our jumps-to-overflow.

The alternative to a "safety margin" would be to effectively lock the Exchange while we are updating the nextSeqNo. on disk and to the cluster. That might be ok for performance since we don't do it often. However it would be hard to implement without deadlocks, especially the cluster update part which would also require a new feedback mechanism in the cluster. I prefer the safety margin approach as being simpler provided the margin is generous enough.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >
> 
> Pavel Moravec wrote:
>     This seems to be sound to me. To sum it up:
>     - exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
>       - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
>     - when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
>       - increment qpid.next_sequenceNo by 64k
>       - update store bdb for the exchange
>       - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
>     - when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
>     - when promoting a node to active:
>       - set sequenceNo = args(qpid.next_sequenceNo)
>       - execute the same 3 steps from "when incrementing sequenceNo"
>     
>     Two questions:
>     1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
>     2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?
>     
>     
>     Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.
>     
>     
>     Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".
>
> 
> Alan Conway wrote:
>     Notifying backups of changes is done via QMF events, so a broker can always emit a "nextSequenceNo changed" event and if there are backups listening they will pick it up. 
>     
>     So to your questions:
>     
>     1) A node being promoted can emit QMF events but there is not a guaranteed ordering that the Exchange response will be seen before any QMF events for that exchange. This is a real race condition, I have made fixes for other special cases of this problem. Either we can make another special case, or I have an idea for a more general fix for the problem. Let me look at this, for your work assume that it will be OK to send a nextSequenceNo event during update. I'll update this review when I have a patch for the race condition.
>     
>     2) The reason that I propose updating the nextSequenceNo when we get within 32k of the current is to ensure we have time to write the disk & do the updates before the actual sequence number is used. E.g. say nextSeqNo=128, actualSeqNo=96k. At this point we will update nextSeqNo to 140k. I am assuming that we will not receive 32k new messages before we can write this to disk, so even if we crash before the write, with nextSeqNo=128 still on disk, it will still be higher than any actual sequence number we have sent. I am also assuming that the nextSeqNo update QMF event will be processed by backups before we receive 32k new messages, to ensure that if we fail the new primary will have a sufficient nextSeqNo. We could increase the safety margin at the cost of decreasing our jumps-to-overflow.
>     
>     The alternative to a "safety margin" would be to effectively lock the Exchange while we are updating the nextSeqNo. on disk and to the cluster. That might be ok for performance since we don't do it often. However it would be hard to implement without deadlocks, especially the cluster update part which would also require a new feedback mechanism in the cluster. I prefer the safety margin approach as being simpler provided the margin is generous enough.
>     
>     
>

Re 2) I meant scenario when a broker is being promoted. Then it sets actualSeqNo=nextSeqNo; nextSeqNo+=64k; (and writes it to the disk and notifies backups). Until the broker writes the update to the disk, there is in fact no safety margin available. But to really hit duplicate sequence number being set to a message, _all_ brokers in the cluster would have to be killed before writing the updated nextSequenceNo to the disk - low probable event we can live with, in my opinion.


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Alan Conway <ac...@redhat.com>.


> On April 1, 2014, 9:17 p.m., Alan Conway wrote:
> > Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?
> > 
> > I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.
> > 
> > That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.
> > 
> > Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...
> > 
> > 
> >
> 
> Pavel Moravec wrote:
>     This seems to be sound to me. To sum it up:
>     - exchanges with sequencing to have new argument qpid.next_sequenceNo, by default set to 64k for new exchanges, or to the value read from store during startup (or being updated from primary broker)
>       - if the argument is missing in stored args, it will be automatically added there (to support upgrades)
>     - when incrementing sequenceNo in preRoute, if args(qpid.next_sequenceNo) - sequenceNo == 32k, then execute 3 steps:
>       - increment qpid.next_sequenceNo by 64k
>       - update store bdb for the exchange
>       - if clustered broker and (primary or being promoted), notify backups about the increment of qpid.next_sequenceNo
>     - when joining a cluster, active broker will update the joiner one automatically (via exchange arguments) - no need to change
>     - when promoting a node to active:
>       - set sequenceNo = args(qpid.next_sequenceNo)
>       - execute the same 3 steps from "when incrementing sequenceNo"
>     
>     Two questions:
>     1) Can a node being promoted already update backups? Or is there a way of postponing the update? (see latest point above)
>     2) What if - after a new promotion - the broker sends few messages with sequenceNo say 128k but dies before it really writes to the disk args(qpid.next_sequenceNo)=(128+64)k ? Can it even happen?
>     
>     
>     Sequence number will then jump ahead by at most 96k whenever a standalone broker is restarted or new promotion in cluster happens (or cold start of cluster happens). That means, there can be 93824992236885 such "jump events" without sequence number overflow (or 93824671436104 "jump events" for steadily message flow of 1M messages per second for 10 years). More than fair enough.
>     
>     
>     Comment to "Are we sure nobody assumes that exchange sequence numbers are sequential?" - Consumers get messages from queues and no queue can be guaranteed to get _all_ messages routed by the exchange (until all know the exchange is fanout or binding key is empty or so - but I think nobody can build on this premise). So jumps in exchange sequence numbers can be always meant as "the skipped messages were not meant for this queue".
>
> 
> Alan Conway wrote:
>     Notifying backups of changes is done via QMF events, so a broker can always emit a "nextSequenceNo changed" event and if there are backups listening they will pick it up. 
>     
>     So to your questions:
>     
>     1) A node being promoted can emit QMF events but there is not a guaranteed ordering that the Exchange response will be seen before any QMF events for that exchange. This is a real race condition, I have made fixes for other special cases of this problem. Either we can make another special case, or I have an idea for a more general fix for the problem. Let me look at this, for your work assume that it will be OK to send a nextSequenceNo event during update. I'll update this review when I have a patch for the race condition.
>     
>     2) The reason that I propose updating the nextSequenceNo when we get within 32k of the current is to ensure we have time to write the disk & do the updates before the actual sequence number is used. E.g. say nextSeqNo=128, actualSeqNo=96k. At this point we will update nextSeqNo to 140k. I am assuming that we will not receive 32k new messages before we can write this to disk, so even if we crash before the write, with nextSeqNo=128 still on disk, it will still be higher than any actual sequence number we have sent. I am also assuming that the nextSeqNo update QMF event will be processed by backups before we receive 32k new messages, to ensure that if we fail the new primary will have a sufficient nextSeqNo. We could increase the safety margin at the cost of decreasing our jumps-to-overflow.
>     
>     The alternative to a "safety margin" would be to effectively lock the Exchange while we are updating the nextSeqNo. on disk and to the cluster. That might be ok for performance since we don't do it often. However it would be hard to implement without deadlocks, especially the cluster update part which would also require a new feedback mechanism in the cluster. I prefer the safety margin approach as being simpler provided the margin is generous enough.
>     
>     
>
> 
> Pavel Moravec wrote:
>     Re 2) I meant scenario when a broker is being promoted. Then it sets actualSeqNo=nextSeqNo; nextSeqNo+=64k; (and writes it to the disk and notifies backups). Until the broker writes the update to the disk, there is in fact no safety margin available. But to really hit duplicate sequence number being set to a message, _all_ brokers in the cluster would have to be killed before writing the updated nextSequenceNo to the disk - low probable event we can live with, in my opinion.

We can solve this by making sure a newly promoted primary does not go READY until all nextSeqNo have been written to disk and all exchanges have been replicated with the updated nextSeqNo. That's not the current behavior, we only wait for queues to be ready, but I'll fix that.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Alan Conway <ac...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39200
-----------------------------------------------------------

Sorry, catching up. I think this patch is abusing the bootsequence which really has nothing at all to do with exchange sequence numbers. It would work on a standalone (but still be an abuse IMO) broker but it doesn't work for HA. The base for exchage sequence numbers has to be synchronized across a HA cluster, whereas the number of times each broker has rebooted *cannot* be synchronized without breaking its existing meaning as a reboot count. I'm also a bit wary of the division of sequence number into 17 bits for "restart" count 47 bits for "message" count. What are the grounds for that division? Is it safe to assume no overflows?

I see the goal of avoiding persisting the sequence number on every message, but how about persisting it (per exchange) on some suitably large number of messages? E.g. persist an initial next-sequence of 64k for each exchange, with the sequence number starting at 1. Once the sequence number gets within 32k of next-sequence we increase next-sequence of by 64k etc. On reboot or on promotion of the primary we jump each exchange sequence number ahead to the exchange's next-sequence number.

That lets us take smaller jumps on each restart/failover (16 bits vs. 47 bits) and avoids using a single counter for every exchange which makes overflow less likely. The next-sequence number for each exchange would need to be persisted, replicated on joining (both can be done by putting it back in the args) and replicated via a management event when it is increased on the primary. Jumping the next-sequence number ahead when we still have 32k messages left to go means that we have time to store & replicate before we pass the new limit. The next-sequence doesn't have to be precisely synchronized in a cluster provided it is always a good margin bigger than the highest sequence number used by that exchange on the primary so far. During promotion (before allowing backups to join) we would need to jump each exchange sequence ahead to the next-sequence for the exchange and increase next-sequence by 64k to make sure the new primary is using higher sequence numbers than the old.

Are we sure nobody assumes that exchange sequence numbers are sequential? The above is fine for duplicate detection but are we sure nobody does missed message detection by checking for gaps in the exchange sequence numbers? That would complicate everything quite a bit...

/trunk/qpid/cpp/src/qpid/broker/Exchange.cpp
<https://reviews.apache.org/r/19566/#comment71559>

    Why 47? Magic numbers are bad, suggest making this a constant with a comment to justify the value.

    Why is bootsequence involved here? What does the number of times the broker has rebooted have to do with the sequence number on an individual exchange?

- Alan Conway

On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/
-----------------------------------------------------------

(Updated April 1, 2014, 10:45 a.m.)


Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.


Bugs: QPID-5642
    https://issues.apache.org/jira/browse/QPID-5642


Repository: qpid


Description
-------

Elegant (but not performance optimal) way of patch:

1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.

Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.

However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.

I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.

Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?

Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?


Diffs
-----

  /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 

Diff: https://reviews.apache.org/r/19566/diff/


Testing
-------

- reproducer from JIRA verified
- automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)


Thanks,

Pavel Moravec

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Chug Rolke <cr...@redhat.com>.


> On March 31, 2014, 2:13 p.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp, line 210
> > <https://reviews.apache.org/r/19566/diff/4/?file=539793#file539793line210>
> >
> >     The reason the value was originally set in args was so that it would be replicated in old cluster.
> >     
> >     A question would be how the new ha handles sequencing. In the ideal world the boot sequence of a new master would always be one higher than any previous master. However that is really orthogonal to this issue.
> >     
> >     Maybe add a comment around the likelhood of wraparound be considered remote enough that it is not handled?
> >     
> >     Looks good to me though.
> 
> Pavel Moravec wrote:
>     Good point with new HA cluster. I see two possible ways how to get boot sequence lower within the cluster:
>     
>     1) A user deletes <data-dir>/.mbrokerdata file, thus resetting boot sequence to 1. I spotted this user case sometimes when troubleshooting / attempting to resolve broker startup problems - let start broker with "clean table" if it helps.
>     
>     2) Assume 2node cluster with brokers A (primary) and B (backup). Restarting B five times sets B boot sequence to 6, then restarting A once sets its boot sequence to 2. Now B is the primary broker. Then restarting B causes primary is A again, but with lower boot sequence.
>     
>     I tried to come up with a mechanism of updating boot sequence during a new joiner recovery, but there are two contradicting requirements:
>     a) every new joiner has to have bigger boot sequence than the current primary node - to ensure failover to this new joiner from primary will not lower "boot_sequence<<47 + exchange.seqeuenceNo" number. So any two new joiners B and C should have boot sequence bigger than primary A.
>     b) But also B should have boot sequence bigger than C (if it will become primary after A) while C should have it bigger than B for the same reason.
>     
>     The only way is maintaining boot sequence cluster-wide on the same value, and during promoting _any_ broker, the number is incremented by one - cluster wide. (and of course, during recovery, it is updated to new joiner).
>     
>     Alan, is the above paragraph (easily) feasible? (if so I can implement it)

If the boot sequence is incremented during any broker promotion then having only 16 bits for the boot sequence is probably too small.


- Chug


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39038
-----------------------------------------------------------


On April 1, 2014, 10:45 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated April 1, 2014, 10:45 a.m.)
> 
> 
> Review request for qpid, Alan Conway, Gordon Sim, and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.

> On March 31, 2014, 2:13 p.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp, line 210
> > <https://reviews.apache.org/r/19566/diff/4/?file=539793#file539793line210>
> >
> >     The reason the value was originally set in args was so that it would be replicated in old cluster.
> >     
> >     A question would be how the new ha handles sequencing. In the ideal world the boot sequence of a new master would always be one higher than any previous master. However that is really orthogonal to this issue.
> >     
> >     Maybe add a comment around the likelhood of wraparound be considered remote enough that it is not handled?
> >     
> >     Looks good to me though.

Good point with new HA cluster. I see two possible ways how to get boot sequence lower within the cluster:

1) A user deletes <data-dir>/.mbrokerdata file, thus resetting boot sequence to 1. I spotted this user case sometimes when troubleshooting / attempting to resolve broker startup problems - let start broker with "clean table" if it helps.

2) Assume 2node cluster with brokers A (primary) and B (backup). Restarting B five times sets B boot sequence to 6, then restarting A once sets its boot sequence to 2. Now B is the primary broker. Then restarting B causes primary is A again, but with lower boot sequence.

I tried to come up with a mechanism of updating boot sequence during a new joiner recovery, but there are two contradicting requirements:
a) every new joiner has to have bigger boot sequence than the current primary node - to ensure failover to this new joiner from primary will not lower "boot_sequence<<47 + exchange.seqeuenceNo" number. So any two new joiners B and C should have boot sequence bigger than primary A.
b) But also B should have boot sequence bigger than C (if it will become primary after A) while C should have it bigger than B for the same reason.

The only way is maintaining boot sequence cluster-wide on the same value, and during promoting _any_ broker, the number is incremented by one - cluster wide. (and of course, during recovery, it is updated to new joiner).

Alan, is the above paragraph (easily) feasible? (if so I can implement it)

- Pavel

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39038
-----------------------------------------------------------

On March 28, 2014, 1:11 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 1:11 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review39038
-----------------------------------------------------------



/trunk/qpid/cpp/src/qpid/broker/Exchange.cpp
<https://reviews.apache.org/r/19566/#comment71429>

    The reason the value was originally set in args was so that it would be replicated in old cluster.
    
    A question would be how the new ha handles sequencing. In the ideal world the boot sequence of a new master would always be one higher than any previous master. However that is really orthogonal to this issue.
    
    Maybe add a comment around the likelhood of wraparound be considered remote enough that it is not handled?
    
    Looks good to me though.


- Gordon Sim


On March 28, 2014, 1:11 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 28, 2014, 1:11 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/
-----------------------------------------------------------

(Updated March 28, 2014, 1:11 p.m.)


Review request for qpid, Gordon Sim and Kim van der Riet.


Changes
-------

Simple patch based on

qpid.msg_sequence = ((int64_t) getBootSequence()) << 47 + exchange.msg_sequence;

turned out that qpid.sequence_counter exchange argument (auxiliary, managed fully by the Exchange) is not further relevant, so I removed it from source code.

When testing the patch, I realized it resolves the original problem in message duplicity halfly only - on the consumer side. It does not affect producer sending redelivered messages after "broker acquires msg, and dies" scenario. Such messages have redelivered flag set and (much) higher sequence number (higher by 1 << 47, approx.). But that is expected.

Asking for patch review if qpid.sequence_counter can't be used elsewhere (or if some use case can't rely on it).


Bugs: QPID-5642
    https://issues.apache.org/jira/browse/QPID-5642


Repository: qpid


Description
-------

Elegant (but not performance optimal) way of patch:

1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.

Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.

However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.

I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.

Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?

Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?


Diffs (updated)
-----

  /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582719 

Diff: https://reviews.apache.org/r/19566/diff/


Testing
-------

- reproducer from JIRA verified
- automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)


Thanks,

Pavel Moravec

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Jakub Scholz <ww...@scholzj.com>.


> On March 27, 2014, 2:28 p.m., Chug Rolke wrote:
> > The boot sequence number is 16 bits (uint16_t) and the message sequence number is 63 bits (int64_t, where the top bit is unused as counts are never negative (hopefully)).
> > 
> > Another alternative is to merge the two fields by overlaying the boot sequence number into the upper bits of the message sequence. Off the cuff:
> > 
> >    qpid.msg_sequence = ((int64_t) getBootSequence()) << 47 + exchange.msg_sequence;
> > 
> > Each broker restart would then have 47 bits (1.41e+14) of counting to do before an exchange got into trouble with "wrapping" into the next boot sequence start point.
> > Each broker would have 16 bits (65,535) of counting restarts before the BootSequence wrapped back to zero.
> > 
> > Then the generated number fits into the current int64_t field and client code doesn't change at all.
> > 
> > I think that these numbers are generous enough then I would go this way. Otherwise I'd vote for scheme 1.
> > 
> > Maybe a customer how needs this feature could help choose which scheme would be best.
> 
> Gordon Sim wrote:
>     As an example (I hope my math is correct), 1.41e+14 would allow a rate of 100,000 messages through the exchange per second sustained for 5 years before wrapping around (or half a million for a year).
> 
> Gordon Sim wrote:
>     (clarification: half a million messages a second sustained for a year, that is)

Sounds good. 47 bits should be enough :-). And the approach with having only one sequence number property seems to me to be a bit cleaner compared to the approach with two properties.


- Jakub


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38730
-----------------------------------------------------------


On March 27, 2014, 8:58 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 8:58 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.


> On March 27, 2014, 2:28 p.m., Chug Rolke wrote:
> > The boot sequence number is 16 bits (uint16_t) and the message sequence number is 63 bits (int64_t, where the top bit is unused as counts are never negative (hopefully)).
> > 
> > Another alternative is to merge the two fields by overlaying the boot sequence number into the upper bits of the message sequence. Off the cuff:
> > 
> >    qpid.msg_sequence = ((int64_t) getBootSequence()) << 47 + exchange.msg_sequence;
> > 
> > Each broker restart would then have 47 bits (1.41e+14) of counting to do before an exchange got into trouble with "wrapping" into the next boot sequence start point.
> > Each broker would have 16 bits (65,535) of counting restarts before the BootSequence wrapped back to zero.
> > 
> > Then the generated number fits into the current int64_t field and client code doesn't change at all.
> > 
> > I think that these numbers are generous enough then I would go this way. Otherwise I'd vote for scheme 1.
> > 
> > Maybe a customer how needs this feature could help choose which scheme would be best.
> 
> Gordon Sim wrote:
>     As an example (I hope my math is correct), 1.41e+14 would allow a rate of 100,000 messages through the exchange per second sustained for 5 years before wrapping around (or half a million for a year).

(clarification: half a million messages a second sustained for a year, that is)


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38730
-----------------------------------------------------------


On March 27, 2014, 8:58 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 8:58 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.


> On March 27, 2014, 2:28 p.m., Chug Rolke wrote:
> > The boot sequence number is 16 bits (uint16_t) and the message sequence number is 63 bits (int64_t, where the top bit is unused as counts are never negative (hopefully)).
> > 
> > Another alternative is to merge the two fields by overlaying the boot sequence number into the upper bits of the message sequence. Off the cuff:
> > 
> >    qpid.msg_sequence = ((int64_t) getBootSequence()) << 47 + exchange.msg_sequence;
> > 
> > Each broker restart would then have 47 bits (1.41e+14) of counting to do before an exchange got into trouble with "wrapping" into the next boot sequence start point.
> > Each broker would have 16 bits (65,535) of counting restarts before the BootSequence wrapped back to zero.
> > 
> > Then the generated number fits into the current int64_t field and client code doesn't change at all.
> > 
> > I think that these numbers are generous enough then I would go this way. Otherwise I'd vote for scheme 1.
> > 
> > Maybe a customer how needs this feature could help choose which scheme would be best.

As an example (I hope my math is correct), 1.41e+14 would allow a rate of 100,000 messages through the exchange per second sustained for 5 years before wrapping around (or half a million for a year).


- Gordon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38730
-----------------------------------------------------------


On March 27, 2014, 8:58 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 8:58 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Chug Rolke <cr...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38730
-----------------------------------------------------------


The boot sequence number is 16 bits (uint16_t) and the message sequence number is 63 bits (int64_t, where the top bit is unused as counts are never negative (hopefully)).

Another alternative is to merge the two fields by overlaying the boot sequence number into the upper bits of the message sequence. Off the cuff:

   qpid.msg_sequence = ((int64_t) getBootSequence()) << 47 + exchange.msg_sequence;

Each broker restart would then have 47 bits (1.41e+14) of counting to do before an exchange got into trouble with "wrapping" into the next boot sequence start point.
Each broker would have 16 bits (65,535) of counting restarts before the BootSequence wrapped back to zero.

Then the generated number fits into the current int64_t field and client code doesn't change at all.

I think that these numbers are generous enough then I would go this way. Otherwise I'd vote for scheme 1.

Maybe a customer how needs this feature could help choose which scheme would be best.

- Chug Rolke


On March 27, 2014, 8:58 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 8:58 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38720
-----------------------------------------------------------


The second option isn't backwards compatible, since the property can no longer be converted to an int and so existing applications would get an exception. I would probably lean to the first option as well. Existing behaviour is unaffected, but for applications that want to handle broker restarts, they could add an additional check/comparison on the bootsequence. I do quite like the idea of combining the two numbers in some way if it could make the comparison/checking simpler, but it becomes awkward if the sequence can wraparound.

- Gordon Sim


On March 27, 2014, 8:58 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 27, 2014, 8:58 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Pavel Moravec <pm...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/
-----------------------------------------------------------

(Updated March 27, 2014, 8:58 a.m.)


Review request for qpid, Gordon Sim and Kim van der Riet.


Changes
-------

Trivial fix that adds - in parallel to qpid.msg_sequence - also qpid.boot_sequence message annotation. End user then should lexicographically compare the pair of values (qpid.boot_sequence, qpid.msg_sequence).

Then message headers look like:

Properties: {qpid.boot_sequence:16, qpid.msg_sequence:2, sn:1, ts:1395909295605937918, x-amqp-0-10.routing-key:}
Properties: {qpid.boot_sequence:16, qpid.msg_sequence:3, sn:2, ts:1395909295606126626, x-amqp-0-10.routing-key:}
Properties: {qpid.boot_sequence:16, qpid.msg_sequence:4, sn:3, ts:1395909295606179836, x-amqp-0-10.routing-key:}
(broker restart)
Properties: {qpid.boot_sequence:17, qpid.msg_sequence:1, sn:4, ts:1395909314496162977, x-amqp-0-10.routing-key:}
Properties: {qpid.boot_sequence:17, qpid.msg_sequence:2, sn:5, ts:1395909314496364823, x-amqp-0-10.routing-key:}
Properties: {qpid.boot_sequence:17, qpid.msg_sequence:3, sn:6, ts:1395909314496437384, x-amqp-0-10.routing-key:}


An optional fix that sets just qpid.msg_sequence as string "boot_sequence:exchange_sequence". End user then has to parse the string but gets the sequence stuff in just one annotation :

Index: cpp/src/qpid/broker/Exchange.cpp
===================================================================
--- cpp/src/qpid/broker/Exchange.cpp	(revision 1582207)
+++ cpp/src/qpid/broker/Exchange.cpp	(working copy)
@@ -63,7 +63,9 @@
 
         if (parent->sequence){
             parent->sequenceNo++;
-            msg.getMessage().addAnnotation(qpidMsgSequence,parent->sequenceNo);
+            char s[129];
+            sprintf(s, "%d:%ld", parent->getBroker()->getManagementAgent()->getBootSequence(), parent->sequenceNo);
+            msg.getMessage().addAnnotation(qpidMsgSequence,s);
         }
         if (parent->ive) {
             parent->lastMsg = msg.getMessage();


Then message headers look like:
Properties: {qpid.msg_sequence:18:1, sn:1, ts:1395910232986094505, x-amqp-0-10.routing-key:}
Properties: {qpid.msg_sequence:18:2, sn:2, ts:1395910232986239616, x-amqp-0-10.routing-key:}
Properties: {qpid.msg_sequence:18:3, sn:3, ts:1395910232986274364, x-amqp-0-10.routing-key:}
(broker restart)
Properties: {qpid.msg_sequence:19:1, sn:4, ts:1395910238014147248, x-amqp-0-10.routing-key:}
Properties: {qpid.msg_sequence:19:2, sn:5, ts:1395910238014284816, x-amqp-0-10.routing-key:}
Properties: {qpid.msg_sequence:19:3, sn:6, ts:1395910238014317980, x-amqp-0-10.routing-key:}


I am in favour of the first option, let me know your preference.


Bugs: QPID-5642
    https://issues.apache.org/jira/browse/QPID-5642


Repository: qpid


Description
-------

Elegant (but not performance optimal) way of patch:

1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.

Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.

However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.

I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.

Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?

Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?


Diffs (updated)
-----

  /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1582207 

Diff: https://reviews.apache.org/r/19566/diff/


Testing
-------

- reproducer from JIRA verified
- automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)


Thanks,

Pavel Moravec

Re: Review Request 19566: QPID-5642: Message sequence numbers (enabled with qpid.msg_sequence) should be persistent across broker restarts

Posted by Gordon Sim <gs...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19566/#review38533
-----------------------------------------------------------


Ultimately the question as to the acceptability of the performance impact is something only users can answer. However the penalty would be high as the update is synchronous and all bdb transactions will be serialised.

For a per-message operation such as this, a more performant solution would be to use the mechanism used for recording enqueues. Perhaps for example a special queue could be used onto which messages representing the changes to the sequence number could be enqueued (and the message representing the old value then dequeued, like with an LVQ). On recovery the last sequence would be recovered from the special queue and used to update the exchange.

This is somewhat half-baked, but might be an interesting angle to explore some more as a simple(ish) solution.

- Gordon Sim


On March 23, 2014, 11:52 a.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19566/
> -----------------------------------------------------------
> 
> (Updated March 23, 2014, 11:52 a.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5642
>     https://issues.apache.org/jira/browse/QPID-5642
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Elegant (but not performance optimal) way of patch:
> 
> 1) In Exchange::PreRoute::PreRoute, update exchange in store (whole entry).
> 2) The update method is dealed very similarly like MessageStoreImpl::create(db_ptr db, IdSequence& seq, const qpid::broker::Persistable& p) method, i.e. calls BufferValue that calls Exchange::encode.
> 
> Here the code can be unified by merging MessageStoreImpl::update intoMessageStoreImpl::create method where the code almost duplicates.
> 
> However I do not see the patch as performance efficient, as with every message preroute, new qpid::framing::Buffer is filled in Exchange::encode method, data are copied from it to char* BufferValue::data and even then they are really written to the BDB. While in fact we just update the only one number in the Buffer.
> 
> I tried to come up with less performance invasive approach (for those interested, see https://bugzilla.redhat.com/attachment.cgi?id=877576&action=diff - if you dont have access there, let me write), that keeps qpid::framing::Buffer for every durable exchange with sequencing enabled, but it returned (among others) into the need of changing the way store encodes/decodes Exchange instance (change in Exchange::encode / decode methods). What would make the broker backward incompatible.
> 
> Is the performance penalty (due to Exchange::encode method called for every message preroute) acceptable?
> 
> Is it worth merging MessageStoreImpl::update intoMessageStoreImpl::create method?
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Exchange.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/MessageStoreModule.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.h 1577475 
>   /trunk/qpid/cpp/src/qpid/broker/NullMessageStore.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1577475 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.h 1577475 
>   /trunk/qpid/cpp/src/qpid/store/MessageStorePlugin.cpp 1577475 
> 
> Diff: https://reviews.apache.org/r/19566/diff/
> 
> 
> Testing
> -------
> 
> - reproducer from JIRA verified
> - automated tests passed (except for those known to fail due to QPID-5641 (valgrind & legacystore)
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>