You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@qpid.apache.org by Pavel Moravec <pm...@redhat.com> on 2014/07/07 18:59:53 UTC

Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/
-----------------------------------------------------------

Review request for qpid, Gordon Sim and Kim van der Riet.

Summary (updated)
-----------------

[C++ broker] Make memory usage consistent after broker restart

Bugs: QPID-5880
https://issues.apache.org/jira/browse/QPID-5880

Repository: qpid

Description (updated)
-------

Simple idea:
- in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
- during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits

Known limitation:
- message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery

The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.

Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)

Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)

Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.

So it is reasonable for standalone brokers only.

Diffs (updated)
-----

/trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083
/trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083
/trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083
/trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083
/trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083
/trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083
/trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083

Diff: https://reviews.apache.org/r/23305/diff/

Testing (updated)
-------

No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).

Automated tests passed.

Thanks,

Pavel Moravec

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Pavel Moravec <pm...@redhat.com>.


> On July 7, 2014, 5:17 p.m., Gordon Sim wrote:
> > Having the broker set the persistence id in some cases, where the current expectation is that the store sets it, makes me nervous. How does this affect the windows store(s) for example? Does any store make any assumptions about the id that would no longer hold?

Good point.

Both linearstore and legacystore make no assumption. They just set (and require) unique ID for each record within one journal, but don't care about it otherwise. In legacystore, there is an int comparison within loadContent method (used only for journal recovery) that can make the function less efficient if unordered persistence ID is found.

Anyway it would be great if Kim could confirm I am right.

Both Window stores rewrite persistence ID by their value, so for ms-clfs and ms-sql stores the patch won't have any impact at all.


> On July 7, 2014, 5:17 p.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h, line 81
> > <https://reviews.apache.org/r/23305/diff/2/?file=624628#file624628line81>
> >
> >     Should be camel case rather than underscore, like other option members.

Fixed.


> On July 7, 2014, 5:17 p.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp, line 61
> > <https://reviews.apache.org/r/23305/diff/2/?file=624629#file624629line61>
> >
> >     Odd option name, qpid.store_msg_id would be neater. I don't feel the option name really properly describes the function though. It's really more something like 'share recovered messages'

I did not like the option name either, it was just first draft. Thanks for the suggestion, SHARE_RECOVERED_MSGS("qpid.share_recovered_msgs"); is used now.


> On July 7, 2014, 5:17 p.m., Gordon Sim wrote:
> > /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp, line 987
> > <https://reviews.apache.org/r/23305/diff/2/?file=624631#file624631line987>
> >
> >     Would be nicer to contain this all in the broker code, e.g. perhaps in qpid/broker/RecoveryManagerImpl.cpp?

That would make sense from logical point of view, as the broker should be responsible for the message re-coupling in its memory. But..

It would have to be in RecoveryManagerImpl class, to have the message_ids map broker-wide.

RecoveryManagerImpl::recoverMessage is aware of one message only, without context (like queue it belongs to or what persistency ID the message should have). I would have to extend the method argument to have persistencyID and RecoverableQueue::shared_ptr to move that functionality there. The Queue pointer is necessary due to message_ids map to keep only pointers to messages from qpid.share_recovered_msgs queues. Otherwise the map would cache redundant pointers as well.

Is it worth adding the two arguments for a method to be called for every message recovery? If the feature will be applied only to some queues?

Or is there some another option I overlooked?


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47397
-----------------------------------------------------------


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Gordon Sim <gs...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47397
-----------------------------------------------------------


Having the broker set the persistence id in some cases, where the current expectation is that the store sets it, makes me nervous. How does this affect the windows store(s) for example? Does any store make any assumptions about the id that would no longer hold?


/trunk/qpid/cpp/src/qpid/broker/QueueSettings.h
<https://reviews.apache.org/r/23305/#comment83126>

    Should be camel case rather than underscore, like other option members.



/trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp
<https://reviews.apache.org/r/23305/#comment83127>

    Odd option name, qpid.store_msg_id would be neater. I don't feel the option name really properly describes the function though. It's really more something like 'share recovered messages'



/trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp
<https://reviews.apache.org/r/23305/#comment83128>

    Would be nicer to contain this all in the broker code, e.g. perhaps in qpid/broker/RecoveryManagerImpl.cpp?


- Gordon Sim


On July 7, 2014, 4:59 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 7, 2014, 4:59 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Kim van der Riet <ki...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review48360
-----------------------------------------------------------

This is a fundamentally difficult problem to solve, especially since the legacy/linearstore was conceptualized as a per-queue store rather than as a centralized message store. There are trade-offs between these two approaches, and the ability to track the same message over multiple queues is one of these. A centralized store design would make this easy, but the overhead of tracking what each queue is doing with each message becomes difficlut. A per-queue store takes away the message-tracking issues, but leaves makes cross-queue correlation difficult. There is no consensus on which of these design approaches is best, and when what is now the legacystore was designed, per-queue throughput was considered an important criterion, and so the per-queue design was used.

The linear and legacy stores require locally unique ids which span recoveries. By locally, I mean per-queue unique. This means that the same persistence id can be used in different queues for different messages. By spanning recoveries, I mean that once a persistence id is "used" by consuming a message, it must also be purged from the file system (in the case of the legacystore, files with both enqueue and dequeue records must be overwritten and in the case of linearstore, returned to the EFP) so that it will not be encountered during the recovery process as it reads all the files to determine which messages are current. For legacystore, it is also helpful if the persistence ids are monotonically increasing, but this does not affect the accuracy of the recovery, only the speed of the recovery. There is nothing wrong with the broker setting the persistence ids to globally unique values so long as these requirements are met. As previous comments have already stated, we should make sure 
 that no other implementations of the store will be broken by this.

In your proposed solution, you introduce a std::map to contain the recovered persistence ids. My concern with this is twofold, both around recovering large numbers of messages:

1. Performance when recovering large numbers of messages: For each unique message, two methods of logarithmic complexity must be executed: std::map::find() followed by std::map::operator[]. For small numbers of messages, this may not be noticeable, but as the number of messages increase, these two methods will become rapidly slower. Remember also that the messages in the map will not be just current messages, but all messages read from the journal files as the store plays back the enqueueing and dequeueing of all visible messages.

2. Memory footprint: Once the recovery is complete, the map contains all the persistence ids of all messages seen during recovery. Although the size of the persistence ids stored is likely small next to the message content of the messages, the sheer number of them may reduce or even negate the memory savings advantage of this approach. Of course, this could be made a temporary issue by simply clearing the map once the recovery is complete as these no longer serve any purpose beyond the recovery itself.

3. Thread safety: Currently, IIRC, only one thread is used to recover the various queue stores serially. If that should change, and each queue is recovered independently on separate threads, then your map will require a lock.

If use-cases where the message size is large and the number of queues/messages is relatively small, then this approach could still provide a benefit. Perhaps some measurements of performance and memory footprint are needed to ascertain where these trade-offs make sense.

- Kim van der Riet

On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Pavel Moravec <pm...@redhat.com>.


> On July 8, 2014, 1:55 p.m., Alan Conway wrote:
> > Using a pointer as an identifier isn't safe. Pointer values can be re-used as soon as the in-memory copy of the message is deleted, so they are not unique over time. A global atomic counter might be a possibility but we would need to benchmark for performance consequences before making this the default behavior.

The pointer to SharedState should become invalid once all copies of the message are gone. I.e. once the messages disappear from store as well. Even then the pointer can be re-used for some another object, optionally another SharedState ptr. (until we use paged queues but there is a check the option can't be used together with paged queues).

Using global atomic counter: how to set the same counter value to two copies of the same message (with the same SharedState) being enqueued to different queues? Or, re-phrasing the question, how to identify two messages enqueued to two different queues are just two instances of the same message (with the same SharedState)? That is the crucial question here. The pointer to SharedState is easy solution, though I agree that in general pointers should not be used as some (unique) reference IDs.

Honestly, I see my solution somehow "fragile" but can't transform that feeling into any particular technical objection against the patch (and I thought a lot what all could it break).


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47442
-----------------------------------------------------------


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Pavel Moravec <pm...@redhat.com>.


> On July 8, 2014, 1:55 p.m., Alan Conway wrote:
> > Using a pointer as an identifier isn't safe. Pointer values can be re-used as soon as the in-memory copy of the message is deleted, so they are not unique over time. A global atomic counter might be a possibility but we would need to benchmark for performance consequences before making this the default behavior.
> 
> Pavel Moravec wrote:
>     The pointer to SharedState should become invalid once all copies of the message are gone. I.e. once the messages disappear from store as well. Even then the pointer can be re-used for some another object, optionally another SharedState ptr. (until we use paged queues but there is a check the option can't be used together with paged queues).
>     
>     Using global atomic counter: how to set the same counter value to two copies of the same message (with the same SharedState) being enqueued to different queues? Or, re-phrasing the question, how to identify two messages enqueued to two different queues are just two instances of the same message (with the same SharedState)? That is the crucial question here. The pointer to SharedState is easy solution, though I agree that in general pointers should not be used as some (unique) reference IDs.
>     
>     Honestly, I see my solution somehow "fragile" but can't transform that feeling into any particular technical objection against the patch (and I thought a lot what all could it break).
> 
> Alan Conway wrote:
>     You are storing shared-state pointers in the store as persistence-ids. If you shut down the broker with messages in the store and re-start it then you may get new messages with shared-state pointers that have the same value as the persistence-ids of messages already in the store.
>     
>     Also I share Gordons concern about setting persistence-id in the broker. That will work OK with our stores but there are other store implementations (windows) that might set peristence-id to some kind of database identifier - setting them in the broker would break such a store.
>     
>     A global counter could be set on the SharedState when the SharedState is created. You would need to add a new field to the persistence record to put it in the store which might be a compatibility issue (unless there's an unused 64bit slot lying around). Using an AtomicValue would minimise the performance impact but there could still be one, so I would make this something that is enabled only if there actually is a store. Again there's the question of how this would work with the windows stores or other possible store implementations, I'm not sure if there's a safe & compatible way to add such a field.

Good point with SharedState after restart - I thought journal recovery will resolve the problem but I was wrong.

The problem with global counter on SharedStore is that it affects all queues and all messages, so it would have negative impact to memory usage in most cases. Something I tried to avoid (also by using the per-queue option disabled by default).

I don't see an easy way how to move forward at the moment to fullfil all (already minimized) requirements - until e.g. adding message annotation does not create a new copy of the message. So I am putting on hold this (at least for a while).


- Pavel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47442
-----------------------------------------------------------


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Alan Conway <ac...@redhat.com>.


> On July 8, 2014, 1:55 p.m., Alan Conway wrote:
> > Using a pointer as an identifier isn't safe. Pointer values can be re-used as soon as the in-memory copy of the message is deleted, so they are not unique over time. A global atomic counter might be a possibility but we would need to benchmark for performance consequences before making this the default behavior.
> 
> Pavel Moravec wrote:
>     The pointer to SharedState should become invalid once all copies of the message are gone. I.e. once the messages disappear from store as well. Even then the pointer can be re-used for some another object, optionally another SharedState ptr. (until we use paged queues but there is a check the option can't be used together with paged queues).
>     
>     Using global atomic counter: how to set the same counter value to two copies of the same message (with the same SharedState) being enqueued to different queues? Or, re-phrasing the question, how to identify two messages enqueued to two different queues are just two instances of the same message (with the same SharedState)? That is the crucial question here. The pointer to SharedState is easy solution, though I agree that in general pointers should not be used as some (unique) reference IDs.
>     
>     Honestly, I see my solution somehow "fragile" but can't transform that feeling into any particular technical objection against the patch (and I thought a lot what all could it break).

You are storing shared-state pointers in the store as persistence-ids. If you shut down the broker with messages in the store and re-start it then you may get new messages with shared-state pointers that have the same value as the persistence-ids of messages already in the store.

Also I share Gordons concern about setting persistence-id in the broker. That will work OK with our stores but there are other store implementations (windows) that might set peristence-id to some kind of database identifier - setting them in the broker would break such a store.

A global counter could be set on the SharedState when the SharedState is created. You would need to add a new field to the persistence record to put it in the store which might be a compatibility issue (unless there's an unused 64bit slot lying around). Using an AtomicValue would minimise the performance impact but there could still be one, so I would make this something that is enabled only if there actually is a store. Again there's the question of how this would work with the windows stores or other possible store implementations, I'm not sure if there's a safe & compatible way to add such a field.


- Alan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47442
-----------------------------------------------------------


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Alan Conway <ac...@redhat.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review47442
-----------------------------------------------------------


Using a pointer as an identifier isn't safe. Pointer values can be re-used as soon as the in-memory copy of the message is deleted, so they are not unique over time. A global atomic counter might be a possibility but we would need to benchmark for performance consequences before making this the default behavior.

- Alan Conway


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique number that is identical to all message instances that has common SharedState - e.g. to the pointer to SharedState
> - during journal recovery, if we recover a message with already seen persistencyID, use the previous seen instead with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.
> 
> Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)
> 
> Can't manual setup of persistencyID break something in store? (AFAIK no as uniqueness of the ID is assured: 1) a new / different message with the same persistencyID can appear only after the previous instance is gone from memory, and 2) only queues with the option enabled are checked for message coupling)
> 
> Will it help in cluster? No, it won't. As when primary broker gets 1 message to an exchange that distributes it to 100 queues, th broker updates backup brokers via 100 individual "enqueue 1 message to queue q[1-100]" events. So backup brokers consume more memory than primary - the same amount like primary does not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Pavel Moravec <pm...@redhat.com>.

(Updated July 8, 2014, 12:53 p.m.)

Review request for qpid, Gordon Sim and Kim van der Riet.

Changes
-------

Added check the option can't be used with paging. As messages offloaded to the disk and re-loaded back might have different pointer to SharedState, making the pointer (as value of persistencyID) not unique.

Bugs: QPID-5880
https://issues.apache.org/jira/browse/QPID-5880

Repository: qpid

Description
-------

Known limitation:
- message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery

The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.

Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)

So it is reasonable for standalone brokers only.

Diffs (updated)
-----

Diff: https://reviews.apache.org/r/23305/diff/

Testing
-------

No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).

Automated tests passed.

Thanks,

Pavel Moravec

Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart

Posted by Pavel Moravec <pm...@redhat.com>.

(Updated July 8, 2014, 11:56 a.m.)

Review request for qpid, Gordon Sim and Kim van der Riet.

Changes
-------

updated diff per Gordon's comments (all except the latest).

Bugs: QPID-5880
https://issues.apache.org/jira/browse/QPID-5880

Repository: qpid

Description
-------

Known limitation:
- message annotations added to some queue (e.g. due to queue sequencing enabled) will be either over-written or shared to all other queues during recovery

The patch contains a new QueueSettings option to enable (by default disabled) this feature on per queue basis. This somehow limits the limitation above.

Isn't storing pointer to SharedState to the disk (via persistencyID) some sort of security breach? (I dont think so, but worth to ask)

So it is reasonable for standalone brokers only.

Diffs (updated)
-----

/trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083
/trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083
/trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083
/trunk/qpid/cpp/src/qpid/broker/RecoveryManagerImpl.h 1608083
/trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083
/trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083
/trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083
/trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083

Diff: https://reviews.apache.org/r/23305/diff/

Testing
-------

No significant difference in memory consumption before & after restart (in setup of 500 queues with qpid.store_msgID=true and thousands of messages sent via fanout exchange to all of them).

Automated tests passed.

Thanks,

Pavel Moravec