Posted to oak-dev@jackrabbit.apache.org by Chetan Mehrotra <ch...@gmail.com> on 2015/03/23 05:04:51 UTC

Efficiently process observation event for local changes

Hi Team,

Currently, in some cases we are seeing issues where observation events for
local changes are getting dropped because the observation queue fills up,
especially on MongoMK based deployments. This happens mainly due to slowness
in generating the diff on Mongo.

Some observation listeners are only interested in local events. For example,
the listener registered by the Workflow component on an AEM instance listens
for local events only, to trigger workflows. For such listeners we have the
following requirements:

A - Design Constraints
------------------------------

1. Workflow relies ONLY on local changes

2. Workflow misses processing assets if the observation queue becomes full,
   because the observation logic then starts merging the events

3. At least for DocumentMK, for every local change we have already calculated
   the diff, which we currently push to the DiffCache upon commit.

4. The observation logic eventually knows that a given listener (and thus the
   wrapping observer) is only interested in local events

So in essence we already have the diff, but we pass it to the observer via
the diff cache (at least for DocumentMK). Would it make sense to change the
current approach slightly?

B - Proposed Changes
-------------------------------

1. Move the notion of listening to local events to the Observer level - upon
any new change detected we only push the change to a given queue if it is
local and the bound listener is only interested in local changes. Currently we
push all changes, which later get filtered out; doing the filtering at this
first level keeps the queue content limited to local changes only

2. Attach the calculated diff to the commit info that accompanies the given
change. This would eliminate the chance of a cache miss altogether and ensure
that observation is not delayed by slow diff processing. This can be done on a
best effort basis: if the diff is too large we do not attach it and in that
case we diff again (see the sketch below)

3. For listeners which are only interested in local events we can use a
different queue size limit, i.e. allow larger queues for such listeners.

Later we can also look into using a journal (or persistent queue) for local
event processing.
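
Roughly what I have in mind for #2, just as a sketch - the key name, the size
limit and the assumption that commit info can carry such an attribute map are
all made up here; the actual API would need to be worked out:

    import java.util.Map;

    public final class ObservationDiff {

        // Hypothetical attribute key under which the pre-computed JSOP diff
        // would travel with the commit (name invented for this sketch).
        public static final String DIFF_KEY = "oak.observation.diff";

        // Best-effort size cap: beyond this we skip attaching the diff and
        // fall back to diffing again on the observer side.
        private static final int MAX_DIFF_LENGTH = 16 * 1024;

        // Commit side: stash the diff we already computed for the DiffCache.
        public static void attach(Map<String, Object> commitAttributes, String jsopDiff) {
            if (jsopDiff != null && jsopDiff.length() <= MAX_DIFF_LENGTH) {
                commitAttributes.put(DIFF_KEY, jsopDiff);
            }
        }

        // Observer side: reuse the attached diff when present, otherwise
        // the observer diffs again as it does today.
        public static String diffOrNull(Map<String, Object> commitAttributes) {
            return (String) commitAttributes.get(DIFF_KEY);
        }
    }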

Chetan Mehrotra

PS 1 - Local events vs Global events
-------------------------------------------------

It is important to distinguish the importance of local vs global events. With
global events we can afford to miss fine-grained changes, i.e. it is ok to
merge the events. However for local events it is not acceptable to miss such
events, as many a time listeners are only interested in changes happening on
the same cluster node. If such events are lost then no other cluster node
would process that change. In AEM, for instance, this would result in an
asset not getting processed.

Further, for local changes the cluster node already has the diff (for
DocumentMK) and hence generating events for such changes would not be costly,
while for global changes generating the diff would be costly. So it makes
sense to make a clear separation between these two kinds of observers and
ensure that observers listening for local changes do not suffer due to
slowness in generating the diff for global changes.

Re: Efficiently process observation event for local changes

Posted by Chetan Mehrotra <ch...@gmail.com>.
Just to clarify - by mistake I used 'dropped' above. There is no way for a
normally running system to drop events. What I meant was: when the queue
starts getting full, the BackgroundObserver collapses individual changes into
bigger ones. In doing that, events which were considered local are now treated
as external, due to which listeners which are only interested in local events
miss out on those changes.

So again, Oak does not lose/drop events. What it does do is convert a local
event into an external event.
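
To illustrate why this hurts: a typical local-only listener filters on
JackrabbitEvent.isExternal(), so once a collapsed change is delivered as
external it simply skips it. A simplified sketch of such a listener:

    import javax.jcr.observation.Event;
    import javax.jcr.observation.EventIterator;
    import javax.jcr.observation.EventListener;

    import org.apache.jackrabbit.api.observation.JackrabbitEvent;

    public class LocalOnlyListener implements EventListener {

        @Override
        public void onEvent(EventIterator events) {
            while (events.hasNext()) {
                Event event = events.nextEvent();
                // After the queue overflows and changes are collapsed, the
                // event reports itself as external, so it is skipped here
                // and the asset is never processed on this cluster node.
                if (event instanceof JackrabbitEvent
                        && ((JackrabbitEvent) event).isExternal()) {
                    continue;
                }
                process(event);
            }
        }

        private void process(Event event) {
            // e.g. trigger the workflow for the changed asset
        }
    }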

Chetan Mehrotra

On Mon, Mar 23, 2015 at 9:34 AM, Chetan Mehrotra <ch...@gmail.com>
wrote:

> Hi Team,
>
> Currently in some of the cases we are seeing issues where observation
> event for local events are getting dropped due to observation queue getting
> full specially on MongoMK based deployments. This happens mainly due to
> slowness in generating diff on Mongo.
>
> Some of the observation listeners are only interested in local events. For
> e.g. listener registered by Workflow component on an AEM instance is
> listening for local events only to trigger workflows. So for such type of
> listeners we have following requirements
>
> A - Design Constraints
> ------------------------------
>
> 1. Workflow is relying ONLY on local changes
>
> 2. Workflow misses on processing assets if the observation queue becomes
>    full and this happens because observation logic stats merging the events
>
> 3. At least for DocumentMK for every local change done we already have
>    calculated the diff which we currently push to DiffCache upon commit.
>
> 4. Observation logic eventually knows that given listener (and thus the
>    wrapping observer) is only interested in local events
>
> So in essence we already have the diff but we pass it to the observer via
> the diff cache (at least for DocumentMK). Would it make sense to change the
> current approach slightly
>
> B - Proposed Changes
> -------------------------------
>
> 1. Move the notion of listening to local events to Observer level - So
> upon any new change detected we only push the change to a given queue if
> its local and bounded listener is only interested in local. Currently we
> push all changes which later do get filter out but we avoid doing that
> first level itself and keep queue content limited to local changes only
>
> 2. Attach the calculated diff as part of commit info which is attached to
> the given change. This would allow eliminating the chances of the cache
> miss altogether and would ensure observation is not delayed due to slow
> processing of diff. This can be done on best effort basis if the diff is to
> large then we do not attach it and in that case we diff again
>
> 3. For listener which are only interested in local events we can use a
> different queue size limit i.e. allow larger queues for such listener.
>
> Later we can also look into using a journal (or persistent queue) for
> local event processing.
>
> Chetan Mehrotra
>
> PS 1 - Local events vs Global events
> -------------------------------------------------
>
> Its important to distinguish the importance of local vs global events.
> With global events we can miss on listening to find level changes i.e. its
> ok to merge the events. However for local events its not possible to miss
> such events as main a times listeners are only interested in changes
> happening to same cluster nodes. If such events are lost then no other
> cluster node would process such change. In AEM for instance this would
> result in asset not getting processed.
>
> Further we local changes the cluster node already has the diff (for
> DocumentMK) and hence generating events for such changes would not be
> costly. While for global changes generating diff would be costly. So it
> makes sense to make a clear separation between these 2 kind of observers
> and ensure that observers listening for local changes do not suffer due to
> slowness in generating diff for global changes
>
>

Re: Efficiently process observation event for local changes

Posted by Stefan Egli <st...@apache.org>.
Related to this, I've created

https://issues.apache.org/jira/browse/OAK-2683

which is about an issue that happens when the observation queue limit is
reached.

Cheers,
Stefan

On 3/23/15 4:03 PM, "Chetan Mehrotra" <ch...@gmail.com> wrote:

>After discussing this further with Marcel and Michael we came to
>conclusion
>that we can achieve similar performance by make use of persistent cache
>for
>storing the diff. This would require slight change in way we interpret the
>diff JSOP. This should not require any change in current logic related to
>observation event generation. Opened OAK-2669 to track that.
>
>One thing that we might still want to do is to use separate queue size for
>listeners interested in local events only and those which can work with
>external event. On a system like AEM there 180 listeners which listen for
>external changes and ~20 which only listen to local changes. So makes
>sense
>to have bigger queues for such listners
>
>Chetan Mehrotra
>
>On Mon, Mar 23, 2015 at 4:09 PM, Michael Dürig <md...@apache.org> wrote:
>
>>
>>
>> On 23.3.15 11:03 , Stefan Egli wrote:
>>
>>> Going one step further we could also discuss to completely moving the
>>> handling of the 'observation queues' to an actual messaging system.
>>> Whether this would be embedded to an oak instance or whether it would
>>>be
>>> shared between instances in an oak cluster might be a different
>>>question
>>> (the embedded variant would have less implication on the overall oak
>>> model, esp also timing-wise). But the observation model quite exactly
>>> matches the publish-subscribe semantics - it actually matches pub-sub
>>>more
>>> than it fits into the 'cache semantics' to me.
>>>
>>
>> Definitely something to try out, given someone find the time for it. ;-)
>> Mind you that some time ago I implemented persisting events to Apache
>>Kafka
>> [1], which wasn't greeted with great enthusiasm though...
>>
>> OTOH the same concern regarding pushing the bottleneck to IO applies
>>here.
>> Furthermore filtering the persisted events through access control is
>> something we need yet to figure out as AC is a) sessions scoped and b)
>> depends on the tree hierarchy.
>>
>> Michael
>>
>>
>> [1] https://github.com/mduerig/oak-kafka
>>
>>
>>
>>> .. just saying ..
>>>
>>> On 3/23/15 10:47 AM, "Michael Dürig" <md...@apache.org> wrote:
>>>
>>>
>>>>
>>>> On 23.3.15 5:04 , Chetan Mehrotra wrote:
>>>>
>>>>> B - Proposed Changes
>>>>> -------------------------------
>>>>>
>>>>> 1. Move the notion of listening to local events to Observer level -
>>>>>So
>>>>> upon
>>>>> any new change detected we only push the change to a given queue if
>>>>>its
>>>>> local and bounded listener is only interested in local. Currently we
>>>>> push
>>>>> all changes which later do get filter out but we avoid doing that
>>>>>first
>>>>> level itself and keep queue content limited to local changes only
>>>>>
>>>>
>>>> I think there is no change needed in the Observer API itself as you
>>>>can
>>>> already figure out from the passed CommitInfo whether a commit is
>>>> external or not. BTW please take care with the term "local" as there
>>>>is
>>>> also the concept of "session local" commits.
>>>>
>>>>
>>>>> 2. Attach the calculated diff as part of commit info which is
>>>>>attached
>>>>> to
>>>>> the given change. This would allow eliminating the chances of the
>>>>>cache
>>>>> miss altogether and would ensure observation is not delayed due to
>>>>>slow
>>>>> processing of diff. This can be done on best effort basis if the diff
>>>>> is to
>>>>> large then we do not attach it and in that case we diff again
>>>>>
>>>>> 3. For listener which are only interested in local events we can use
>>>>>a
>>>>> different queue size limit i.e. allow larger queues for such
>>>>>listener.
>>>>>
>>>>> Later we can also look into using a journal (or persistent queue) for
>>>>> local
>>>>> event processing.
>>>>>
>>>>
>>>> Definitely something to try out. A few points to consider:
>>>>
>>>> * There doesn't seem to be too much of a difference to me whether this
>>>> is routed via a cache or directly attached to commits. In wither way
>>>>it
>>>> adds additional memory requirements and churn, which need to be
>>>>managed.
>>>>
>>>> * When introducing persisted queuing we need to be careful not to just
>>>> move the bottleneck to IO.
>>>>
>>>> * An eventual implementation should not break the fundamental design.
>>>> Either hide it in the implementation or find a clean way to put this
>>>> into the overall design.
>>>>
>>>> Michael
>>>>
>>>
>>>
>>>



Re: Efficiently process observation event for local changes

Posted by Michael Marth <mm...@adobe.com>.
fwiw: I think separating the queues for listeners interested in local events from the queue for listeners interested in global events is a very promising approach.

Cheers
Michael

> On 23 Mar 2015, at 16:03, Chetan Mehrotra <ch...@gmail.com> wrote:
> 
> After discussing this further with Marcel and Michael we came to conclusion
> that we can achieve similar performance by make use of persistent cache for
> storing the diff. This would require slight change in way we interpret the
> diff JSOP. This should not require any change in current logic related to
> observation event generation. Opened OAK-2669 to track that.
> 
> One thing that we might still want to do is to use separate queue size for
> listeners interested in local events only and those which can work with
> external event. On a system like AEM there 180 listeners which listen for
> external changes and ~20 which only listen to local changes. So makes sense
> to have bigger queues for such listners
> 
> Chetan Mehrotra
> 
> On Mon, Mar 23, 2015 at 4:09 PM, Michael Dürig <md...@apache.org> wrote:
> 
>> 
>> 
>> On 23.3.15 11:03 , Stefan Egli wrote:
>> 
>>> Going one step further we could also discuss to completely moving the
>>> handling of the 'observation queues' to an actual messaging system.
>>> Whether this would be embedded to an oak instance or whether it would be
>>> shared between instances in an oak cluster might be a different question
>>> (the embedded variant would have less implication on the overall oak
>>> model, esp also timing-wise). But the observation model quite exactly
>>> matches the publish-subscribe semantics - it actually matches pub-sub more
>>> than it fits into the 'cache semantics' to me.
>>> 
>> 
>> Definitely something to try out, given someone find the time for it. ;-)
>> Mind you that some time ago I implemented persisting events to Apache Kafka
>> [1], which wasn't greeted with great enthusiasm though...
>> 
>> OTOH the same concern regarding pushing the bottleneck to IO applies here.
>> Furthermore filtering the persisted events through access control is
>> something we need yet to figure out as AC is a) sessions scoped and b)
>> depends on the tree hierarchy.
>> 
>> Michael
>> 
>> 
>> [1] https://github.com/mduerig/oak-kafka
>> 
>> 
>> 
>>> .. just saying ..
>>> 
>>> On 3/23/15 10:47 AM, "Michael Dürig" <md...@apache.org> wrote:
>>> 
>>> 
>>>> 
>>>> On 23.3.15 5:04 , Chetan Mehrotra wrote:
>>>> 
>>>>> B - Proposed Changes
>>>>> -------------------------------
>>>>> 
>>>>> 1. Move the notion of listening to local events to Observer level - So
>>>>> upon
>>>>> any new change detected we only push the change to a given queue if its
>>>>> local and bounded listener is only interested in local. Currently we
>>>>> push
>>>>> all changes which later do get filter out but we avoid doing that first
>>>>> level itself and keep queue content limited to local changes only
>>>>> 
>>>> 
>>>> I think there is no change needed in the Observer API itself as you can
>>>> already figure out from the passed CommitInfo whether a commit is
>>>> external or not. BTW please take care with the term "local" as there is
>>>> also the concept of "session local" commits.
>>>> 
>>>> 
>>>>> 2. Attach the calculated diff as part of commit info which is attached
>>>>> to
>>>>> the given change. This would allow eliminating the chances of the cache
>>>>> miss altogether and would ensure observation is not delayed due to slow
>>>>> processing of diff. This can be done on best effort basis if the diff
>>>>> is to
>>>>> large then we do not attach it and in that case we diff again
>>>>> 
>>>>> 3. For listener which are only interested in local events we can use a
>>>>> different queue size limit i.e. allow larger queues for such listener.
>>>>> 
>>>>> Later we can also look into using a journal (or persistent queue) for
>>>>> local
>>>>> event processing.
>>>>> 
>>>> 
>>>> Definitely something to try out. A few points to consider:
>>>> 
>>>> * There doesn't seem to be too much of a difference to me whether this
>>>> is routed via a cache or directly attached to commits. In wither way it
>>>> adds additional memory requirements and churn, which need to be managed.
>>>> 
>>>> * When introducing persisted queuing we need to be careful not to just
>>>> move the bottleneck to IO.
>>>> 
>>>> * An eventual implementation should not break the fundamental design.
>>>> Either hide it in the implementation or find a clean way to put this
>>>> into the overall design.
>>>> 
>>>> Michael
>>>> 
>>> 
>>> 
>>> 


Re: Efficiently process observation event for local changes

Posted by Chetan Mehrotra <ch...@gmail.com>.
After discussing this further with Marcel and Michael we came to the
conclusion that we can achieve similar performance by making use of the
persistent cache for storing the diff. This would require a slight change in
the way we interpret the diff JSOP. It should not require any change in the
current logic related to observation event generation. Opened OAK-2669 to
track that.

One thing that we might still want to do is to use separate queue sizes for
listeners interested in local events only and those which can work with
external events. On a system like AEM there are ~180 listeners which listen
for external changes and ~20 which only listen to local changes. So it makes
sense to have bigger queues for such listeners.
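
Something along these lines, i.e. wiring the local-only listeners through a
BackgroundObserver with a larger queue than the one used for listeners that
also handle external changes. Only a sketch - the queue sizes are made up and
it assumes BackgroundObserver's (observer, executor, queueLength) constructor:

    import java.util.concurrent.Executor;
    import java.util.concurrent.Executors;

    import org.apache.jackrabbit.oak.spi.commit.BackgroundObserver;
    import org.apache.jackrabbit.oak.spi.commit.Observer;

    public class ObserverQueues {

        public static void wire(Observer externalCapable, Observer localOnly) {
            Executor executor = Executors.newCachedThreadPool();

            // The ~180 listeners that also consume external changes keep a
            // moderately sized queue.
            BackgroundObserver external =
                    new BackgroundObserver(externalCapable, executor, 1000);

            // The ~20 local-only listeners get a larger queue so their events
            // are not collapsed while diff generation is slow.
            BackgroundObserver local =
                    new BackgroundObserver(localOnly, executor, 10000);

            // both observers would then be registered with the NodeStore
        }
    }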

Chetan Mehrotra

On Mon, Mar 23, 2015 at 4:09 PM, Michael Dürig <md...@apache.org> wrote:

>
>
> On 23.3.15 11:03 , Stefan Egli wrote:
>
>> Going one step further we could also discuss to completely moving the
>> handling of the 'observation queues' to an actual messaging system.
>> Whether this would be embedded to an oak instance or whether it would be
>> shared between instances in an oak cluster might be a different question
>> (the embedded variant would have less implication on the overall oak
>> model, esp also timing-wise). But the observation model quite exactly
>> matches the publish-subscribe semantics - it actually matches pub-sub more
>> than it fits into the 'cache semantics' to me.
>>
>
> Definitely something to try out, given someone find the time for it. ;-)
> Mind you that some time ago I implemented persisting events to Apache Kafka
> [1], which wasn't greeted with great enthusiasm though...
>
> OTOH the same concern regarding pushing the bottleneck to IO applies here.
> Furthermore filtering the persisted events through access control is
> something we need yet to figure out as AC is a) sessions scoped and b)
> depends on the tree hierarchy.
>
> Michael
>
>
> [1] https://github.com/mduerig/oak-kafka
>
>
>
>> .. just saying ..
>>
>> On 3/23/15 10:47 AM, "Michael Dürig" <md...@apache.org> wrote:
>>
>>
>>>
>>> On 23.3.15 5:04 , Chetan Mehrotra wrote:
>>>
>>>> B - Proposed Changes
>>>> -------------------------------
>>>>
>>>> 1. Move the notion of listening to local events to Observer level - So
>>>> upon
>>>> any new change detected we only push the change to a given queue if its
>>>> local and bounded listener is only interested in local. Currently we
>>>> push
>>>> all changes which later do get filter out but we avoid doing that first
>>>> level itself and keep queue content limited to local changes only
>>>>
>>>
>>> I think there is no change needed in the Observer API itself as you can
>>> already figure out from the passed CommitInfo whether a commit is
>>> external or not. BTW please take care with the term "local" as there is
>>> also the concept of "session local" commits.
>>>
>>>
>>>> 2. Attach the calculated diff as part of commit info which is attached
>>>> to
>>>> the given change. This would allow eliminating the chances of the cache
>>>> miss altogether and would ensure observation is not delayed due to slow
>>>> processing of diff. This can be done on best effort basis if the diff
>>>> is to
>>>> large then we do not attach it and in that case we diff again
>>>>
>>>> 3. For listener which are only interested in local events we can use a
>>>> different queue size limit i.e. allow larger queues for such listener.
>>>>
>>>> Later we can also look into using a journal (or persistent queue) for
>>>> local
>>>> event processing.
>>>>
>>>
>>> Definitely something to try out. A few points to consider:
>>>
>>> * There doesn't seem to be too much of a difference to me whether this
>>> is routed via a cache or directly attached to commits. In wither way it
>>> adds additional memory requirements and churn, which need to be managed.
>>>
>>> * When introducing persisted queuing we need to be careful not to just
>>> move the bottleneck to IO.
>>>
>>> * An eventual implementation should not break the fundamental design.
>>> Either hide it in the implementation or find a clean way to put this
>>> into the overall design.
>>>
>>> Michael
>>>
>>
>>
>>

Re: Efficiently process observation event for local changes

Posted by Michael Dürig <md...@apache.org>.

On 23.3.15 11:03 , Stefan Egli wrote:
> Going one step further we could also discuss to completely moving the
> handling of the 'observation queues' to an actual messaging system.
> Whether this would be embedded to an oak instance or whether it would be
> shared between instances in an oak cluster might be a different question
> (the embedded variant would have less implication on the overall oak
> model, esp also timing-wise). But the observation model quite exactly
> matches the publish-subscribe semantics - it actually matches pub-sub more
> than it fits into the 'cache semantics' to me.

Definitely something to try out, given someone finds the time for it. ;-)
Mind you that some time ago I implemented persisting events to Apache
Kafka [1], which wasn't greeted with great enthusiasm though...

OTOH the same concern regarding pushing the bottleneck to IO applies
here. Furthermore, filtering the persisted events through access control
is something we still need to figure out, as AC is a) session scoped and
b) depends on the tree hierarchy.

Michael


[1] https://github.com/mduerig/oak-kafka

>
> .. just saying ..
>
> On 3/23/15 10:47 AM, "Michael Dürig" <md...@apache.org> wrote:
>
>>
>>
>> On 23.3.15 5:04 , Chetan Mehrotra wrote:
>>> B - Proposed Changes
>>> -------------------------------
>>>
>>> 1. Move the notion of listening to local events to Observer level - So
>>> upon
>>> any new change detected we only push the change to a given queue if its
>>> local and bounded listener is only interested in local. Currently we
>>> push
>>> all changes which later do get filter out but we avoid doing that first
>>> level itself and keep queue content limited to local changes only
>>
>> I think there is no change needed in the Observer API itself as you can
>> already figure out from the passed CommitInfo whether a commit is
>> external or not. BTW please take care with the term "local" as there is
>> also the concept of "session local" commits.
>>
>>>
>>> 2. Attach the calculated diff as part of commit info which is attached
>>> to
>>> the given change. This would allow eliminating the chances of the cache
>>> miss altogether and would ensure observation is not delayed due to slow
>>> processing of diff. This can be done on best effort basis if the diff
>>> is to
>>> large then we do not attach it and in that case we diff again
>>>
>>> 3. For listener which are only interested in local events we can use a
>>> different queue size limit i.e. allow larger queues for such listener.
>>>
>>> Later we can also look into using a journal (or persistent queue) for
>>> local
>>> event processing.
>>
>> Definitely something to try out. A few points to consider:
>>
>> * There doesn't seem to be too much of a difference to me whether this
>> is routed via a cache or directly attached to commits. In wither way it
>> adds additional memory requirements and churn, which need to be managed.
>>
>> * When introducing persisted queuing we need to be careful not to just
>> move the bottleneck to IO.
>>
>> * An eventual implementation should not break the fundamental design.
>> Either hide it in the implementation or find a clean way to put this
>> into the overall design.
>>
>> Michael
>
>

Re: Efficiently process observation event for local changes

Posted by Stefan Egli <st...@apache.org>.
Going one step further, we could also discuss completely moving the
handling of the 'observation queues' to an actual messaging system.
Whether this would be embedded in an oak instance or shared between
instances in an oak cluster might be a different question (the embedded
variant would have fewer implications on the overall oak model, esp. also
timing-wise). But the observation model quite exactly matches
publish-subscribe semantics - to me it actually matches pub-sub more than
it fits 'cache semantics'.
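
Conceptually something like an Observer that just publishes each change to a
topic, with the listeners as subscribers. Very rough sketch of the embedded,
in-JVM variant; the class and its names are invented:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.jackrabbit.oak.spi.commit.CommitInfo;
    import org.apache.jackrabbit.oak.spi.commit.Observer;
    import org.apache.jackrabbit.oak.spi.state.NodeState;

    public class PubSubObserver implements Observer {

        public interface Subscriber {
            void onChange(NodeState root, CommitInfo info);
        }

        private final List<Subscriber> subscribers = new CopyOnWriteArrayList<Subscriber>();
        private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        public void subscribe(Subscriber subscriber) {
            subscribers.add(subscriber);
        }

        @Override
        public void contentChanged(final NodeState root, final CommitInfo info) {
            // Publish asynchronously so commits are never blocked by slow
            // subscribers; a real messaging system would persist the message.
            dispatcher.execute(new Runnable() {
                @Override
                public void run() {
                    for (Subscriber subscriber : subscribers) {
                        subscriber.onChange(root, info);
                    }
                }
            });
        }
    }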

.. just saying ..

On 3/23/15 10:47 AM, "Michael Dürig" <md...@apache.org> wrote:

>
>
>On 23.3.15 5:04 , Chetan Mehrotra wrote:
>> B - Proposed Changes
>> -------------------------------
>>
>> 1. Move the notion of listening to local events to Observer level - So
>>upon
>> any new change detected we only push the change to a given queue if its
>> local and bounded listener is only interested in local. Currently we
>>push
>> all changes which later do get filter out but we avoid doing that first
>> level itself and keep queue content limited to local changes only
>
>I think there is no change needed in the Observer API itself as you can
>already figure out from the passed CommitInfo whether a commit is
>external or not. BTW please take care with the term "local" as there is
>also the concept of "session local" commits.
>
>>
>> 2. Attach the calculated diff as part of commit info which is attached
>>to
>> the given change. This would allow eliminating the chances of the cache
>> miss altogether and would ensure observation is not delayed due to slow
>> processing of diff. This can be done on best effort basis if the diff
>>is to
>> large then we do not attach it and in that case we diff again
>>
>> 3. For listener which are only interested in local events we can use a
>> different queue size limit i.e. allow larger queues for such listener.
>>
>> Later we can also look into using a journal (or persistent queue) for
>>local
>> event processing.
>
>Definitely something to try out. A few points to consider:
>
>* There doesn't seem to be too much of a difference to me whether this
>is routed via a cache or directly attached to commits. In wither way it
>adds additional memory requirements and churn, which need to be managed.
>
>* When introducing persisted queuing we need to be careful not to just
>move the bottleneck to IO.
>
>* An eventual implementation should not break the fundamental design.
>Either hide it in the implementation or find a clean way to put this
>into the overall design.
>
>Michael



Re: Efficiently process observation event for local changes

Posted by Michael Dürig <md...@apache.org>.

On 23.3.15 5:04 , Chetan Mehrotra wrote:
> B - Proposed Changes
> -------------------------------
>
> 1. Move the notion of listening to local events to Observer level - So upon
> any new change detected we only push the change to a given queue if its
> local and bounded listener is only interested in local. Currently we push
> all changes which later do get filter out but we avoid doing that first
> level itself and keep queue content limited to local changes only

I think there is no change needed in the Observer API itself as you can 
already figure out from the passed CommitInfo whether a commit is 
external or not. BTW please take care with the term "local" as there is 
also the concept of "session local" commits.
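
I.e. something like the following is already possible today (sketch only,
assuming the current convention that an external commit is passed to the
Observer with a null CommitInfo):

    import org.apache.jackrabbit.oak.spi.commit.CommitInfo;
    import org.apache.jackrabbit.oak.spi.commit.Observer;
    import org.apache.jackrabbit.oak.spi.state.NodeState;

    public class LocalChangesOnly implements Observer {

        private final Observer delegate;

        public LocalChangesOnly(Observer delegate) {
            this.delegate = delegate;
        }

        @Override
        public void contentChanged(NodeState root, CommitInfo info) {
            // External commits arrive with a null CommitInfo, local commits
            // carry a non-null one, so only local changes are forwarded.
            if (info != null) {
                delegate.contentChanged(root, info);
            }
        }
    }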

>
> 2. Attach the calculated diff as part of commit info which is attached to
> the given change. This would allow eliminating the chances of the cache
> miss altogether and would ensure observation is not delayed due to slow
> processing of diff. This can be done on best effort basis if the diff is to
> large then we do not attach it and in that case we diff again
>
> 3. For listener which are only interested in local events we can use a
> different queue size limit i.e. allow larger queues for such listener.
>
> Later we can also look into using a journal (or persistent queue) for local
> event processing.

Definitely something to try out. A few points to consider:

* There doesn't seem to be too much of a difference to me whether this
is routed via a cache or directly attached to commits. Either way it
adds additional memory requirements and churn, which need to be managed.

* When introducing persisted queuing we need to be careful not to just 
move the bottleneck to IO.

* An eventual implementation should not break the fundamental design. 
Either hide it in the implementation or find a clean way to put this 
into the overall design.

Michael