Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2015/11/26 10:59:10 UTC

[jira] [Commented] (OAK-2683) the "hitting the observation queue limit" problem

    [ https://issues.apache.org/jira/browse/OAK-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028455#comment-15028455 ] 

Stefan Egli commented on OAK-2683:
----------------------------------

IMO this problem cannot be solved without some sort of flow-control:
* ideally we would throttle the rate of commits - but I could imagine that this is not acceptable
* when scaling up a cluster, the rate of external changes/commits is likely to increase and having end-to-end flow-control within a cluster sounds very awkward

So perhaps we should not focus on flow-controlling the commits, but on flow-controlling the observation queue itself. That is: always keep the observation queue within a size that performs well (i.e. where there are no cache misses that slow everything down). If this limit is hit, there are two options:
* reduce the rate at which changes are enqueued. This would work fine for external changes, since those are available in a journal and could easily be delayed. It would not be as easy for internal changes, since they additionally carry the commit info - if you throw away internal head/commitInfo pairs and re-read them later from disk, you'd lose that commit info. But what about persisting the commit info too?
* additionally, we could analyse the listeners: perhaps only one or a few listeners are slow - and treating them separately would allow the fast ones to stay fully up to date. The slow ones could be put into a "read-from-disk" mode in which they have to read the journal and do the diff outside of the cache (to avoid polluting the cache with such old diffs).

This would have to be analyzed a bit, but I think doing flow-control on the observation queues directly, in combination with kicking out slow listeners to have them read-from-journal could be a scalable solution.
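
To make the idea a bit more concrete, here is a minimal sketch of a per-listener queue that is kept within a limit and falls back to replaying from the journal once the limit is hit. This is not Oak code: the class FlowControlledQueue, its Mode enum and its methods are hypothetical names, and the sketch assumes that revisions are consecutive long values and that every revision (including its commit info) has already been persisted to the journal before enqueue is called.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch (not Oak API): a per-listener queue that never grows past a limit. */
final class FlowControlledQueue {
    enum Mode { IN_MEMORY, READ_FROM_JOURNAL }

    private final int limit;
    private final Deque<Long> pending = new ArrayDeque<>(); // revisions held in memory
    private Mode mode = Mode.IN_MEMORY;
    private long lastDelivered; // last revision handed to the listener
    private long journalHead;   // newest revision assumed to be persisted in the journal

    FlowControlledQueue(int limit, long startRevision) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
        this.lastDelivered = startRevision;
        this.journalHead = startRevision;
    }

    /** Called for every new revision, in order. */
    void enqueue(long revision) {
        journalHead = revision;
        if (mode == Mode.IN_MEMORY && pending.size() >= limit) {
            // The listener is too slow: drop the in-memory entries and let it
            // catch up from the journal instead of collapsing changes.
            pending.clear();
            mode = Mode.READ_FROM_JOURNAL;
        }
        if (mode == Mode.IN_MEMORY) {
            pending.addLast(revision);
        }
    }

    /** Returns the next revision to deliver, or -1 if the listener is caught up. */
    long next() {
        if (mode == Mode.IN_MEMORY) {
            Long revision = pending.pollFirst();
            if (revision == null) {
                return -1;
            }
            lastDelivered = revision;
            return revision;
        }
        // READ_FROM_JOURNAL: replay one revision at a time, outside of the cache.
        lastDelivered++;
        if (lastDelivered == journalHead) {
            mode = Mode.IN_MEMORY; // caught up, the listener becomes a fast one again
        }
        return lastDelivered;
    }

    Mode mode() {
        return mode;
    }

    int size() {
        return pending.size();
    }
}
```

With a limit of 2, enqueuing revisions 1, 2 and 3 would switch the listener to READ_FROM_JOURNAL; it then receives 1, 2 and 3 one by one via next() and returns to IN_MEMORY once it has caught up. No change is collapsed, so the commit info would stay available as long as it was persisted.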

> the "hitting the observation queue limit" problem
> -------------------------------------------------
>
>                 Key: OAK-2683
>                 URL: https://issues.apache.org/jira/browse/OAK-2683
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core, mongomk, segmentmk
>            Reporter: Stefan Egli
>              Labels: observation, resilience
>
> There are several tickets in this area:
> * OAK-2587: threading with observation being too eager causing observation queue to grow
> * OAK-2669: avoiding diffing from mongo by using persistent cache instead.
> * OAK-2349: which might be a duplicate of, or at least similar to, OAK-2669
> * OAK-2562: diffcache is inefficient
> Yet I think it makes sense to create this summarizing ticket, describing again what happens when the observation queue hits the limit - and eventually how this can be improved.
> Consider the following scenario (also compare with OAK-2587 - but that one focused more on eagerness of threading):
> * the rate of incoming commits is large and starts to generate many changes in the observation queues, hence those queues become somewhat filled/loaded
> * depending on the underlying nodestore used, the calculation of diffs is more or less expensive - but at least for mongomk it is important that the diff can be served from the cache
> ** in case of mongomk it can happen that diffs are no longer found in the cache and thus require a round-trip to mongo - which is of course orders of magnitude slower than the cache. This would cause the queue to grow even faster, as dequeuing now becomes slower.
> ** not sure about tarmk - I believe it should always be fast there
> * so based on the above, there can be a situation where the queue grows and hits the configured limit
> * if this limit is reached, the current mechanism is to collapse any subsequent change into one big change that is marked as an external event - let's call this a collapsed-change.
> * this collapsed-change now becomes part of the normal queue and eventually 'walks down the queue' and is processed normally - hence there is a high chance that yet another collapsed-change is created should the queue hit the limit again. This game can go on for a while, resulting in a queue that contains many, or mostly, such collapsed-changes.
> * there is now an additional assumption, namely that diffing such collapsed-changes is more expensive than normal diffing - plus it is almost guaranteed that the diff cannot, for example, be shared between observation listeners, since the exact 'collapse borders' depend on the timing of each listener's queue - i.e. the collapse diffs are unique and thus not cacheable.
> * so as a result: once you have those collapse-diffs you can hardly get rid of them - they are heavy to process - hence dequeuing is very slow
> * at the same time, there are likely always some commits happening in a typical system, e.g. with sling on top you have sling discovery, which does heartbeats every now and then. So there are always new commits that add to the load.
> * this hence creates a situation where quite a small additional commit rate can keep all the queues filled - due to the fact that the queue is full of 'heavy collapse diffs' that have to be calculated individually for each and every listener (of which you could have e.g. 150-200).
> So again, possible solutions for this:
> * OAK-2669: tune diffing via persistent cache
> * OAK-2587: have more threads to remain longer 'in the cache zone'
> * tune your input speed explicitly to avoid filling the observation queues (this would be specific to your use-case of course, but can be seen as explicitly throttling on the input side)
> * increase the relevant caches to the max
> * but I think we will need to come up with a broader improvement of this observation queue limit problem by either
> ** doing flow control - eg via the commit rate limiter (also see OAK-1659)
> ** moving out handling of observation changes to a messaging subsystem - be it to handle local events only (since handling external events makes the system problematic wrt scalability if not done right) - also see [corresponding suggestion on dev list|http://markmail.org/message/b5trr6csyn4zzuj7]
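
For reference, the collapsing behaviour described in the quoted issue can be illustrated with the following sketch. It is a simplified, hypothetical model and not the actual Oak implementation: the class CollapsingQueue and its Change entries are made-up names, revisions are plain long values, and merging into the tail entry is an assumption about how the collapse could be done.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch (not Oak code) of a queue that collapses changes once it is full. */
final class CollapsingQueue {
    /** A queued change covering the revisions (from, to]. */
    static final class Change {
        final long from;
        long to;
        boolean external; // true once the commit info is lost or never existed

        Change(long from, long to, boolean external) {
            this.from = from;
            this.to = to;
            this.external = external;
        }
    }

    private final int limit;
    private final Deque<Change> queue = new ArrayDeque<>();

    CollapsingQueue(int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be >= 1");
        }
        this.limit = limit;
    }

    void add(long from, long to, boolean external) {
        if (queue.size() < limit) {
            queue.addLast(new Change(from, to, external));
            return;
        }
        // Queue is full: widen the last entry instead of adding a new one.
        // The merged entry spans several commits, so its commit info is gone
        // and it has to be treated like an external change.
        Change last = queue.peekLast();
        last.to = to;
        last.external = true;
    }

    Change last() {
        return queue.peekLast();
    }

    int size() {
        return queue.size();
    }
}
```

Because the borders of the widened entry depend on when this particular queue happened to be full, two listeners with separate queues would typically end up with different (from, to] ranges, which is why such diffs can hardly be shared via a cache.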



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)