Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2017/03/30 08:52:41 UTC

[jira] [Commented] (OAK-4581) Persistent local journal for more reliable event generation

    [ https://issues.apache.org/jira/browse/OAK-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948690#comment-15948690 ] 

Stefan Egli commented on OAK-4581:
----------------------------------

This might have been discussed before, but I'd like to bring it up again: as an alternative to the designs suggested in the history of this ticket, we could turn this around entirely and support _persisting_ by implementing [JCR 2.0 section 12.6's journaled observation/EventJournal|https://docs.adobe.com/content/docs/en/spec/jcr/2.0/12_Observation.html]:
Having underlying journaled observation would allow us to:
* switch from a 'live listener' to a 'journaled listener' once the queue is full (i.e. we could support a hard queue limit, with no compaction of events)
* implement this switch in a way that guarantees neither loss nor duplication of events (via careful use of the {{skipTo}} method, perhaps with the help of a non-standard _revision_-based {{skipTo}} variant)
* switch the listener back to a live listener once it reaches 'the end of the (journaled) events'
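The live/journaled switch described in the list above can be sketched as follows. This is a minimal, single-JVM illustration under stated assumptions, not Oak or JCR API: `Event`, `Journal` and `SwitchingListener` are hypothetical, revisions are plain increasing longs, and commits are appended and reported in revision order. `readAfter` stands in for the non-standard revision-based {{skipTo}} variant; the standard JCR `EventJournal.skipTo(long)` takes a date.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical event carrying a monotonically increasing commit revision.
record Event(long revision, String path) {}

// Hypothetical append-only journal; readAfter() plays the role of a
// non-standard, revision-based skipTo() followed by reading a bounded batch.
final class Journal {
    private final List<Event> events = new ArrayList<>();

    synchronized void append(Event e) {
        events.add(e);
    }

    // Returns up to 'max' events whose revision is strictly greater than 'afterRevision'.
    synchronized List<Event> readAfter(long afterRevision, int max) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (e.revision() > afterRevision && out.size() < max) {
                out.add(e);
            }
        }
        return out;
    }
}

// Hypothetical listener adapter that switches between a bounded live queue
// and reading from the journal, without losing or duplicating events.
final class SwitchingListener {
    private final Journal journal;
    private final int capacity;                 // hard queue limit, no compaction
    private final ArrayDeque<Event> liveQueue = new ArrayDeque<>();
    private boolean journaled = false;          // true while catching up from the journal
    private long lastDelivered = 0;             // revision of the last event handed out

    SwitchingListener(Journal journal, int capacity) {
        this.journal = journal;
        this.capacity = capacity;
    }

    // Called by the committing thread after the event was appended to the journal.
    synchronized void onCommit(Event e) {
        if (e.revision() <= lastDelivered) {
            return; // already delivered from the journal: avoid a duplicate
        }
        if (!journaled && liveQueue.size() < capacity) {
            liveQueue.add(e);
        } else {
            journaled = true; // queue full: stop queueing, the journal has the event
        }
    }

    // Called by the listener thread; returns the next batch in revision order.
    synchronized List<Event> poll() {
        List<Event> batch = new ArrayList<>(liveQueue);
        liveQueue.clear();
        if (!batch.isEmpty()) {
            lastDelivered = batch.get(batch.size() - 1).revision();
        }
        if (journaled) {
            List<Event> missed = journal.readAfter(lastDelivered, capacity);
            batch.addAll(missed);
            if (!missed.isEmpty()) {
                lastDelivered = missed.get(missed.size() - 1).revision();
            }
            if (missed.size() < capacity) {
                journaled = false; // reached the end of the journal: back to live mode
            }
        }
        return batch;
    }
}
```

The key design choice in this sketch is that the journal is always written first and {{lastDelivered}} is the single hand-over point: anything at or below it is never queued again, and anything above it is either in the live queue or still in the journal.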

The advantage is that we don't have to store anything additionally: we don't have to define, or maintain, any file format for these persisted events. This is all delegated to the EventJournal.
Note that the EventJournal itself can also be implemented based on existing data, such as the existing journal of the DocumentNodeStore. (For the SegmentNodeStore it might be different, since its journal is less precise: it is only updated every couple of seconds.)
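To illustrate that no new storage format is needed, here is a minimal sketch of a read-only cursor over existing journal data. It assumes the journal can be exposed as a sorted map from a long revision to the set of changed paths (a simplification: real DocumentNodeStore revisions are not plain longs). `JournalEntry` and `RevisionJournalCursor` are hypothetical names; the cursor only mimics the shape of JCR's `EventJournal` (iteration plus `skipTo(long date)`), with a revision in place of a date, and does not implement `javax.jcr.observation.EventJournal`.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical journal entry: one commit revision and the paths it touched.
record JournalEntry(long revision, Set<String> changedPaths) {}

// Read-only cursor over existing journal data, shaped like an event journal:
// iterate in revision order, and skipTo() to resume from a known position.
final class RevisionJournalCursor implements Iterator<JournalEntry> {
    private final NavigableMap<Long, Set<String>> journal;
    private Iterator<Map.Entry<Long, Set<String>>> it;

    RevisionJournalCursor(NavigableMap<Long, Set<String>> journal) {
        this.journal = journal;
        this.it = journal.entrySet().iterator();
    }

    // Positions the cursor so that the next entry is the first one with
    // revision >= 'revision' (like JCR skipTo(long date), but revision based).
    void skipTo(long revision) {
        it = journal.tailMap(revision, true).entrySet().iterator();
    }

    @Override
    public boolean hasNext() {
        return it.hasNext();
    }

    @Override
    public JournalEntry next() {
        if (!it.hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<Long, Set<String>> e = it.next();
        return new JournalEntry(e.getKey(), e.getValue());
    }
}
```

If the underlying map is appended to while a cursor is open, a `ConcurrentSkipListMap` would be a safer choice than a `TreeMap`, whose iterators can throw `ConcurrentModificationException`.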

The disadvantage is that the EventJournal is (probably, tbd) not based on _events_ but on raw revision (commit) information. Thus events would still have to be generated for the listeners when needed, which means redoing the diff. That could be slower, as the diff might no longer be in the cache.
Also, the EventJournal would have to reach back in time at least as far as the slowest listener.
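The extra cost described above comes from turning raw revision information back into events. A minimal sketch, assuming each root state can be modeled as a flat map from path to a content fingerprint and that the journal entry lists the touched paths; this is a large simplification of Oak's {{NodeState}} comparison, and `EventRegenerator`, `Change` and `ChangeType` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical event kinds produced by the regenerated diff.
enum ChangeType { ADDED, REMOVED, CHANGED }

record Change(ChangeType type, String path) {}

final class EventRegenerator {
    private EventRegenerator() {}

    // Recomputes events for one journal entry by comparing the 'before' and
    // 'after' snapshots, restricted to the paths the entry says were touched.
    static List<Change> diff(Map<String, String> before,
                             Map<String, String> after,
                             Set<String> touchedPaths) {
        List<Change> out = new ArrayList<>();
        for (String path : new TreeSet<>(touchedPaths)) { // deterministic order
            String oldValue = before.get(path);
            String newValue = after.get(path);
            if (oldValue == null && newValue != null) {
                out.add(new Change(ChangeType.ADDED, path));
            } else if (oldValue != null && newValue == null) {
                out.add(new Change(ChangeType.REMOVED, path));
            } else if (oldValue != null && !oldValue.equals(newValue)) {
                out.add(new Change(ChangeType.CHANGED, path));
            }
            // absent in both snapshots: nothing to report
        }
        return out;
    }
}
```

Both snapshots must still be readable when a slow listener catches up, which is where the cache misses mentioned above would show up.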

So, in summary: using the EventJournal for slow listeners could be easier to implement, and thus easier in terms of code maintenance as well as operational maintenance, with the downside of potentially being less performant.

> Persistent local journal for more reliable event generation
> -----------------------------------------------------------
>
>                 Key: OAK-4581
>                 URL: https://issues.apache.org/jira/browse/OAK-4581
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core
>            Reporter: Chetan Mehrotra
>            Assignee: Stefan Egli
>              Labels: observation
>             Fix For: 1.8
>
>         Attachments: OAK-4581.v0.patch
>
>
> As discussed in OAK-2683, "hitting the observation queue limit" has multiple drawbacks. Quite a bit of work has been done to make diff generation faster. However, there is still a chance of the event queue filling up.
> This issue is meant to implement a persistent event journal. The idea here is that
> # NodeStore would push the diff into a persistent store via a synchronous observer
> # Observers which are meant to handle such events in an async way (by virtue of being wrapped in a BackgroundObserver) would instead pull the events from this persisted journal
> h3. A - What is persisted
> h4. 1 - Serialized Root States and CommitInfo
> In this approach we just persist the root states in serialized form. 
> * DocumentNodeStore - This means storing the root revision vector
> * SegmentNodeStore - {color:red}Q1 - What does the serialized form of a SegmentNodeStore root state look like?{color} - Possibly the RecordId of the "root" state
> Note that with OAK-4528 the DocumentNodeStore can rely on the persisted remote journal to determine the affected paths, which reduces the need to persist the complete diff locally.
> The event generation logic would then "deserialize" the persisted root states and generate the diff as is currently done via NodeState comparison
> h4. 2 - Serialized commit diff and CommitInfo
> In this approach we can save the diff in JSOP form. The diff only contains information about the affected paths, similar to what is currently being stored in the DocumentNodeStore journal
> h4. CommitInfo
> The commit info would also need to be serialized, so it needs to be ensured that whatever is stored there can be serialized or recalculated
> h3. B - How it is persisted
> h4. 1 - Use a secondary segment NodeStore
> OAK-4180 makes use of SegmentNodeStore as a secondary store for caching. [~mreutegg] suggested that for the persisted local journal we can also utilize a SegmentNodeStore instance. Care needs to be taken with compaction, either via a generation approach or by relying on online compaction
> h4. 2 - Make use of write ahead log implementations
> [~ianeboston] suggested that we can make use of a write-ahead log implementation such as [1], [2] or [3]
> h3. C - How changes get pulled
> Some points to consider for the event generation logic:
> # We would need a way to keep pointers to journal entries on a per-listener basis. This would allow each listener to "pull" content changes and generate the diff at its own speed, while keeping in-memory overhead low
> # The journal should survive restarts
> [1] http://www.mapdb.org/javadoc/latest/mapdb/org/mapdb/WriteAheadLog.html
> [2] https://github.com/apache/activemq/tree/master/activemq-kahadb-store/src/main/java/org/apache/activemq/store/kahadb/disk/journal
> [3] https://github.com/elastic/elasticsearch/tree/master/core/src/main/java/org/elasticsearch/index/translog
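The two points under C above, together with the comment's remark that the journal has to reach back as far as the slowest listener, could be combined in a small per-listener cursor registry. A minimal sketch, assuming positions are plain long revisions persisted to a Java properties file; `ListenerCursors` and the file layout are hypothetical, and rewriting the file in place is not crash-safe:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Properties;

// Hypothetical registry of per-listener journal positions (last processed revision).
final class ListenerCursors {
    private final Path file;
    private final Map<String, Long> positions = new HashMap<>();

    // Loads previously persisted positions, so listeners resume after a restart.
    ListenerCursors(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            Properties props = new Properties();
            try (Reader r = Files.newBufferedReader(file)) {
                props.load(r);
            }
            for (String name : props.stringPropertyNames()) {
                positions.put(name, Long.parseLong(props.getProperty(name)));
            }
        }
    }

    long position(String listener) {
        return positions.getOrDefault(listener, 0L);
    }

    // Records that 'listener' has processed everything up to 'revision'
    // (never moving backwards) and persists all positions.
    void advance(String listener, long revision) throws IOException {
        positions.merge(listener, revision, Long::max);
        Properties props = new Properties();
        positions.forEach((k, v) -> props.setProperty(k, Long.toString(v)));
        try (Writer w = Files.newBufferedWriter(file)) { // truncates: not crash-safe
            props.store(w, "listener journal positions");
        }
    }

    // Journal entries at or below this revision were processed by every
    // listener and could be purged; empty if no listener is registered.
    OptionalLong purgeableUpTo() {
        return positions.values().stream().mapToLong(Long::longValue).min();
    }
}
```

The purge bound is simply the minimum over all cursors, which is exactly why a single slow listener determines how much journal has to be retained.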



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)