Posted to oak-issues@jackrabbit.apache.org by "Axel Hanikel (Jira)" <ji...@apache.org> on 2020/09/21 08:59:00 UTC
[jira] [Commented] (OAK-7932) A distributed node store for the cloud

    [ https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199265#comment-17199265 ] 

Axel Hanikel commented on OAK-7932:
-----------------------------------

h2. Current status
The ZeroMQ NodeStore POC at https://github.com/ahanikel/jackrabbit-oak/tree/zeromq-nodestore implements the idea of a Document-/NodeState-oriented node store with its own simple blob store backend instead of using MongoDB. However, it is currently 2 to 3 times slower than the SegmentStore (or more when the caches are empty).

To get better performance, we could now try to get the best of both worlds: store the NodeStates in segments, but use only the node, map, and list record types, and clear the deduplication cache after writing a segment.

I currently think it is better to write just a single node state per segment rather than several, but there is room for experiments here.

> A distributed node store for the cloud
> --------------------------------------
>
>                 Key: OAK-7932
>                 URL: https://issues.apache.org/jira/browse/OAK-7932
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>          Components: segment-tar
>            Reporter: Axel Hanikel
>            Assignee: Axel Hanikel
>            Priority: Minor
>
> h1. Outline
> This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
>  distributed environment. The main idea is to adopt an actor-like model, meaning:
> *   Communication between actors (services) is done exclusively via messages.
> *   An actor (which could also be a thread) processes one message at a time, avoiding sharing
>      state with other actors as far as possible.
> *   Nodestates are kept in RAM and are written to external storage lazily only for disaster recovery.
> *   A nodestate is identified by a uuid, which in turn is a hash of its serialised string representation.
> *   As RAM is a very limited resource, different actors own their share of the total uuid space.
> *   An actor might also cache a few nodestates which it does not own but which it uses often (such as
>      the one containing the root node).
> h1. Implementation
> The first idea was to use the segment node store, with ZeroMQ for communication because it seems to be a high-quality and
>  easy-to-use implementation. A major drawback is that the library is written in C and the Java
>  library which does the JNI stuff seems hard to set up and did not work for me. There is a native
>  Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far. This approach will probably not
>  be pursued, because the way things are stored in segments makes them hard to cache (it seems that a large part of the repository
> will eventually end up in the cache).
> A second implementation, at [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq-nodestore] is a simple
> nodestore implementation which is kind of a dual to the segment store in the sense that it is on the other end
> of the compactness spectrum. The segment store is very dense and avoids duplication wherever possible.
> The nodestore in this implementation, however, is quite redundant: Every nodestate gets its own UUID (a hash of the serialised
> nodestate) and is saved together with its properties, similar to the document node store.
> Here is what a serialised nodestate looks like:
> {noformat}
> begin ZeroMQNodeState
> begin children
> allow	856d1356-7054-3993-894b-a04426956a78
> end children
> begin properties
> jcr:primaryType <NAME> = rep:ACL
> :childOrder <NAMES> = [allow]
> end properties
> end ZeroMQNodeState
> {noformat}
> This verbose format is good for debugging but expensive to generate and parse, so it may be replaced with a binary format at some point. But it shows how child nodestates are referenced and how properties are represented. Binary properties (not shown here) are represented by a reference to the blob store.
> The redundancy (compared with the segment store with its fine-grained record structure) wastes space, but on the other hand garbage
> collection (not yet implemented) is easier because there is no segment that needs to be rewritten to get rid of data that is no
> longer referenced; unreferenced nodes can just be deleted. This implementation still has bugs, but being much simpler
> than the segment store, it can eventually be used to experiment with different configurations and examine their
> performance.
>  
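The serialised nodestate quoted in the issue description can be read back, and an id derived from it, with a few lines of Java. This is only a sketch under stated assumptions, not code from the POC branch: the class name `SerialisedNodeStateSketch` and its methods are hypothetical, child lines are assumed to be `name<TAB>uuid` as in the sample, and property lines are assumed to be `name <TYPE> = value`. The sample child UUID carries version 3, which is also what `java.util.UUID.nameUUIDFromBytes` produces (an MD5-based name UUID); whether the POC actually uses that call is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of the POC: parses the verbose text format
// shown in the issue description and derives a content-based id.
final class SerialisedNodeStateSketch {
    // child name -> UUID string of the child nodestate
    final Map<String, String> children = new LinkedHashMap<>();
    // "name <TYPE>" -> raw value string, exactly as written in the text
    final Map<String, String> properties = new LinkedHashMap<>();

    static SerialisedNodeStateSketch parse(String text) {
        SerialisedNodeStateSketch ns = new SerialisedNodeStateSketch();
        String section = "";
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.equals("begin children") || line.equals("begin properties")) {
                section = line.substring("begin ".length());
            } else if (line.startsWith("end ")) {
                section = "";
            } else if (line.startsWith("begin ")) {
                // outer "begin ZeroMQNodeState" marker, nothing to record
            } else if (section.equals("children")) {
                int tab = line.indexOf('\t'); // assumption: tab-separated
                if (tab < 0) {
                    throw new IllegalArgumentException("bad child line: " + line);
                }
                ns.children.put(line.substring(0, tab), line.substring(tab + 1));
            } else if (section.equals("properties")) {
                int eq = line.indexOf(" = ");
                if (eq < 0) {
                    throw new IllegalArgumentException("bad property line: " + line);
                }
                ns.properties.put(line.substring(0, eq), line.substring(eq + 3));
            }
        }
        return ns;
    }

    // Assumption: the id is a name-based (version 3) UUID over the UTF-8 bytes
    // of the serialised text, so equal content always yields an equal id.
    static UUID idOf(String serialised) {
        return UUID.nameUUIDFromBytes(serialised.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the id depends only on the serialised text, two nodestates with identical children and properties collapse to the same id, which is what allows unreferenced nodes to be deleted individually.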

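The outline's point that different actors own their share of the total uuid space could be realised in many ways. A minimal sketch, assuming a fixed number of actors and a simple modulo partition over the upper 64 bits of the id, might look as follows. The class `UuidSpaceSketch` and the method `ownerOf` are hypothetical names, and the partitioning scheme itself is an assumption rather than something the issue specifies.

```java
import java.util.UUID;

// Hypothetical helper, not part of the POC: maps a nodestate id to the index
// of the actor that owns it.
final class UuidSpaceSketch {
    // Returns a value in [0, actorCount). The most significant 64 bits are
    // treated as an unsigned number so that ids with the top bit set do not
    // yield a negative index.
    static int ownerOf(UUID id, int actorCount) {
        if (actorCount <= 0) {
            throw new IllegalArgumentException("actorCount must be positive");
        }
        return (int) Long.remainderUnsigned(id.getMostSignificantBits(), actorCount);
    }
}
```

A plain modulo partition reassigns most ids when the number of actors changes; a scheme such as consistent hashing would reduce that churn, at the cost of more bookkeeping.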


--
This message was sent by Atlassian Jira
(v8.3.4#803005)