Posted to oak-issues@jackrabbit.apache.org by "Axel Hanikel (Jira)" <ji...@apache.org> on 2020/06/29 13:37:00 UTC
[jira] [Updated] (OAK-7932) A distributed node store for the cloud

     [ https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Axel Hanikel updated OAK-7932:
------------------------------
    Summary: A distributed node store for the cloud  (was: A distributed segment store for the cloud)

> A distributed node store for the cloud
> --------------------------------------
>
>                 Key: OAK-7932
>                 URL: https://issues.apache.org/jira/browse/OAK-7932
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>          Components: segment-tar
>            Reporter: Axel Hanikel
>            Assignee: Axel Hanikel
>            Priority: Minor
>
> h1. Outline
> This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
>  distributed environment. The main idea is to adopt an actor-like model, meaning:
> -   Communication between actors (services) is done exclusively via messages.
>  -   An actor (which could also be a thread) processes one message at a time, avoiding sharing
>      state with other actors as far as possible.
>  -   Segments are kept in RAM and are written to external storage lazily only for disaster recovery.
>  -   As RAM is a very limited resource, different actors own their share of the total segment space.
>  -   An actor can also cache a few segments which it does not own but which it uses often (such as
>      the one containing the root node).
>  -   The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment
>      size would improve performance.
>  -   We could even use the segment solely as an addressing component and operate at the record level.
>      That would avoid copying data around when collecting garbage: garbage records would just be
>      evicted from RAM.
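
The actor model outlined above could look roughly like the following in plain Java. This is only a minimal sketch under stated assumptions: the names SegmentActor, Message and ownerOf are hypothetical, segments are plain byte arrays keyed by a string id, and ownership is decided by simple hash partitioning. None of this is taken from the proof-of-concept code, and lazy persistence to external storage is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch: one actor that owns a share of the segment space. */
final class SegmentActor implements Runnable {

    /** A message is a write when data is non-null, otherwise a read. */
    static final class Message {
        final String segmentId;
        final byte[] data;
        final CompletableFuture<byte[]> reply = new CompletableFuture<>();

        Message(String segmentId, byte[] data) {
            this.segmentId = segmentId;
            this.data = data;
        }
    }

    /** Sentinel message that makes the actor leave its loop. */
    private static final Message STOP = new Message("", null);

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();

    // Only the actor thread touches this map, so it needs no locking.
    private final Map<String, byte[]> segments = new HashMap<>();

    /** Assumption: the owner of a segment is chosen by hashing its id. */
    static int ownerOf(String segmentId, int actorCount) {
        return Math.floorMod(segmentId.hashCode(), actorCount);
    }

    /** The only way to talk to the actor: put a message into its mailbox. */
    CompletableFuture<byte[]> send(String segmentId, byte[] data) {
        Message message = new Message(segmentId, data);
        mailbox.add(message);
        return message.reply;
    }

    void stop() {
        mailbox.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                Message message = mailbox.take(); // one message at a time
                if (message == STOP) {
                    return;
                }
                if (message.data != null) {
                    segments.put(message.segmentId, message.data);
                    message.reply.complete(message.data);
                } else {
                    // Completes with null if this actor does not hold the segment.
                    message.reply.complete(segments.get(message.segmentId));
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would start the actor with new Thread(actor).start(), write with actor.send("seg-1", bytes) and read with actor.send("seg-1", null).join(). Because all state changes happen on the actor thread, the segment map itself never needs synchronisation.
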
> h1. Implementation
> The first idea was to use ZeroMQ for communication because it seems to be a high-quality,
>  easy-to-use implementation. A major drawback is that the library is written in C, and the Java
>  binding that wraps it via JNI seems hard to set up and did not work for me. There is a native
>  Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far,
>  but I don't know about its performance yet.
> A second implementation, at [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq-nodestore], is a simple
> nodestore implementation which is a kind of dual to the segment store, in the sense that it sits at the other end
> of the compactness spectrum. The segment store is very dense and avoids duplication wherever possible.
> The nodestore in this implementation, however, is quite redundant: every nodestate gets its own UUID (a hash of the serialised
> nodestate) and is saved together with its properties, similar to the document node store.
> This redundancy wastes space, but on the other hand garbage
> collection (not yet implemented) is easier because there is no segment that needs to be rewritten to get rid of data that is no
> longer referenced; unreferenced nodes can just be deleted. This implementation still has bugs, but being much simpler
> than the segment store, it can eventually be used to experiment with different configurations and examine their
> performance.
>  
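
To illustrate the idea of a nodestore in which every nodestate is identified by a hash of its serialised form, and in which garbage collection simply deletes unreferenced nodes, here is a minimal in-memory sketch. It is not the code from the linked branch: the class names SimpleNodeStore and Node, the serialisation format, and the use of UUID.nameUUIDFromBytes (a name-based, MD5-derived UUID) as the hash are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical content-addressed node store; for illustration only. */
final class SimpleNodeStore {

    /** An immutable node: its properties plus the ids of its children by name. */
    static final class Node {
        final TreeMap<String, String> properties;
        final TreeMap<String, UUID> children;

        Node(Map<String, String> properties, Map<String, UUID> children) {
            // TreeMap gives a stable key order, so equal nodes serialise identically.
            this.properties = new TreeMap<>(properties);
            this.children = new TreeMap<>(children);
        }

        /** Assumed serialisation format; a real store would need proper escaping. */
        String serialise() {
            return "p" + properties + "c" + children;
        }
    }

    private final Map<UUID, Node> nodes = new HashMap<>();

    /** Stores the node under an id derived from its serialised content. */
    UUID put(Node node) {
        byte[] bytes = node.serialise().getBytes(StandardCharsets.UTF_8);
        UUID id = UUID.nameUUIDFromBytes(bytes);
        nodes.put(id, node);
        return id;
    }

    Node get(UUID id) {
        return nodes.get(id);
    }

    int size() {
        return nodes.size();
    }

    /** Deletes every node not reachable from root; returns how many were deleted. */
    int collectGarbage(UUID root) {
        Set<UUID> live = new HashSet<>();
        Deque<UUID> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            UUID id = todo.pop();
            Node node = nodes.get(id);
            if (node != null && live.add(id)) {
                todo.addAll(node.children.values());
            }
        }
        int before = nodes.size();
        nodes.keySet().retainAll(live); // no rewriting, just deletion
        return before - nodes.size();
    }
}
```

Storing the same content twice yields the same id, and garbage collection reduces to a reachability walk followed by deletion, with no compaction step. The price is the redundancy described above: changing a node produces a new id, so its parent, and every ancestor up to the root, has to be stored again with the updated child id.
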



--
This message was sent by Atlassian Jira
(v8.3.4#803005)