Posted to oak-issues@jackrabbit.apache.org by "Michael Dürig (JIRA)" <ji...@apache.org> on 2018/12/05 16:53:00 UTC

[jira] [Commented] (OAK-7932) A distributed segment store for the cloud

    [ https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16710337#comment-16710337 ] 

Michael Dürig commented on OAK-7932:
------------------------------------

Interesting... I think this could align nicely with an idea [~frm] brought up earlier re. a "segment service".

Re. special handling for "root" segments: I can't quite follow your code in that respect. Also, I think this can't be achieved without refactoring some of the existing code. Currently there is no concept of root segments. All segments get their share of root records, and since the root changes on every commit, the segment containing the current root also changes very frequently. We could try to always write root records to separate segments to introduce such a concept. However, this would require changes in the segment writer pool.

> A distributed segment store for the cloud
> -----------------------------------------
>
>                 Key: OAK-7932
>                 URL: https://issues.apache.org/jira/browse/OAK-7932
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>          Components: segment-tar
>            Reporter: Axel Hanikel
>            Assignee: Axel Hanikel
>            Priority: Minor
>
> h1. Outline
> This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
> distributed environment. The main idea is to adopt an actor-like model, meaning:
> -   Communication between actors (services) is done exclusively via messages.
> -   An actor (which could also be a thread) processes one message at a time, avoiding sharing
>     state with other actors as far as possible.
> -   Segments are kept in RAM and are written to external storage lazily only for disaster recovery.
> -   As RAM is a very limited resource, different actors own their share of the total segment space.
> -   An actor can also cache a few segments which it does not own but which it uses often (such as
>     the one containing the root node)
> -   The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment
>     size would improve performance.
> -   We could even use the segment solely as an addressing component and operate at the record level.
>     That would avoid copying data around when collecting garbage: garbage records would just be
>     evicted from RAM.
> h1. Implementation
> The first idea was to use ZeroMQ for communication because it seems to be a high-quality and
> easy-to-use implementation. A major drawback is that the library is written in C, and the Java
> library that wraps it via JNI seems hard to set up and did not work for me. There is a native
> Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far,
> but I don't know about its performance yet.
> A very early-stage attempt to use jeromq in the segment store is at
> [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on the memory segment store
> and currently just replaces direct function calls for reading and writing segments with messages being
> sent and received.
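
For illustration only, here is a minimal sketch of the actor-like model from the quoted outline. It is an assumption-laden toy, not code from Oak or from the linked branch: it uses plain java.util.concurrent mailboxes instead of jeromq, keeps segments only in RAM (no lazy persistence, no caching of foreign segments), and all class names (SegmentActorSketch, SegmentActor, SegmentRouter, WriteSegment, ReadSegment, Stop) are hypothetical. It shows two of the listed points: each actor handles one message at a time without sharing state, and each actor owns a share of the segment space, here chosen by hashing the segment id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; none of these types exist in Oak.
final class SegmentActorSketch {

    interface Message {}

    static final class WriteSegment implements Message {
        final UUID id;
        final byte[] data;
        WriteSegment(UUID id, byte[] data) { this.id = id; this.data = data; }
    }

    static final class ReadSegment implements Message {
        final UUID id;
        // The reply travels back as a message-like future instead of a shared map.
        final CompletableFuture<byte[]> reply = new CompletableFuture<>();
        ReadSegment(UUID id) { this.id = id; }
    }

    static final class Stop implements Message {}

    /** One actor: a mailbox plus private state touched only by its own thread. */
    static final class SegmentActor implements Runnable {
        private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
        private final Map<UUID, byte[]> segments = new HashMap<>();

        void tell(Message m) { mailbox.add(m); }

        @Override
        public void run() {
            try {
                while (true) {
                    Message m = mailbox.take(); // one message at a time
                    if (m instanceof WriteSegment) {
                        WriteSegment w = (WriteSegment) m;
                        segments.put(w.id, w.data.clone());
                    } else if (m instanceof ReadSegment) {
                        ReadSegment r = (ReadSegment) m;
                        byte[] data = segments.get(r.id);
                        r.reply.complete(data == null ? null : data.clone());
                    } else if (m instanceof Stop) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Routes each segment id to the actor that owns that share of the id space. */
    static final class SegmentRouter {
        private final SegmentActor[] actors;

        SegmentRouter(int actorCount) {
            actors = new SegmentActor[actorCount];
            for (int i = 0; i < actorCount; i++) {
                actors[i] = new SegmentActor();
                Thread t = new Thread(actors[i], "segment-actor-" + i);
                t.setDaemon(true);
                t.start();
            }
        }

        SegmentActor ownerOf(UUID id) {
            return actors[Math.floorMod(id.hashCode(), actors.length)];
        }

        void write(UUID id, byte[] data) {
            ownerOf(id).tell(new WriteSegment(id, data));
        }

        /** Blocks until the owning actor replies; returns null for an unknown id. */
        byte[] read(UUID id) {
            ReadSegment request = new ReadSegment(id);
            ownerOf(id).tell(request);
            return request.reply.join();
        }

        void stop() {
            for (SegmentActor a : actors) {
                a.tell(new Stop());
            }
        }
    }
}
```

A caller would create a `SegmentRouter` with some number of actors, call `write(id, bytes)` and later `read(id)`. Because a write and a subsequent read for the same id go through the same mailbox in FIFO order, the read observes the write. In a distributed setup the in-process mailbox would presumably be replaced by a socket (for example a jeromq one), which this sketch does not attempt.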



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)