You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Axel Hanikel (JIRA)" <ji...@apache.org> on 2018/12/03 12:35:00 UTC
[jira] [Updated] (OAK-7932) A distributed segment store for the
cloud
[ https://issues.apache.org/jira/browse/OAK-7932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Axel Hanikel updated OAK-7932:
------------------------------
Description:
h1. Outline
This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
distributed environment. The main idea is to adopt an actor-like model, meaning:
- Communication between actors (services) is done exclusively via messages.
- An actor (which could also be a thread) processes one message at a time, avoiding sharing
state with other actors as far as possible.
- Segments are kept in RAM and are written to external storage lazily only for disaster recovery.
- As RAM is a very limited resource, different actors own their share of the total segment space.
- An actor can also cache a few segments which it does not own but which it uses often (such as
the one containing the root node)
- The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment
size would improve performance.
- We could even use the segment solely as an addressing component and operate at the record level.
That would avoid copying data around when collecting garbage: garbage records would just be
evicted from RAM.
h1. Implementation
The first idea was to use ZeroMQ for communication because it seems to be a high-quality and
easy to use implementation. A major drawback is that the library is written in C and the Java
library which does the JNI stuff seems hard to set up and did not work for me. There is a native
Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far,
but I don't know about its performance yet.
There is an attempt to use jeromq in the segment store in a very very very early stage at
[https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on the memory segment store
and currently just replaces direct function calls for reading and writing segments with messages being
sent and received.
was:
# Outline
This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
distributed environment. The main idea is to adopt an actor-like model, meaning:
- Communication between actors (services) is done exclusively via messages.
- An actor (which could also be a thread) processes one message at a time, avoiding sharing
state with other actors as far as possible.
- Segments are kept in RAM and are written to external storage lazily only for disaster recovery.
- As RAM is a very limited resource, different actors own their share of the total segment space.
- An actor can also cache a few segments which it does not own but which it uses often (such as
the one containing the root node)
- The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment size would improve performance.
- We could even use the segment solely as an addressing component and operate at the record level.
That would avoid copying data around when collecting garbage: garbage records would just be
evicted from RAM.
# Implementation
The first idea was to use ZeroMQ for communication because it seems to be a high-quality and
easy to use implementation. A major drawback is that the library is written in C and the Java
library which does the JNI stuff seems hard to set up and did not work for me. There is a native
Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far, but I don't know about its performance yet.
There is an attempt to use jeromq in the segment store in a very very very early stage at
<https://github.com/ahanikel/jackrabbit-oak/tree/zeromq> . It is based on the memory segment store and currently just replaces direct function calls for reading and writing segments with messages being sent and received.
> A distributed segment store for the cloud
> -----------------------------------------
>
> Key: OAK-7932
> URL: https://issues.apache.org/jira/browse/OAK-7932
> Project: Jackrabbit Oak
> Issue Type: Wish
> Components: segment-tar
> Reporter: Axel Hanikel
> Assignee: Axel Hanikel
> Priority: Minor
>
> h1. Outline
> This issue documents some proof-of-concept work for adapting the segment tar nodestore to a
> distributed environment. The main idea is to adopt an actor-like model, meaning:
> - Communication between actors (services) is done exclusively via messages.
> - An actor (which could also be a thread) processes one message at a time, avoiding sharing
> state with other actors as far as possible.
> - Segments are kept in RAM and are written to external storage lazily only for disaster recovery.
> - As RAM is a very limited resource, different actors own their share of the total segment space.
> - An actor can also cache a few segments which it does not own but which it uses often (such as
> the one containing the root node)
> - The granularity of operating on whole segments may be too coarse, so perhaps reducing the segment
> size would improve performance.
> - We could even use the segment solely as an addressing component and operate at the record level.
> That would avoid copying data around when collecting garbage: garbage records would just be
> evicted from RAM.
> h1. Implementation
> The first idea was to use ZeroMQ for communication because it seems to be a high-quality and
> easy to use implementation. A major drawback is that the library is written in C and the Java
> library which does the JNI stuff seems hard to set up and did not work for me. There is a native
> Java implementation of the ZeroMQ protocol, aptly called jeromq, which seems to work well so far,
> but I don't know about its performance yet.
> There is an attempt to use jeromq in the segment store in a very very very early stage at
> [https://github.com/ahanikel/jackrabbit-oak/tree/zeromq] . It is based on the memory segment store
> and currently just replaces direct function calls for reading and writing segments with messages being
> sent and received.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)