You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Timothée Maret <ti...@gmail.com> on 2015/06/04 10:31:23 UTC

CRDTs in Oak

Hi,

I stumbled upon the CRDT concept [0,1] of data types offering strong
eventual consistency.
I am thinking such data structures could be really useful for apps wanting
to manage huge collections (Set, Map, etc.) but can't afford to linearise
writes.

At Adobe we have such use cases for cross region login infrastructure or
group management applications.

AFAIK, the idea of CRDT is to offer data types that ultimately converge to
the the same value from all observers.
The data structure is structured in such a way that changes to its state
can be done in any order which avoid mineralization yet guarantee
consistency (ultimately).

Looking at Oak, it seems it offers the bases for implementing such CRDTs.
AFAIK, Oak could be tweaked in such a way that conflicts are automatically
resolved according to specific merging rules.

I have read the presentation in [2] and I was wondering if one of those
mode of conflict handling would help to implement CRDTs.
Knowing more about configuration of conflict handling was the intend of my
question in [4].

I have seen Riak [3] is supporting them and we could do the same for Oak
(release a library of CRDTs that leverages Oak).

The added value of this library would be to offer general structures that
are easy to reason about consistency wise and would be bug-free.

Since Oak is a tree, we could likely implement a CRDT Set nicely. From the
CRDT Set we could implement a Map, a graph, and so on.

Would it make sense to investigate CRDTs for Oak ? It may make a really
interesting student project (Google Summer of Code or such).

wdyt ?


Regards,

Timothee


[0] http://hal.upmc.fr/file/index/docid/555588/filename/techreport.pdf
[1] http://arxiv.org/pdf/1201.1784v1.pdf
[2] http://www.slideshare.net/MichaelDrig/oak-39377061?related=1
[3] http://docs.basho.com/riak/2.0.2/dev/using/data-types/
[4]
https://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201505.mbox/browser

Re: CRDTs in Oak

Posted by Michael Dürig <md...@apache.org>.
Hi Timothee,

I read the paper by Mark Shapiro only a couple of month ago while I was 
thinking about generalising the concept of atomic counters we started to 
introduce for Oak [1].

To see how these ideas would apply to Oak I POCed some of them (atomic 
counter, last writer wins, multi value register, atomic set). Those 
nicely show the capabilities and mechanisms of Oak for handling (and 
avoiding) various type of conflicts. I plan to do a session on this at 
this year's .adatpTo conference in Berlin [2].

On a more theoretical note it seems that most of Shapiro's data 
structures become much simpler on Oak. This is somewhat expected as 
Oak's consistency model is much stronger (sequential consistency) than 
the one used in Shapiros work. AFAIR he uses a gossip protocol where 
independent cluster nodes can receive updates from each other at any time.

OTOH CRDTs have their limitations. The requirements that in the state 
based style successive states form a semi-lattice where merge operations 
correspond to the least upper bound of the involved states is quite a 
strong one. Dually the operations based style requires operations to be 
commutative, which is a very strict requirement. The fact that such a 
basic data structures as a general set (with add and remove operations) 
cannot be implemented as a CRDT (because add and remove do not commute) 
shows how strict these requirements actually are.
In a way this is what we intuitively knew all the time: if the order of 
operations doesn't matter applying incoming messages is trivial. Or 
alternatively if for any two states we can find a state that is the 
successor of both, we know how to merge the initial states. Shapiro's 
biggest contribution IMO is to state this so clearly in a rigorous and 
formal manner. This is a great basis for reasoning about existing data 
structures and about which bit of functionality to rip out to get 
consistency back.

Anyway, my implementations so far are pretty basic. Also I didn't bother 
with the more sophisticated data structures like graphs and maps. So 
this might be a good starting point for a student's project.

Michael


[1] https://issues.apache.org/jira/browse/OAK-2220
[2] 
https://wiki.day.com/content/wiki/Users/mduerig/Avoiding%20and%20dealing%20with%20conflicting%20updates%20in%20Oak.html



On 4.6.15 10:31 , Timothée Maret wrote:
> Hi,
>
> I stumbled upon the CRDT concept [0,1] of data types offering strong
> eventual consistency.
> I am thinking such data structures could be really useful for apps wanting
> to manage huge collections (Set, Map, etc.) but can't afford to linearise
> writes.
>
> At Adobe we have such use cases for cross region login infrastructure or
> group management applications.
>
> AFAIK, the idea of CRDT is to offer data types that ultimately converge to
> the the same value from all observers.
> The data structure is structured in such a way that changes to its state
> can be done in any order which avoid mineralization yet guarantee
> consistency (ultimately).
>
> Looking at Oak, it seems it offers the bases for implementing such CRDTs.
> AFAIK, Oak could be tweaked in such a way that conflicts are automatically
> resolved according to specific merging rules.
>
> I have read the presentation in [2] and I was wondering if one of those
> mode of conflict handling would help to implement CRDTs.
> Knowing more about configuration of conflict handling was the intend of my
> question in [4].
>
> I have seen Riak [3] is supporting them and we could do the same for Oak
> (release a library of CRDTs that leverages Oak).
>
> The added value of this library would be to offer general structures that
> are easy to reason about consistency wise and would be bug-free.
>
> Since Oak is a tree, we could likely implement a CRDT Set nicely. From the
> CRDT Set we could implement a Map, a graph, and so on.
>
> Would it make sense to investigate CRDTs for Oak ? It may make a really
> interesting student project (Google Summer of Code or such).
>
> wdyt ?
>
>
> Regards,
>
> Timothee
>
>
> [0] http://hal.upmc.fr/file/index/docid/555588/filename/techreport.pdf
> [1] http://arxiv.org/pdf/1201.1784v1.pdf
> [2] http://www.slideshare.net/MichaelDrig/oak-39377061?related=1
> [3] http://docs.basho.com/riak/2.0.2/dev/using/data-types/
> [4]
> https://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201505.mbox/browser
>