You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2017/01/05 11:00:15 UTC

Re: [jr3 microkernel] Change log consolidation

Hi,

Coming back to this after 5 years, yay!

Turns out that mentioned approach is nowadays called "patch commutation" 
[1, 2] if we regard change logs as patches, that is. It is used as 
foundation of the darcs version control system [3].

The way Oak rebases branches on commit [4] is actually the same thing in 
guise: commutation of patches. Only that Oak does not consider the 
commutative properties of move operations, which the original proposal 
below did.

The reason I'm bringing this up now is that this topic might turn out 
helpful again once we start talking about conflict reconciliation under 
network partition (e.g. geo-replication). As Alexander Klimetschek 
already pointed out in the discussions 5 years back BTW!

Michael
	

[1] https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory
[2] 
https://byorgey.wordpress.com/2008/02/13/patch-theory-part-ii-some-basics/
[3] http://darcs.net/
[4] 
https://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches


On 3.2.12 7:44 , Michael Drig wrote:
>
> Hi,
>
> While working on a new transient space implementation [1] I discovered
> that recreating a list of changes from transient modifications to the
> hierarchy is like opening a can of worms. Rethinking things it turned
> out, that it is much simpler to directly consolidate the change log.
> This approach is much more flexible and versatile since knowledge of the
> persisted hierarchy is not necessary at all. Thus this approach should
> work equally well with change logs for jr3, the Microkernel, Jackrabbit
> core, JCR and SPI.
>
> The algorithm is capable of creating minimal change logs from any valid
> change log (that is from any change log that can be applied to some
> hierarchy).
>
> I extensively tested the algorithm on randomly created change logs [3].
> This resulted in some interesting numbers: consolidation results in an
> decrease of about 45% of these randomly generated change logs.
> Furthermore the resulting size of the H2 database files *decreased up to
> 200%*!
>
> The implementation [2] is currently part of the jackrabbit-microkernel
> module in the sandbox. However it does not have any crucial dependencies
> so it should be easy to adapt for other usages.
>
> The gory details of the algorithm are based on some algebraic properties
> of move operations on paths as detailed in the following paragraphs.
>
> The change log consolidation algorithm implemented in the reduce method
> [2] is based on algebraic properties of move operations on paths. The
> other operations (add node, remove node and set property) are
> generalized to move operations. Consolidation relies on reduction and
> commutation rules of move operations.
>
> A move operation resembles a map on a hierarchy (of nodes and
> properties). A change log consisting of k move operations m_1 to m_k is
> thus the composition of the individual moves: m_1 *, ..., * m_k (using *
> for function composition: f(g(x)) = (f * g)(x)).
>
> Some definitions, notation and propositions:
>
> * Let NIL denote a path which never occurs in any hierarchy.
>
> * Order on paths: let p, q be paths.
>   - p < q iff p != NIL and q != NIL and p is an ancestor of q.
>   - p <= q iff p < q or p == q != NIL
>
> * Conflict of paths: let p, q be paths.
>   - p ~ q (p conflicts with q) iff either p <= q or q <= p
>
> * Substitution in paths: let p, q, r be paths.
>   - [p -> q]r = r if p is not an ancestor of r and
>   - [p -> q]r = s where s is the path resulting from replacing
>     the ancestor p in r with q otherwise.
>
> * Let p, q be paths. Then p:q denotes a move operation where the
>   node at p is moved to a new node at q.
>
> * Valid moves: leq p, q be paths.
>   - p:q is valid iff p !~ q or p = q
>   - if p:q is not valid, it is invalid
>
> Invalid moves are exactly those which would result in a node being moved
> to an ancestor/descendant of itself.
>
> * Identity on moves: let p, q be paths.
>   - p:q = ID iff p = q.
>
> * Conflict on moves: let p, q, r, s be paths.
>   - p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
>     or q ~ r or q ~ s.
>
> * Strict commutativity of moves: let m, n be moves.
>   - m * n = n * m iff m !~ n
>
> * Substitutions in moves: let p, q, r, s be paths.
>   - [p -> q]r:s = [p -> q]r:[p -> q]s
>
> * Let p be a path and let +p denote an add node operation and let -p
>   denote a remove node operation for a node at path p.
>   - +p = NIL:p That is, adding a node is represented by a move from a
>     unknown source.
>   - p = p:NIL. That is, removing a node is represented by a move to an
>     unknown sink.
>
>
> Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID
> and n != ID. Then the following reduction and commutation rules apply:
>
> 1.  p!~ r:  m * n = n * m
> 2.  p < r:  illegal (since this implies q <= r which implies p ~ q and
>             thus m invalid)
> 3.  p = r:  illegal (since this implies q <= r which implies p ~ q and
>             this m invalid)
> 4.  p > r:  does not commute if q < s. Otherwise m * n = n * [r -> s]m
> 5.  p!~ s:  m * n = n * m
> 6.  p < s:  illegal (since this implies p ~ q and thus m invalid)
> 7.  p = s:  does not commute
> 8.  p > s:  illegal (since p > s implies there is an s already which
>             will conflict with r:s)
> 9.  q!~ r:  m * n = n * m
> 10. q < r:  m * n = [q -> p]n * m
> 11. q = r:  m * n = p:s (transitivity of moves)
> 12. q > r:  m * n = n * [r -> s]m
> 13. q!~ s:  m * n = n * m
> 14. q < s:  does not commute if p > r. Otherwise m * n = [q -> p]n * m
> 15. q = s:  illegal (since s conflicts with r:s)
> 16. q > s:  illegal (since s conflicts with r:s)
>
> Allowing add node and remove node operations the following additional
> conditions apply:
>
> Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the
> reduction and commutations rules 1. to 16. apply with extra conditions
> on 4., 10., 12. and 14.:
>
> 4'.  if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
>      m, n do not commute.
> 10'. illegal if p = NIL
> 12'. if s = NIL then m * n = -r * -p
> 14'. illegal if p = NIL
>
> Michael
>
>
> [1] http://markmail.org/message/qkkcvtmtapas2cx4
> [2]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup
>
> [3]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup
>