You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2013/10/17 14:16:43 UTC

Detecting move operations in node state diffs

Hi,

Currently we can't detect a move operation through diffing node states. 
Those operation are currently seen as separate remove and add operations 
that can't be easily associated with each other. This impacts permission 
evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which 
both don't have the same support for moves as had Jackrabbit 2.

As discussed several times before it is not possible to regain move 
operation from simply diffing node states. We need additional 
information. One option is to annotate nodes (*) as they are moved with 
their source path. With that we could detect whether an added node was 
the target of a move operation and if so where the source of that 
operation was. However, this comes with a performance penalty since such 
a diff operation could not be done in a single pass any more. In order 
to decide whether a deleted node has been moved, the corresponding add 
needs to be found first. In essence this requires the diff operation to 
do two passes: the first one for detecting move operations and the 
second one for the other operations.

To avoid the second pass, we could also remember the paths of the moved 
nodes in a global place (*). This would allow us to look up whether a 
deleted node was moved (opposed to deleted) as we go and detect moved 
nodes as soon as we come across an added node that has a source path 
annotation. As an added benefit this approach allows us to detect 
whether there was a move at all simply by checking whether there are 
entries in this global place. If this is not the case, we could fall 
back to a simpler diff mechanism.

(*) All such annotations would happen as hidden items in transient space 
and would have to be removed again by some hook before persisting.

WDYT, is this worth the trouble?

Michael

Re: Detecting move operations in node state diffs

Posted by Tommaso Teofili <to...@gmail.com>.
>From the user point of view I think this makes sense and it'd be worth the
effort, also it looks like the global space approach would allow to do that
without loosing too much in terms of performance.

My 2 cents,
Tommaso


2013/10/17 Michael Dürig <md...@apache.org>

>
> Hi,
>
> Currently we can't detect a move operation through diffing node states.
> Those operation are currently seen as separate remove and add operations
> that can't be easily associated with each other. This impacts permission
> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
> both don't have the same support for moves as had Jackrabbit 2.
>
> As discussed several times before it is not possible to regain move
> operation from simply diffing node states. We need additional information.
> One option is to annotate nodes (*) as they are moved with their source
> path. With that we could detect whether an added node was the target of a
> move operation and if so where the source of that operation was. However,
> this comes with a performance penalty since such a diff operation could not
> be done in a single pass any more. In order to decide whether a deleted
> node has been moved, the corresponding add needs to be found first. In
> essence this requires the diff operation to do two passes: the first one
> for detecting move operations and the second one for the other operations.
>
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*). This would allow us to look up whether a
> deleted node was moved (opposed to deleted) as we go and detect moved nodes
> as soon as we come across an added node that has a source path annotation.
> As an added benefit this approach allows us to detect whether there was a
> move at all simply by checking whether there are entries in this global
> place. If this is not the case, we could fall back to a simpler diff
> mechanism.
>
> (*) All such annotations would happen as hidden items in transient space
> and would have to be removed again by some hook before persisting.
>
> WDYT, is this worth the trouble?
>
> Michael
>

Re: Detecting move operations in node state diffs

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Oct 17, 2013 at 8:16 AM, Michael Dürig <md...@apache.org> wrote:
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*).

Perhaps the new CommitInfo?

I assume we'd only need to track moves for local commits as that's
where permission evaluation is done, and as our support for observing
external events in any case is limited.

BR,

Jukka Zitting

Re: Detecting move operations in node state diffs

Posted by Michael Dürig <md...@apache.org>.

On 21.10.13 8:23 , Michael Dürig wrote:
>> IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
>> and node moved. but maybe I'm wrong, but I thought we kept this for
>> backward compatibility.
>
> Thanks for the heads up. Will double check. This would in fact simplify
> event generation in the observation code.

Double checked and yes, this is the case: Jackrabbit sends 3 events per 
move: NODE_ADDED, NODE_REMOVED and NODE_MOVED.

Michael

Re: Detecting move operations in node state diffs

Posted by Michael Dürig <md...@apache.org>.

On 21.10.13 7:18 , Tobias Bocanegra wrote:
> On Mon, Oct 21, 2013 at 7:31 AM, Thomas Mueller<mu...@adobe.com>  wrote:
>> >Hi,
>> >
>>> >>extra pass
>> >
>> >On how to avoid this extra pass. Not strictly backward compatible, but I
>> >wonder how much it would break: what if observation would deliver two
>> >events for moved nodes: the "node moved" event (added at the target), plus
>> >the "node deleted" event (deleted at the source)? The one use case I know
>> >about, data store garbage collection in Jackrabbit core, would be OK with
>> >this behavior.
> IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
> and node moved. but maybe I'm wrong, but I thought we kept this for
> backward compatibility.

Thanks for the heads up. Will double check. This would in fact simplify 
event generation in the observation code.

Michael


Re: Detecting move operations in node state diffs

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

>IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
>and node moved.

Cool, I didn't know this :-)

Regards,
Thomas


Re: Detecting move operations in node state diffs

Posted by Tobias Bocanegra <tr...@apache.org>.
On Mon, Oct 21, 2013 at 7:31 AM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
>> extra pass
>
> On how to avoid this extra pass. Not strictly backward compatible, but I
> wonder how much it would break: what if observation would deliver two
> events for moved nodes: the "node moved" event (added at the target), plus
> the "node deleted" event (deleted at the source)? The one use case I know
> about, data store garbage collection in Jackrabbit core, would be OK with
> this behavior.
IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
and node moved. but maybe I'm wrong, but I thought we kept this for
backward compatibility.

regards, toby


>
> Regards,
> Thomas
>
>
>
> On 10/21/13 2:20 PM, "Michael Dürig" <md...@apache.org> wrote:
>
>>
>>Hi,
>>
>>I implemented a very rough POC of the algorithm outlined below. See [1]
>>for the implementation itself. On move a node is annotated with its
>>source path in NodeBuilder.moveTo(). Later moves can be extracted
>>through the standalone MoveDetector class. See MoveDetectorTest for
>>details. MoveDetector also provides a static utility method
>>findMovedPaths for building the set of moved nodes the algorithm
>>requires. As mentioned below this extra pass is not required if this set
>>can be obtained by other means.
>>
>>See [2] how this could be integrated with the current observation
>>implementation.
>>
>>If we deicide to go with such an approach at all, we still need to
>>figure out how to better integrate it with the current node state diff.
>>
>>Michael
>>
>>
>>[1]
>>https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b
>>18038eec11f315a
>>[2]
>>https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e
>>44c27ad34292b88
>>
>>
>>On 17.10.13 2:16 , Michael Dürig wrote:
>>>
>>> Hi,
>>>
>>> Currently we can't detect a move operation through diffing node states.
>>> Those operation are currently seen as separate remove and add operations
>>> that can't be easily associated with each other. This impacts permission
>>> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>>> both don't have the same support for moves as had Jackrabbit 2.
>>>
>>> As discussed several times before it is not possible to regain move
>>> operation from simply diffing node states. We need additional
>>> information. One option is to annotate nodes (*) as they are moved with
>>> their source path. With that we could detect whether an added node was
>>> the target of a move operation and if so where the source of that
>>> operation was. However, this comes with a performance penalty since such
>>> a diff operation could not be done in a single pass any more. In order
>>> to decide whether a deleted node has been moved, the corresponding add
>>> needs to be found first. In essence this requires the diff operation to
>>> do two passes: the first one for detecting move operations and the
>>> second one for the other operations.
>>>
>>> To avoid the second pass, we could also remember the paths of the moved
>>> nodes in a global place (*). This would allow us to look up whether a
>>> deleted node was moved (opposed to deleted) as we go and detect moved
>>> nodes as soon as we come across an added node that has a source path
>>> annotation. As an added benefit this approach allows us to detect
>>> whether there was a move at all simply by checking whether there are
>>> entries in this global place. If this is not the case, we could fall
>>> back to a simpler diff mechanism.
>>>
>>> (*) All such annotations would happen as hidden items in transient space
>>> and would have to be removed again by some hook before persisting.
>>>
>>> WDYT, is this worth the trouble?
>>>
>>> Michael
>

Re: Detecting move operations in node state diffs

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

> extra pass

On how to avoid this extra pass. Not strictly backward compatible, but I
wonder how much it would break: what if observation would deliver two
events for moved nodes: the "node moved" event (added at the target), plus
the "node deleted" event (deleted at the source)? The one use case I know
about, data store garbage collection in Jackrabbit core, would be OK with
this behavior.

Regards,
Thomas



On 10/21/13 2:20 PM, "Michael Dürig" <md...@apache.org> wrote:

>
>Hi,
>
>I implemented a very rough POC of the algorithm outlined below. See [1]
>for the implementation itself. On move a node is annotated with its
>source path in NodeBuilder.moveTo(). Later moves can be extracted
>through the standalone MoveDetector class. See MoveDetectorTest for
>details. MoveDetector also provides a static utility method
>findMovedPaths for building the set of moved nodes the algorithm
>requires. As mentioned below this extra pass is not required if this set
>can be obtained by other means.
>
>See [2] how this could be integrated with the current observation
>implementation.
>
>If we deicide to go with such an approach at all, we still need to
>figure out how to better integrate it with the current node state diff.
>
>Michael
>
>
>[1] 
>https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b
>18038eec11f315a
>[2] 
>https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e
>44c27ad34292b88
>
>
>On 17.10.13 2:16 , Michael Dürig wrote:
>>
>> Hi,
>>
>> Currently we can't detect a move operation through diffing node states.
>> Those operation are currently seen as separate remove and add operations
>> that can't be easily associated with each other. This impacts permission
>> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>> both don't have the same support for moves as had Jackrabbit 2.
>>
>> As discussed several times before it is not possible to regain move
>> operation from simply diffing node states. We need additional
>> information. One option is to annotate nodes (*) as they are moved with
>> their source path. With that we could detect whether an added node was
>> the target of a move operation and if so where the source of that
>> operation was. However, this comes with a performance penalty since such
>> a diff operation could not be done in a single pass any more. In order
>> to decide whether a deleted node has been moved, the corresponding add
>> needs to be found first. In essence this requires the diff operation to
>> do two passes: the first one for detecting move operations and the
>> second one for the other operations.
>>
>> To avoid the second pass, we could also remember the paths of the moved
>> nodes in a global place (*). This would allow us to look up whether a
>> deleted node was moved (opposed to deleted) as we go and detect moved
>> nodes as soon as we come across an added node that has a source path
>> annotation. As an added benefit this approach allows us to detect
>> whether there was a move at all simply by checking whether there are
>> entries in this global place. If this is not the case, we could fall
>> back to a simpler diff mechanism.
>>
>> (*) All such annotations would happen as hidden items in transient space
>> and would have to be removed again by some hook before persisting.
>>
>> WDYT, is this worth the trouble?
>>
>> Michael


Re: Detecting move operations in node state diffs

Posted by Michael Dürig <md...@apache.org>.
Hi,

I implemented a very rough POC of the algorithm outlined below. See [1] 
for the implementation itself. On move a node is annotated with its 
source path in NodeBuilder.moveTo(). Later moves can be extracted 
through the standalone MoveDetector class. See MoveDetectorTest for 
details. MoveDetector also provides a static utility method 
findMovedPaths for building the set of moved nodes the algorithm 
requires. As mentioned below this extra pass is not required if this set 
can be obtained by other means.

See [2] how this could be integrated with the current observation 
implementation.

If we deicide to go with such an approach at all, we still need to 
figure out how to better integrate it with the current node state diff.

Michael


[1] 
https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b18038eec11f315a
[2] 
https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e44c27ad34292b88


On 17.10.13 2:16 , Michael Dürig wrote:
>
> Hi,
>
> Currently we can't detect a move operation through diffing node states.
> Those operation are currently seen as separate remove and add operations
> that can't be easily associated with each other. This impacts permission
> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
> both don't have the same support for moves as had Jackrabbit 2.
>
> As discussed several times before it is not possible to regain move
> operation from simply diffing node states. We need additional
> information. One option is to annotate nodes (*) as they are moved with
> their source path. With that we could detect whether an added node was
> the target of a move operation and if so where the source of that
> operation was. However, this comes with a performance penalty since such
> a diff operation could not be done in a single pass any more. In order
> to decide whether a deleted node has been moved, the corresponding add
> needs to be found first. In essence this requires the diff operation to
> do two passes: the first one for detecting move operations and the
> second one for the other operations.
>
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*). This would allow us to look up whether a
> deleted node was moved (opposed to deleted) as we go and detect moved
> nodes as soon as we come across an added node that has a source path
> annotation. As an added benefit this approach allows us to detect
> whether there was a move at all simply by checking whether there are
> entries in this global place. If this is not the case, we could fall
> back to a simpler diff mechanism.
>
> (*) All such annotations would happen as hidden items in transient space
> and would have to be removed again by some hook before persisting.
>
> WDYT, is this worth the trouble?
>
> Michael

Re: Detecting move operations in node state diffs

Posted by Angela Schreiber <an...@adobe.com>.
IMHO it's worth trying to get a solution for this.

if we do, i would suggest that we detect the move before
passing the information down to the commit hooks. in other
words, there should be a method like nodeMoved in order to prevent
that every move-sensitive hook has to write the same code in order
to find out about potential moves.

kind regards
angela

On 10/17/13 2:16 PM, "Michael Dürig" <md...@apache.org> wrote:

>
>Hi,
>
>Currently we can't detect a move operation through diffing node states.
>Those operation are currently seen as separate remove and add operations
>that can't be easily associated with each other. This impacts permission
>evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>both don't have the same support for moves as had Jackrabbit 2.
>
>As discussed several times before it is not possible to regain move
>operation from simply diffing node states. We need additional
>information. One option is to annotate nodes (*) as they are moved with
>their source path. With that we could detect whether an added node was
>the target of a move operation and if so where the source of that
>operation was. However, this comes with a performance penalty since such
>a diff operation could not be done in a single pass any more. In order
>to decide whether a deleted node has been moved, the corresponding add
>needs to be found first. In essence this requires the diff operation to
>do two passes: the first one for detecting move operations and the
>second one for the other operations.
>
>To avoid the second pass, we could also remember the paths of the moved
>nodes in a global place (*). This would allow us to look up whether a
>deleted node was moved (opposed to deleted) as we go and detect moved
>nodes as soon as we come across an added node that has a source path
>annotation. As an added benefit this approach allows us to detect
>whether there was a move at all simply by checking whether there are
>entries in this global place. If this is not the case, we could fall
>back to a simpler diff mechanism.
>
>(*) All such annotations would happen as hidden items in transient space
>and would have to be removed again by some hook before persisting.
>
>WDYT, is this worth the trouble?
>
>Michael