You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2013/10/17 14:16:43 UTC
Detecting move operations in node state diffs
Hi,
Currently we can't detect a move operation through diffing node states.
Those operation are currently seen as separate remove and add operations
that can't be easily associated with each other. This impacts permission
evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
both don't have the same support for moves as had Jackrabbit 2.
As discussed several times before it is not possible to regain move
operation from simply diffing node states. We need additional
information. One option is to annotate nodes (*) as they are moved with
their source path. With that we could detect whether an added node was
the target of a move operation and if so where the source of that
operation was. However, this comes with a performance penalty since such
a diff operation could not be done in a single pass any more. In order
to decide whether a deleted node has been moved, the corresponding add
needs to be found first. In essence this requires the diff operation to
do two passes: the first one for detecting move operations and the
second one for the other operations.
To avoid the second pass, we could also remember the paths of the moved
nodes in a global place (*). This would allow us to look up whether a
deleted node was moved (opposed to deleted) as we go and detect moved
nodes as soon as we come across an added node that has a source path
annotation. As an added benefit this approach allows us to detect
whether there was a move at all simply by checking whether there are
entries in this global place. If this is not the case, we could fall
back to a simpler diff mechanism.
(*) All such annotations would happen as hidden items in transient space
and would have to be removed again by some hook before persisting.
WDYT, is this worth the trouble?
Michael
Re: Detecting move operations in node state diffs
Posted by Tommaso Teofili <to...@gmail.com>.
>From the user point of view I think this makes sense and it'd be worth the
effort, also it looks like the global space approach would allow to do that
without loosing too much in terms of performance.
My 2 cents,
Tommaso
2013/10/17 Michael Dürig <md...@apache.org>
>
> Hi,
>
> Currently we can't detect a move operation through diffing node states.
> Those operation are currently seen as separate remove and add operations
> that can't be easily associated with each other. This impacts permission
> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
> both don't have the same support for moves as had Jackrabbit 2.
>
> As discussed several times before it is not possible to regain move
> operation from simply diffing node states. We need additional information.
> One option is to annotate nodes (*) as they are moved with their source
> path. With that we could detect whether an added node was the target of a
> move operation and if so where the source of that operation was. However,
> this comes with a performance penalty since such a diff operation could not
> be done in a single pass any more. In order to decide whether a deleted
> node has been moved, the corresponding add needs to be found first. In
> essence this requires the diff operation to do two passes: the first one
> for detecting move operations and the second one for the other operations.
>
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*). This would allow us to look up whether a
> deleted node was moved (opposed to deleted) as we go and detect moved nodes
> as soon as we come across an added node that has a source path annotation.
> As an added benefit this approach allows us to detect whether there was a
> move at all simply by checking whether there are entries in this global
> place. If this is not the case, we could fall back to a simpler diff
> mechanism.
>
> (*) All such annotations would happen as hidden items in transient space
> and would have to be removed again by some hook before persisting.
>
> WDYT, is this worth the trouble?
>
> Michael
>
Re: Detecting move operations in node state diffs
Posted by Jukka Zitting <ju...@gmail.com>.
Hi,
On Thu, Oct 17, 2013 at 8:16 AM, Michael Dürig <md...@apache.org> wrote:
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*).
Perhaps the new CommitInfo?
I assume we'd only need to track moves for local commits as that's
where permission evaluation is done, and as our support for observing
external events in any case is limited.
BR,
Jukka Zitting
Re: Detecting move operations in node state diffs
Posted by Michael Dürig <md...@apache.org>.
On 21.10.13 8:23 , Michael Dürig wrote:
>> IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
>> and node moved. but maybe I'm wrong, but I thought we kept this for
>> backward compatibility.
>
> Thanks for the heads up. Will double check. This would in fact simplify
> event generation in the observation code.
Double checked and yes, this is the case: Jackrabbit sends 3 events per
move: NODE_ADDED, NODE_REMOVED and NODE_MOVED.
Michael
Re: Detecting move operations in node state diffs
Posted by Michael Dürig <md...@apache.org>.
On 21.10.13 7:18 , Tobias Bocanegra wrote:
> On Mon, Oct 21, 2013 at 7:31 AM, Thomas Mueller<mu...@adobe.com> wrote:
>> >Hi,
>> >
>>> >>extra pass
>> >
>> >On how to avoid this extra pass. Not strictly backward compatible, but I
>> >wonder how much it would break: what if observation would deliver two
>> >events for moved nodes: the "node moved" event (added at the target), plus
>> >the "node deleted" event (deleted at the source)? The one use case I know
>> >about, data store garbage collection in Jackrabbit core, would be OK with
>> >this behavior.
> IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
> and node moved. but maybe I'm wrong, but I thought we kept this for
> backward compatibility.
Thanks for the heads up. Will double check. This would in fact simplify
event generation in the observation code.
Michael
Re: Detecting move operations in node state diffs
Posted by Thomas Mueller <mu...@adobe.com>.
Hi,
>IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
>and node moved.
Cool, I didn't know this :-)
Regards,
Thomas
Re: Detecting move operations in node state diffs
Posted by Tobias Bocanegra <tr...@apache.org>.
On Mon, Oct 21, 2013 at 7:31 AM, Thomas Mueller <mu...@adobe.com> wrote:
> Hi,
>
>> extra pass
>
> On how to avoid this extra pass. Not strictly backward compatible, but I
> wonder how much it would break: what if observation would deliver two
> events for moved nodes: the "node moved" event (added at the target), plus
> the "node deleted" event (deleted at the source)? The one use case I know
> about, data store garbage collection in Jackrabbit core, would be OK with
> this behavior.
IIRC, in JR2 a moved node triggers 3 events: node added, node deleted
and node moved. but maybe I'm wrong, but I thought we kept this for
backward compatibility.
regards, toby
>
> Regards,
> Thomas
>
>
>
> On 10/21/13 2:20 PM, "Michael Dürig" <md...@apache.org> wrote:
>
>>
>>Hi,
>>
>>I implemented a very rough POC of the algorithm outlined below. See [1]
>>for the implementation itself. On move a node is annotated with its
>>source path in NodeBuilder.moveTo(). Later moves can be extracted
>>through the standalone MoveDetector class. See MoveDetectorTest for
>>details. MoveDetector also provides a static utility method
>>findMovedPaths for building the set of moved nodes the algorithm
>>requires. As mentioned below this extra pass is not required if this set
>>can be obtained by other means.
>>
>>See [2] how this could be integrated with the current observation
>>implementation.
>>
>>If we deicide to go with such an approach at all, we still need to
>>figure out how to better integrate it with the current node state diff.
>>
>>Michael
>>
>>
>>[1]
>>https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b
>>18038eec11f315a
>>[2]
>>https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e
>>44c27ad34292b88
>>
>>
>>On 17.10.13 2:16 , Michael Dürig wrote:
>>>
>>> Hi,
>>>
>>> Currently we can't detect a move operation through diffing node states.
>>> Those operation are currently seen as separate remove and add operations
>>> that can't be easily associated with each other. This impacts permission
>>> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>>> both don't have the same support for moves as had Jackrabbit 2.
>>>
>>> As discussed several times before it is not possible to regain move
>>> operation from simply diffing node states. We need additional
>>> information. One option is to annotate nodes (*) as they are moved with
>>> their source path. With that we could detect whether an added node was
>>> the target of a move operation and if so where the source of that
>>> operation was. However, this comes with a performance penalty since such
>>> a diff operation could not be done in a single pass any more. In order
>>> to decide whether a deleted node has been moved, the corresponding add
>>> needs to be found first. In essence this requires the diff operation to
>>> do two passes: the first one for detecting move operations and the
>>> second one for the other operations.
>>>
>>> To avoid the second pass, we could also remember the paths of the moved
>>> nodes in a global place (*). This would allow us to look up whether a
>>> deleted node was moved (opposed to deleted) as we go and detect moved
>>> nodes as soon as we come across an added node that has a source path
>>> annotation. As an added benefit this approach allows us to detect
>>> whether there was a move at all simply by checking whether there are
>>> entries in this global place. If this is not the case, we could fall
>>> back to a simpler diff mechanism.
>>>
>>> (*) All such annotations would happen as hidden items in transient space
>>> and would have to be removed again by some hook before persisting.
>>>
>>> WDYT, is this worth the trouble?
>>>
>>> Michael
>
Re: Detecting move operations in node state diffs
Posted by Thomas Mueller <mu...@adobe.com>.
Hi,
> extra pass
On how to avoid this extra pass. Not strictly backward compatible, but I
wonder how much it would break: what if observation would deliver two
events for moved nodes: the "node moved" event (added at the target), plus
the "node deleted" event (deleted at the source)? The one use case I know
about, data store garbage collection in Jackrabbit core, would be OK with
this behavior.
Regards,
Thomas
On 10/21/13 2:20 PM, "Michael Dürig" <md...@apache.org> wrote:
>
>Hi,
>
>I implemented a very rough POC of the algorithm outlined below. See [1]
>for the implementation itself. On move a node is annotated with its
>source path in NodeBuilder.moveTo(). Later moves can be extracted
>through the standalone MoveDetector class. See MoveDetectorTest for
>details. MoveDetector also provides a static utility method
>findMovedPaths for building the set of moved nodes the algorithm
>requires. As mentioned below this extra pass is not required if this set
>can be obtained by other means.
>
>See [2] how this could be integrated with the current observation
>implementation.
>
>If we deicide to go with such an approach at all, we still need to
>figure out how to better integrate it with the current node state diff.
>
>Michael
>
>
>[1]
>https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b
>18038eec11f315a
>[2]
>https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e
>44c27ad34292b88
>
>
>On 17.10.13 2:16 , Michael Dürig wrote:
>>
>> Hi,
>>
>> Currently we can't detect a move operation through diffing node states.
>> Those operation are currently seen as separate remove and add operations
>> that can't be easily associated with each other. This impacts permission
>> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>> both don't have the same support for moves as had Jackrabbit 2.
>>
>> As discussed several times before it is not possible to regain move
>> operation from simply diffing node states. We need additional
>> information. One option is to annotate nodes (*) as they are moved with
>> their source path. With that we could detect whether an added node was
>> the target of a move operation and if so where the source of that
>> operation was. However, this comes with a performance penalty since such
>> a diff operation could not be done in a single pass any more. In order
>> to decide whether a deleted node has been moved, the corresponding add
>> needs to be found first. In essence this requires the diff operation to
>> do two passes: the first one for detecting move operations and the
>> second one for the other operations.
>>
>> To avoid the second pass, we could also remember the paths of the moved
>> nodes in a global place (*). This would allow us to look up whether a
>> deleted node was moved (opposed to deleted) as we go and detect moved
>> nodes as soon as we come across an added node that has a source path
>> annotation. As an added benefit this approach allows us to detect
>> whether there was a move at all simply by checking whether there are
>> entries in this global place. If this is not the case, we could fall
>> back to a simpler diff mechanism.
>>
>> (*) All such annotations would happen as hidden items in transient space
>> and would have to be removed again by some hook before persisting.
>>
>> WDYT, is this worth the trouble?
>>
>> Michael
Re: Detecting move operations in node state diffs
Posted by Michael Dürig <md...@apache.org>.
Hi,
I implemented a very rough POC of the algorithm outlined below. See [1]
for the implementation itself. On move a node is annotated with its
source path in NodeBuilder.moveTo(). Later moves can be extracted
through the standalone MoveDetector class. See MoveDetectorTest for
details. MoveDetector also provides a static utility method
findMovedPaths for building the set of moved nodes the algorithm
requires. As mentioned below this extra pass is not required if this set
can be obtained by other means.
See [2] how this could be integrated with the current observation
implementation.
If we deicide to go with such an approach at all, we still need to
figure out how to better integrate it with the current node state diff.
Michael
[1]
https://github.com/mduerig/jackrabbit-oak/commit/a74ea2095d5a3aea2e27dbc0b18038eec11f315a
[2]
https://github.com/mduerig/jackrabbit-oak/commit/ad81af03f9c8c8ab11acd614e44c27ad34292b88
On 17.10.13 2:16 , Michael Dürig wrote:
>
> Hi,
>
> Currently we can't detect a move operation through diffing node states.
> Those operation are currently seen as separate remove and add operations
> that can't be easily associated with each other. This impacts permission
> evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
> both don't have the same support for moves as had Jackrabbit 2.
>
> As discussed several times before it is not possible to regain move
> operation from simply diffing node states. We need additional
> information. One option is to annotate nodes (*) as they are moved with
> their source path. With that we could detect whether an added node was
> the target of a move operation and if so where the source of that
> operation was. However, this comes with a performance penalty since such
> a diff operation could not be done in a single pass any more. In order
> to decide whether a deleted node has been moved, the corresponding add
> needs to be found first. In essence this requires the diff operation to
> do two passes: the first one for detecting move operations and the
> second one for the other operations.
>
> To avoid the second pass, we could also remember the paths of the moved
> nodes in a global place (*). This would allow us to look up whether a
> deleted node was moved (opposed to deleted) as we go and detect moved
> nodes as soon as we come across an added node that has a source path
> annotation. As an added benefit this approach allows us to detect
> whether there was a move at all simply by checking whether there are
> entries in this global place. If this is not the case, we could fall
> back to a simpler diff mechanism.
>
> (*) All such annotations would happen as hidden items in transient space
> and would have to be removed again by some hook before persisting.
>
> WDYT, is this worth the trouble?
>
> Michael
Re: Detecting move operations in node state diffs
Posted by Angela Schreiber <an...@adobe.com>.
IMHO it's worth trying to get a solution for this.
if we do, i would suggest that we detect the move before
passing the information down to the commit hooks. in other
words, there should be a method like nodeMoved in order to prevent
that every move-sensitive hook has to write the same code in order
to find out about potential moves.
kind regards
angela
On 10/17/13 2:16 PM, "Michael Dürig" <md...@apache.org> wrote:
>
>Hi,
>
>Currently we can't detect a move operation through diffing node states.
>Those operation are currently seen as separate remove and add operations
>that can't be easily associated with each other. This impacts permission
>evaluation (OAK-710, OAK-783) and observation (OAK-144, OAK-1090), which
>both don't have the same support for moves as had Jackrabbit 2.
>
>As discussed several times before it is not possible to regain move
>operation from simply diffing node states. We need additional
>information. One option is to annotate nodes (*) as they are moved with
>their source path. With that we could detect whether an added node was
>the target of a move operation and if so where the source of that
>operation was. However, this comes with a performance penalty since such
>a diff operation could not be done in a single pass any more. In order
>to decide whether a deleted node has been moved, the corresponding add
>needs to be found first. In essence this requires the diff operation to
>do two passes: the first one for detecting move operations and the
>second one for the other operations.
>
>To avoid the second pass, we could also remember the paths of the moved
>nodes in a global place (*). This would allow us to look up whether a
>deleted node was moved (opposed to deleted) as we go and detect moved
>nodes as soon as we come across an added node that has a source path
>annotation. As an added benefit this approach allows us to detect
>whether there was a move at all simply by checking whether there are
>entries in this global place. If this is not the case, we could fall
>back to a simpler diff mechanism.
>
>(*) All such annotations would happen as hidden items in transient space
>and would have to be removed again by some hook before persisting.
>
>WDYT, is this worth the trouble?
>
>Michael