
Posted to oak-dev@jackrabbit.apache.org by Vikas Saurabh <vi...@gmail.com> on 2015/06/26 05:04:10 UTC

Property index merge conflicts

Hi,

We quite often see merge conflicts while creating or pruning the
hierarchy of content under the property index's hidden tree. There is
an issue to reduce their impact, OAK-2673 [0], but enabling it exposes
us to OAK-2929 [1]; and even if it worked correctly, it would only
resolve add-add and delete-delete conflicts, leaving delete-changed
conflicts unhandled.

I was wondering if we could do the pruning asynchronously instead. I
understand why pruning is useful, but having a few extra paths around
temporarily probably shouldn't be that bad.

I'm not sure how AsyncIndexer, LuceneIndexer and similar components
work, but I assume they keep some sort of bookmark noting which
revision has already been processed. I guess we could do something
similar here too.
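
To make the bookmark idea a bit more concrete, here is a minimal Java
sketch. It is not how Oak works today: DeferredPruner, removeEntry and
pruneNewEntries are hypothetical names, the index tree is modelled as a
plain set of paths, and the position in an append-only log stands in
for a revision. Writers only remove the leaf and log its parent; a
single background job later prunes empty ancestors for log entries past
its bookmark and then advances the bookmark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: writers only record pruning candidates; a single
 * background job prunes them later and remembers how far it got.
 */
final class DeferredPruner {
    // Paths of nodes in a toy index tree, e.g. "/v", "/v/a", "/v/a/b".
    final Set<String> nodes = new HashSet<>();
    // Append-only log of parent paths whose child entry was removed;
    // the list position stands in for a revision number.
    final List<String> candidateLog = new ArrayList<>();
    // Bookmark: number of log entries already processed by the pruner.
    private int bookmark = 0;

    /** Write path: remove the leaf, but only log its parent for later. */
    void removeEntry(String leafPath) {
        nodes.remove(leafPath);
        candidateLog.add(parentOf(leafPath));
    }

    /** Background job: prune empty ancestors for entries past the bookmark. */
    int pruneNewEntries() {
        int pruned = 0;
        for (; bookmark < candidateLog.size(); bookmark++) {
            String path = candidateLog.get(bookmark);
            // Walk upwards while the node still exists and has no children.
            while (!path.equals("/") && nodes.contains(path) && !hasChildren(path)) {
                nodes.remove(path);
                pruned++;
                path = parentOf(path);
            }
        }
        return pruned;
    }

    int bookmark() {
        return bookmark;
    }

    private boolean hasChildren(String path) {
        String prefix = path + "/";
        for (String n : nodes) {
            if (n.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}
```

Because the pruner re-checks for children when it runs, a path that was
re-populated before the pruner got to it is simply left alone.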

BTW, do these indexers process changes independently of each other?
Would it make sense to chain such jobs so that all of them can work
off a single diff computation?
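
A rough sketch of the "one diff, many consumers" idea, again purely
illustrative: DiffFanOut and its Listener interface are hypothetical
(not Oak's actual commit editor API), snapshots are plain path-to-value
maps, and value changes are ignored to keep it short. The diff is
computed once and the same added/removed paths are handed to every
registered consumer, e.g. a Lucene updater and a property index pruner.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: compute a diff once and feed it to several consumers. */
final class DiffFanOut {
    /** Hypothetical consumer interface, not an Oak API. */
    interface Listener {
        void added(String path);
        void removed(String path);
    }

    // Counts how often the (potentially expensive) diff is computed.
    static int diffCalls = 0;

    /** Diffs two snapshots (path -> value) once, then notifies every listener. */
    static void dispatch(Map<String, String> before, Map<String, String> after,
                         List<Listener> listeners) {
        diffCalls++;
        List<String> added = new ArrayList<>();
        List<String> removed = new ArrayList<>();
        for (String p : after.keySet()) {
            if (!before.containsKey(p)) {
                added.add(p);
            }
        }
        for (String p : before.keySet()) {
            if (!after.containsKey(p)) {
                removed.add(p);
            }
        }
        // Every listener sees the same result of the single diff.
        for (Listener l : listeners) {
            for (String p : added) {
                l.added(p);
            }
            for (String p : removed) {
                l.removed(p);
            }
        }
    }

    /** Toy listener that just records what it was told. */
    static final class Recorder implements Listener {
        final Set<String> seen = new HashSet<>();
        public void added(String path) { seen.add("+" + path); }
        public void removed(String path) { seen.add("-" + path); }
    }
}
```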

There's another idea: does it make sense for a document to assert that
it is semantically an 'intermediate' document, created just to form a
hierarchy, so that conflicts involving such documents can be handled
accordingly? For OAK-2673 we had a heuristic for this: conflicts were
resolved for a document that lay under a hidden tree and had no
visible properties. Maybe we could even have a mixin for this, since
declaring hierarchy intent could be very useful for applications too
(I've seen a lot of automated tests that need to pre-create hierarchy
just to avoid such conflicts...).
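
For reference, the OAK-2673 heuristic described above could be
expressed roughly as the predicate below. This is a sketch, not the
actual OAK-2673 code: IntermediateHeuristic is a hypothetical name, and
it assumes the Oak convention that hidden node and property names start
with ':' (as in ':index').

```java
import java.util.Map;

/**
 * Hypothetical sketch of an OAK-2673-style heuristic: treat a node as purely
 * structural ("intermediate") if it lies under a hidden tree and carries no
 * visible properties.
 */
final class IntermediateHeuristic {
    /** Assumption: hidden names start with ':' (e.g. ":index"). */
    static boolean isHiddenName(String name) {
        return !name.isEmpty() && name.charAt(0) == ':';
    }

    /** True if any segment of the path is a hidden name. */
    static boolean underHiddenTree(String path) {
        for (String segment : path.split("/")) {
            if (isHiddenName(segment)) {
                return true;
            }
        }
        return false;
    }

    /** Under a hidden tree, and every property (if any) is hidden too. */
    static boolean looksIntermediate(String path, Map<String, Object> properties) {
        if (!underHiddenTree(path)) {
            return false;
        }
        for (String name : properties.keySet()) {
            if (!isHiddenName(name)) {
                return false;  // has a visible property, e.g. "match"
            }
        }
        return true;
    }
}
```

Note that a property index leaf carrying a visible property such as
match=true would not be treated as intermediate by this predicate.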

Thoughts?

Thanks,
Vikas

[0]: https://issues.apache.org/jira/browse/OAK-2673
[1]: https://issues.apache.org/jira/browse/OAK-2929

Re: Property index merge conflicts

Posted by Vikas Saurabh <vi...@gmail.com>.

>> I'm not sure how AsyncIndexer, LuceneIndexer and similar components
>> work, but I assume they keep some sort of bookmark noting which
>> revision has already been processed. I guess we could do something
>> similar here too.
>>
>>
> Async updates happen in a single background thread, so there are no
> concurrent update conflicts there.

Oh, I meant that maybe we could prune the property index
asynchronously, while async index processing is running.


>> BTW, do these indexers process changes independently of each other?
>> Would it make sense to chain such jobs so that all of them can work
>> off a single diff computation?
>>
>>
> I'm pretty sure that's already the case. I believe the issues start when
> you have concurrent writes to the same index content from different threads.

Yes, that was a side note: if AsyncIndex and LuceneIndex don't work
over the same diff, then we could optimize that a bit. For the current
subject, we could probably plug property index pruning in here.


>> There's another idea: does it make sense for a document to assert that
>> it is semantically an 'intermediate' document, created just to form a
>> hierarchy, so that conflicts involving such documents can be handled
>> accordingly? For OAK-2673 we had a heuristic for this: conflicts were
>> resolved for a document that lay under a hidden tree and had no
>> visible properties. Maybe we could even have a mixin for this, since
>> declaring hierarchy intent could be very useful for applications too
>> (I've seen a lot of automated tests that need to pre-create hierarchy
>> just to avoid such conflicts...).
>>
>>
> I didn't follow this topic closely, but if it helps in any way, the
> property index storage (normal properties, not unique ones) already marks
> the leaves with a special flag (match=true); no intermediary path
> carries this flag. So for the index scenario I think you could come
> up with a way to merge conflicts by choosing the safe route of not deleting
> the intermediary paths. This, combined with a more lenient purge strategy,
> should reduce the pain.

While working on OAK-2673, we didn't want to tie conflict resolution
to the specific content storage of property indices, which is why we
chose to resolve such conflicts (add-add, delete-delete) only for
hidden documents with no JCR properties. That was essentially a hacky
heuristic that could work for property indexes without being tied too
strongly to them. On the other hand, if the document itself could
declare its intent (maybe as a hidden Oak-specific property to begin
with, later promoted to some sort of mixin), then the decision on how
to deal with such a conflict could become firmer and less hacky.
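
A minimal sketch of what declaring the intent could look like, assuming
a hypothetical hidden marker property named ':intermediate' (no such
property exists in Oak; the name, the Conflict/Resolution enums and the
resolve method are all illustrative). Nodes without the marker keep
today's behaviour; marked nodes get lenient add-add and delete-delete
handling, and a delete-changed conflict keeps the node so that a later
prune can remove it once it is really empty.

```java
import java.util.Map;

/**
 * Hypothetical sketch: a node declares itself structural via a hidden marker
 * property, and conflict resolution becomes lenient only for such nodes.
 */
final class DeclaredIntermediateResolver {
    // Hypothetical marker name; not an existing Oak property.
    static final String MARKER = ":intermediate";

    enum Conflict { ADD_ADD, DELETE_DELETE, DELETE_CHANGED }

    enum Resolution { KEEP, DELETE, FAIL }

    static boolean declaresIntermediate(Map<String, Object> properties) {
        return Boolean.TRUE.equals(properties.get(MARKER));
    }

    /** `properties` are those of the node on the side that still has it. */
    static Resolution resolve(Conflict conflict, Map<String, Object> properties) {
        if (!declaresIntermediate(properties)) {
            return Resolution.FAIL;  // ordinary node: fall back to today's behaviour
        }
        switch (conflict) {
            case ADD_ADD:
                return Resolution.KEEP;    // both sides created the same structural node
            case DELETE_DELETE:
                return Resolution.DELETE;  // both sides agree it should go
            case DELETE_CHANGED:
                // Safe route: keep it and let a later prune remove it once empty.
                return Resolution.KEEP;
            default:
                return Resolution.FAIL;
        }
    }
}
```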

BTW, doing something along the lines of OAK-2673 (resolving conflicts
instead of avoiding them) is a little intrusive and risky; for
example, we found that OAK-2673 leads to the repository corruption
reported in OAK-2929. Delayed pruning, on the other hand, feels much
safer and less intrusive (except, of course, that the cost of an
unpruned index might be inflated a little until it gets pruned). So,
does it make sense to open an issue for async pruning? We could open a
separate issue for lenient conflict resolution when a document
declares its intent.

Thanks,
Vikas

Re: Property index merge conflicts

Posted by Alex Parvulescu <al...@gmail.com>.

Hi,

see inline

best,
alex

On Fri, Jun 26, 2015 at 5:04 AM, Vikas Saurabh <vi...@gmail.com>
wrote:

> Hi,
>
> We quite often see merge conflicts while creating or pruning the
> hierarchy of content under the property index's hidden tree. There is
> an issue to reduce their impact, OAK-2673 [0], but enabling it exposes
> us to OAK-2929 [1]; and even if it worked correctly, it would only
> resolve add-add and delete-delete conflicts, leaving delete-changed
> conflicts unhandled.
>
> I was wondering if we could do the pruning asynchronously instead. I
> understand why pruning is useful, but having a few extra paths around
> temporarily probably shouldn't be that bad.
>
>
What about throttling the pruning via some sort of setting? Right now it's
1:1; we could loosen that and only purge once in a while.
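
Throttling could look roughly like this sketch. pruneEvery is a
hypothetical setting (a value of 1 corresponds to today's 1:1
behaviour), ThrottledPruner is an illustrative name, and the actual
deletion of empty intermediate nodes is left out; the class only shows
how candidates accumulate and how often a prune run would be triggered.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch of throttled pruning: instead of pruning on every
 * removal (1:1), collect candidates and prune once every N removals.
 */
final class ThrottledPruner {
    private final int pruneEvery;  // hypothetical setting; 1 == prune on every removal
    private final Set<String> pending = new LinkedHashSet<>();
    private int removalsSinceLastPrune = 0;
    int pruneRuns = 0;

    ThrottledPruner(int pruneEvery) {
        if (pruneEvery < 1) {
            throw new IllegalArgumentException("pruneEvery must be >= 1");
        }
        this.pruneEvery = pruneEvery;
    }

    /** Records the parent of a removed entry; prunes only when the threshold is hit. */
    void onEntryRemoved(String parentPath) {
        pending.add(parentPath);  // duplicates collapse into one candidate
        if (++removalsSinceLastPrune >= pruneEvery) {
            prune();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    private void prune() {
        // A real implementation would delete empty intermediate nodes here;
        // this sketch only counts how often a prune run would happen.
        pruneRuns++;
        pending.clear();
        removalsSinceLastPrune = 0;
    }
}
```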



> I'm not sure how AsyncIndexer, LuceneIndexer and similar components
> work, but I assume they keep some sort of bookmark noting which
> revision has already been processed. I guess we could do something
> similar here too.
>
>
Async updates happen in a single background thread, so there are no
concurrent update conflicts there.
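
Why a single background thread avoids such conflicts can be shown with
a small, self-contained Java sketch using a single-threaded
ExecutorService. SingleThreadIndexer is a hypothetical class, not Oak's
async index updater, and the "index" is just a map of counters: because
tasks run one at a time, an unsynchronized read-modify-write never
loses an update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch: all updates to a toy "index" (a map of counters) run on one
 * background thread, so they never overlap and cannot conflict.
 */
final class SingleThreadIndexer {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Only ever read or written from the worker thread.
    private final Map<String, Integer> index = new HashMap<>();

    /** Submits `times` increments for `key` and waits until all have run. */
    void updateAll(String key, int times) {
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            // Unsynchronized read-modify-write is safe: tasks run one at a time.
            pending.add(worker.submit(() -> { index.merge(key, 1, Integer::sum); }));
        }
        for (Future<?> f : pending) {
            await(f);
        }
    }

    /** Reads the counter on the worker thread as well. */
    int countFor(String key) {
        return await(worker.submit(() -> index.getOrDefault(key, 0)));
    }

    void shutdown() {
        worker.shutdown();
    }

    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```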



> BTW, do these indexers process changes independently of each other?
> Would it make sense to chain such jobs so that all of them can work
> off a single diff computation?
>
>
I'm pretty sure that's already the case. I believe the issues start when you
have concurrent writes to the same index content from different threads.




> There's another idea: does it make sense for a document to assert that
> it is semantically an 'intermediate' document, created just to form a
> hierarchy, so that conflicts involving such documents can be handled
> accordingly? For OAK-2673 we had a heuristic for this: conflicts were
> resolved for a document that lay under a hidden tree and had no
> visible properties. Maybe we could even have a mixin for this, since
> declaring hierarchy intent could be very useful for applications too
> (I've seen a lot of automated tests that need to pre-create hierarchy
> just to avoid such conflicts...).
>
>
I didn't follow this topic closely, but if it helps in any way, the
property index storage (normal properties, not unique ones) already marks
the leaves with a special flag (match=true); no intermediary path
carries this flag. So for the index scenario I think you could come
up with a way to merge conflicts by choosing the safe route of not deleting
the intermediary paths. This, combined with a more lenient purge strategy,
should reduce the pain.
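
The safe route described above could be sketched as follows, assuming
leaves of the (non-unique) property index carry match=true and
intermediate nodes carry no such flag. MatchAwareMerge and its Decision
enum are hypothetical names: a conflicting delete on a node without
match=true is resolved by keeping the node, leaving it for a later,
more lenient purge, while leaves fall back to the regular conflict
handling.

```java
import java.util.Map;

/**
 * Hypothetical sketch for the non-unique property index: leaves carry
 * match=true, intermediate nodes do not. Intermediate nodes are never
 * deleted on conflict; leftover empty ones are left to a later purge.
 */
final class MatchAwareMerge {
    enum Decision { KEEP_NODE, DEFAULT_HANDLING }

    /** A leaf is recognised by the match=true flag (modelled as a Boolean). */
    static boolean isLeaf(Map<String, Object> props) {
        return Boolean.TRUE.equals(props.get("match"));
    }

    /** Decides what to do when a delete conflicts with a concurrent change. */
    static Decision onDeleteConflict(Map<String, Object> props) {
        // Intermediate path: take the safe route and keep it.
        // Leaf: not covered by this rule, so use the regular conflict handling.
        return isLeaf(props) ? Decision.DEFAULT_HANDLING : Decision.KEEP_NODE;
    }
}
```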


