Posted to oak-dev@jackrabbit.apache.org by Ian Boston <ie...@tfd.co.uk> on 2016/09/14 09:40:36 UTC

IndexEditorProvider behaviour question.

Hi,
The behaviour of calls to the IndexEditorProvider appears to be suboptimal.
Has this area been looked at before?

I am working from a complete lack of historical knowledge about the area,
so probably don't know the full picture. Based on logging the calls into
IndexEditorProvider.getIndexEditor(), and reading the
LuceneIndexEditorProvider, this is what I have observed.

A. Every commit results in 1 call to IndexEditorProvider.getIndexEditor()
per index definition (perhaps 100 in a full system).
B. Each IndexEditor then gets called, building a tree of IndexEditors which
work out the changes needed to update their index.
C. IndexEditors sometimes filter subtrees based on the index definition,
but this seems to be the exception rather than the rule.
D. IndexEditorProviders produce a subtree based on type (i.e. a property
index definition doesn't generate an IndexEditor for Lucene indexes and vice
versa).

A and B mean that the work of creating the tree and working out the changes
in a tree will be duplicated roughly n times, where n is the number of
index definitions. (D means it's not n*p, where p is the number of
IndexEditorProviders). I haven't looked at how much C reduces the cost in
reality.
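
To make the arithmetic behind A and B concrete, here is a toy model. It is not Oak code: CountingEditor and editorsForCommit are hypothetical names, and the model ignores the filtering in C. It only counts how many editor objects one commit would create if every index definition builds its own editor tree along a single changed path.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not Oak code: counts editor objects created for one commit. */
final class EditorCountSketch {

    /** Hypothetical stand-in for an index editor; one instance per visited node. */
    static final class CountingEditor {
        static int created = 0;

        CountingEditor() {
            created++;
        }

        /** Walks down one changed path, creating a child editor per level. */
        void visit(List<String> path, int depth) {
            if (depth < path.size()) {
                new CountingEditor().visit(path, depth + 1);
            }
        }
    }

    /** One root editor per index definition (A), each building its own tree (B). */
    static int editorsForCommit(int indexDefinitions, List<String> changedPath) {
        CountingEditor.created = 0;
        for (int i = 0; i < indexDefinitions; i++) {
            new CountingEditor().visit(changedPath, 0);
        }
        return CountingEditor.created;
    }

    public static void main(String[] args) {
        List<String> changedPath = Arrays.asList("content", "site", "page");
        // 100 definitions x (1 root editor + 3 child editors) each
        System.out.println(editorsForCommit(100, changedPath)); // prints 400
    }
}
```

In this model the object count grows linearly with the number of index definitions, which is the GC concern raised further down; real numbers would depend on how much filtering actually happens.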

Has anyone looked at building the tree once, and passing the fully built
tree to indexers?

Even if the computational effort is not great, the number of objects being
created and passing through GC seems higher than it needs to be.

As I said, I have no historical knowledge, so if doing this doesn't improve
things, and the reason why is recorded somewhere, just say so (ideally with
a pointer) so I can read and understand more.

Best Regards
Ian

Re: IndexEditorProvider behaviour question.

Posted by Ian Boston <ie...@tfd.co.uk>.
Hi,
Thanks for looking at this; it sounds like you are on the case already.
If I see anything else I'll let you know.
Best Regards
Ian


On 15 September 2016 at 05:33, Chetan Mehrotra <ch...@gmail.com>
wrote:

> Note that so far LuceneIndexEditor was used only for the async indexing
> case and hence invoked only on the leader node every 5 sec, so performance
> aspects here were not that critical. However, with the recent work on
> Hybrid indexes it would be used in the critical path, and hence such
> aspects are important.
>
> On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston <ie...@tfd.co.uk> wrote:
> > A and B mean that the work of creating the tree and working out the changes
> > in a tree will be duplicated roughly n times, where n is the number of
> > index definitions.
>
> Note that the diff is performed only once at any level, and
> IndexUpdate then passes the changes to the various editors. However, the
> construction of trees can be avoided, and I have opened OAK-4806 for
> that now. The Oak issue also has details on why Tree was used.
>
> Also, performance does decrease with multiple index editors. See
> OAK-1273. If we switch to Hybrid Index then this aspect improves a
> bit, as instead of having 50 different property indexes (with 50 editor
> instances for each commit) we can have a single editor with 50 property
> definitions. This can be seen in the Hybrid Index benchmark (OAK-4412)
> by changing numOfIndexes.
>
> If you see any other area of improvement, say around unnecessary object
> generation, then let us know!
>
> Chetan Mehrotra
>

Re: IndexEditorProvider behaviour question.

Posted by Chetan Mehrotra <ch...@gmail.com>.
Note that so far LuceneIndexEditor was used only for the async indexing
case and hence invoked only on the leader node every 5 sec, so performance
aspects here were not that critical. However, with the recent work on
Hybrid indexes it would be used in the critical path, and hence such
aspects are important.

On Wed, Sep 14, 2016 at 3:10 PM, Ian Boston <ie...@tfd.co.uk> wrote:
> A and B mean that the work of creating the tree and working out the changes
> in a tree will be duplicated roughly n times, where n is the number of
> index definitions.

Note that the diff is performed only once at any level, and
IndexUpdate then passes the changes to the various editors. However, the
construction of trees can be avoided, and I have opened OAK-4806 for
that now. The Oak issue also has details on why Tree was used.
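
As a rough illustration of the "diff once, then fan out" point, here is a toy sketch. It does not use Oak's IndexUpdate or Editor API: ChangeListener, RecordingListener and diffOnce are hypothetical names, and the node states are reduced to flat property maps. It shows that the number of comparisons stays the same however many listeners are attached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Toy model, not Oak code: one diff pass shared by several editors. */
final class SingleDiffSketch {

    /** Hypothetical callback; Oak's real Editor interface has more methods. */
    interface ChangeListener {
        void propertyChanged(String name, String before, String after);
    }

    /** Records the names of the changed properties it is told about. */
    static final class RecordingListener implements ChangeListener {
        final List<String> seen = new ArrayList<>();

        @Override
        public void propertyChanged(String name, String before, String after) {
            seen.add(name);
        }
    }

    /**
     * Compares each property once and forwards every change to all listeners.
     * Returns the number of comparisons, which does not depend on listeners.size().
     */
    static int diffOnce(Map<String, String> before, Map<String, String> after,
                        List<? extends ChangeListener> listeners) {
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        int comparisons = 0;
        for (String name : names) {
            comparisons++;
            String b = before.get(name);
            String a = after.get(name);
            if (!Objects.equals(b, a)) {
                for (ChangeListener listener : listeners) {
                    listener.propertyChanged(name, b, a);
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("title", "old");
        before.put("lang", "en");
        Map<String, String> after = new HashMap<>(before);
        after.put("title", "new");
        after.put("tags", "x");

        List<RecordingListener> listeners = Arrays.asList(
                new RecordingListener(), new RecordingListener(), new RecordingListener());
        int comparisons = diffOnce(before, after, listeners);
        System.out.println(comparisons + " comparisons, each listener saw "
                + listeners.get(0).seen);
    }
}
```

What still scales with the number of editors in this sketch is the callback fan-out, not the comparison itself.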

Also, performance does decrease with multiple index editors. See
OAK-1273. If we switch to Hybrid Index then this aspect improves a
bit, as instead of having 50 different property indexes (with 50 editor
instances for each commit) we can have a single editor with 50 property
definitions. This can be seen in the Hybrid Index benchmark (OAK-4412)
by changing numOfIndexes.
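
The "single editor with many property definitions" idea can be sketched roughly as below. This is a toy model and not the Hybrid Index implementation: MultiPropertyEditor, editorFor and lookup are hypothetical names, the property names p0..p(n-1) are made up, and numOfIndexes is borrowed here only as a parameter name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model, not Oak code: one editor covering many property definitions. */
final class MultiPropertyEditorSketch {

    /** Hypothetical editor that indexes any property named in its definition set. */
    static final class MultiPropertyEditor {
        private final Set<String> indexedProperties;
        private final Map<String, Set<String>> entries = new HashMap<>();

        MultiPropertyEditor(Set<String> indexedProperties) {
            this.indexedProperties = indexedProperties;
        }

        /** Records name=value -> path for indexed properties; ignores the rest. */
        void propertyChanged(String path, String name, String value) {
            if (indexedProperties.contains(name)) {
                entries.computeIfAbsent(name + "=" + value, k -> new HashSet<>()).add(path);
            }
        }

        /** Returns the paths recorded for the given property value, if any. */
        Set<String> lookup(String name, String value) {
            return entries.getOrDefault(name + "=" + value, new HashSet<>());
        }
    }

    /** Builds ONE editor for the hypothetical property names p0..p(n-1). */
    static MultiPropertyEditor editorFor(int numOfIndexes) {
        Set<String> names = new HashSet<>();
        for (int i = 0; i < numOfIndexes; i++) {
            names.add("p" + i);
        }
        return new MultiPropertyEditor(names);
    }

    public static void main(String[] args) {
        MultiPropertyEditor editor = editorFor(50);
        editor.propertyChanged("/content/a", "p7", "x");
        editor.propertyChanged("/content/a", "unindexed", "x");
        System.out.println(editor.lookup("p7", "x"));
        System.out.println(editor.lookup("unindexed", "x"));
    }
}
```

Each property change costs one set lookup in a single object, instead of being offered to 50 separate editor instances per commit.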

If you see any other area of improvement, say around unnecessary object
generation, then let us know!

Chetan Mehrotra