Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2013/09/04 12:12:13 UTC

Search performance potential improvements

Hi guys,

I just came back from a short but excellent week of vacation, and I spent
the last two days looking at the search operation.

Just before I left, I thought that the entry was cloned for no reason.
In fact, we do modify the entry before returning it, so we must work on
a copy of the cached entry; otherwise we could get some really nasty
bugs elsewhere (the entry is cached, so we can't simply modify it in
place).
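To illustrate the cloning point, here is a minimal sketch of a cache that hands out copies rather than the cached instance. `CachedEntry` and `EntryCache` are hypothetical stand-ins, not the ApacheDS `Entry` API, and attribute values are plain strings, so a shallow copy of the map is enough:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry type: a DN plus single-valued string attributes.
// This is an illustration, not the ApacheDS Entry API.
final class CachedEntry {
    final String dn;
    final Map<String, String> attributes;

    CachedEntry(String dn, Map<String, String> attributes) {
        this.dn = dn;
        // Defensive copy of the map, so callers never share our internals.
        this.attributes = new LinkedHashMap<>(attributes);
    }

    // Shallow copy: a fresh map. Enough here because String values are immutable.
    CachedEntry copy() {
        return new CachedEntry(dn, attributes);
    }
}

// Hypothetical cache that never hands its own instances to the search code.
final class EntryCache {
    private final Map<String, CachedEntry> byDn = new HashMap<>();

    void put(CachedEntry entry) {
        byDn.put(entry.dn, entry);
    }

    // Returns a private copy the search path may freely modify
    // (e.g. strip attributes the client did not request) before sending it.
    CachedEntry getForSearch(String dn) {
        CachedEntry cached = byDn.get(dn);
        return cached == null ? null : cached.copy();
    }

    // Direct access to the cached instance, used here only to show it is untouched.
    CachedEntry peek(String dn) {
        return byDn.get(dn);
    }
}
```

With mutable attribute values the copy would have to go deeper; that copying cost is the price of keeping the cached entry safe.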

On the other hand, I discovered that we do a costly check in the
DefaultSearchEngine.computeResults() method: we check whether the search
base DN is an alias by looking into the Alias index. Skipping this check
improves performance by around 15%. I'm pretty sure we can avoid looking
into the index, as long as we have a cache of the existing aliases. The
problem is that it's just a cache: if the alias is not in the cache, we
still have to look into the alias index. This becomes a problem when we
have a lot of aliases (more than the cache can hold), something unlikely
to happen.
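A minimal sketch of the cache-with-fallback situation described above. `AliasLookup` and `PartialAliasCache` are hypothetical names, not ApacheDS classes; the lookup interface simply stands in for the costly Alias index access:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the (costly) Alias index lookup.
interface AliasLookup {
    boolean isAlias(String dn);
}

// Cache that may hold only *some* aliases: a hit proves the DN is an alias,
// but a miss proves nothing, so the index must still be consulted.
final class PartialAliasCache {
    private final Set<String> knownAliases = new HashSet<>();
    private final AliasLookup aliasIndex;
    private int indexLookups = 0; // counts the slow path, for illustration only

    PartialAliasCache(AliasLookup aliasIndex) {
        this.aliasIndex = aliasIndex;
    }

    boolean isAlias(String baseDn) {
        if (knownAliases.contains(baseDn)) {
            return true;               // fast path: already known to be an alias
        }
        indexLookups++;                // slow path: taken by every non-alias base DN
        boolean alias = aliasIndex.isAlias(baseDn);
        if (alias) {
            knownAliases.add(baseDn);  // remember it for the next search
        }
        return alias;
    }

    int getIndexLookups() {
        return indexLookups;
    }
}
```

In this sketch, since most search base DNs are not aliases, nearly every call ends up on the slow path.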

I will try to think about a better solution.


-- 
Regards,
Emmanuel Lécharny
www.iktek.com 


Re: Search performance potential improvements

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 9/4/13 1:27 PM, Kiran Ayyagari wrote:
> How about setting a limit on the number of aliases held in the cache,
> and storing all the aliases in this cache? If the number of aliases is
> below this limit, we don't need to look into the index (i.e. we will
> always have all the aliases in memory). The assumption here is that
> most server installations will have fewer aliases than the given
> threshold.

Yes, this is probably the best solution. We just have to keep a counter
of aliases alongside the alias cache.
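A rough sketch of how the counter could make the cache authoritative, assuming the cache only ever contains real aliases and the counter tracks every alias in the partition. `CountedAliasCache`, its methods and the `Predicate` standing in for the Alias index are all hypothetical:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of "keep all aliases in memory below a threshold,
// plus a counter of aliases"; not the ApacheDS implementation.
final class CountedAliasCache {
    private final int limit;                     // max number of aliases kept in memory
    private final Set<String> cached = new HashSet<>();
    private final Predicate<String> aliasIndex;  // the costly index lookup
    private long aliasCount;                     // number of aliases in the whole partition

    CountedAliasCache(int limit, Predicate<String> aliasIndex) {
        this.limit = limit;
        this.aliasIndex = aliasIndex;
    }

    // Assumed to be called whenever an alias entry is added. (A real server
    // would also load the cache and counter from the index at startup.)
    void aliasAdded(String dn) {
        aliasCount++;
        if (cached.size() < limit) {
            cached.add(dn);
        }
    }

    // Assumed to be called whenever an alias entry is deleted.
    void aliasDeleted(String dn) {
        aliasCount--;
        cached.remove(dn);
    }

    // The cache only holds real aliases, so if its size equals the counter
    // it holds all of them, and a cache miss is authoritative.
    boolean isComplete() {
        return cached.size() == aliasCount;
    }

    boolean isAlias(String dn) {
        if (cached.contains(dn)) {
            return true;
        }
        if (isComplete()) {
            return false;               // no index access needed
        }
        return aliasIndex.test(dn);     // over the limit: fall back to the index
    }
}
```

In this sketch the counter replaces a separate "complete" flag: once the number of aliases drops back to what the cache holds, misses become authoritative again without any extra bookkeeping.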



-- 
Regards,
Emmanuel Lécharny
www.iktek.com 


Re: Search performance potential improvements

Posted by Kiran Ayyagari <ka...@apache.org>.
On Wed, Sep 4, 2013 at 3:42 PM, Emmanuel Lécharny <el...@gmail.com> wrote:

> On the other hand, I discovered that we do a costly check in the
> DefaultSearchEngine.computeResults() method: we check whether the search
> base DN is an alias by looking into the Alias index. Skipping this check
> improves performance by around 15%. I'm pretty sure we can avoid looking
> into the index, as long as we have a cache of the existing aliases. The
> problem is that it's just a cache: if the alias is not in the cache, we
> still have to look into the alias index. This becomes a problem when we
> have a lot of aliases (more than the cache can hold), something unlikely
> to happen.
>
How about setting a limit on the number of aliases held in the cache,
and storing all the aliases in this cache? If the number of aliases is
below this limit, we don't need to look into the index (i.e. we will
always have all the aliases in memory). The assumption here is that
most server installations will have fewer aliases than the given
threshold.


-- 
Kiran Ayyagari
http://keydap.com

Re: Search performance potential improvements

Posted by Kiran Ayyagari <ka...@apache.org>.
On Sat, Sep 7, 2013 at 12:25 PM, Emmanuel Lécharny <el...@gmail.com> wrote:

> For the record, by commenting out the alias check in the scope
> evaluators and the same check in DefaultSearchEngine, I reach 26,364
> searches/s on Mavibot (it was around 19,000 before).
>
That is a pretty encouraging improvement, thanks for the heads-up.


-- 
Kiran Ayyagari
http://keydap.com

Re: Search performance potential improvements

Posted by Emmanuel Lécharny <el...@gmail.com>.
Some more: in SubtreeScopeEvaluator.evaluate() and
OneLevelScopeEvaluator.evaluate(), we also check whether the entry we
will work on is an alias:

        if ( null != db.getAliasIndex().reverseLookup( id ) )
        {
            return false;
        }

Replacing it with a cache check will save another 10%.
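A minimal sketch of what the evaluators could look like with an in-memory check instead of db.getAliasIndex().reverseLookup( id ), assuming a complete set of alias entry IDs is available (as with the threshold idea discussed in this thread). `ScopeCheckSketch`, the parent-ID map and the method names are hypothetical, not the actual evaluator classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model: parent IDs plus a complete set of alias entry IDs.
final class ScopeCheckSketch {
    private final Map<Long, Long> parentOf = new HashMap<>();
    private final Set<Long> aliasIds = new HashSet<>(); // replaces the index reverse lookup

    void addEntry(long id, Long parentId, boolean alias) {
        parentOf.put(id, parentId);      // parentId is null for the root entry
        if (alias) {
            aliasIds.add(id);
        }
    }

    // Same early exit as the original snippet, but a hash-set probe instead of
    // an index access. Only correct if aliasIds really holds every alias.
    private boolean isAlias(long id) {
        return aliasIds.contains(id);
    }

    // Subtree scope: the base entry itself or any of its descendants.
    boolean evaluateSubtree(long baseId, long id) {
        if (isAlias(id)) {
            return false;
        }
        Long current = id;
        while (current != null) {
            if (current == baseId) {
                return true;
            }
            current = parentOf.get(current); // walk up towards the root
        }
        return false;
    }

    // One-level scope: direct children of the base entry only.
    boolean evaluateOneLevel(long baseId, long id) {
        if (isAlias(id)) {
            return false;
        }
        Long parent = parentOf.get(id);
        return parent != null && parent == baseId;
    }
}
```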

For the record, by commenting out this piece of code and the same check
in DefaultSearchEngine, I reach 26,364 searches/s on Mavibot (it was
around 19,000 before).

Also note that since I started working on performance, one month ago,
the gain has been quite significant: up to 26,364 searches/s from
5,091/s... And we still have other ways to gain more performance :-)

JDBM performance is 3 times behind (6,381/s), but it was 654/s in the
very first tests.



-- 
Regards,
Emmanuel Lécharny
www.iktek.com