Posted to users@directory.apache.org by Ernst Bech <Er...@caseris.de> on 2014/03/21 17:18:44 UTC

[ApacheDS] Slow search for a string starting with "*"

I'm using ApacheDS 2.0.0-M16.

I made "cn" an indexed attribute, imported my own schema, and then imported
a bunch of data (4000 entries) based partly on that schema.

After the import finished, I ran a number of searches. The searches for
"(cn=foo)" and "(cn=foo*)" were very fast (milliseconds) as expected, but the
search for "(cn=*foo)" was very slow (seconds).

Is this a known issue or can it be improved?

Re: Re: Antwort: Re: [ApacheDS] Slow search for a string starting with "*"

Posted by Kiran Ayyagari <ka...@apache.org>.
I just ran into this issue myself and found that the DefaultOptimizer
returns Long.MAX_VALUE for substring searches starting with "*".

Ideally, when the attribute has an index, we should just use the size
(count) of that index instead; this gives better performance than using
Long.MAX_VALUE.
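A minimal sketch of the estimate change described above, in Java. It assumes, for illustration only, that an estimate of Long.MAX_VALUE makes the engine fall back to reading every entry while a finite estimate lets it walk the cn index. The index is modeled as a LongSupplier returning its size, and SubstringCostSketch, estimateBeforeFix, estimateAfterFix and plan are hypothetical names; this is not the actual DefaultOptimizer code.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of the estimate change. Not the real DefaultOptimizer:
 * the attribute index is modeled as a LongSupplier that returns its size.
 */
public class SubstringCostSketch {

    /** Before the fix, per the thread: a leading "*" always yields Long.MAX_VALUE. */
    static long estimateBeforeFix() {
        return Long.MAX_VALUE;
    }

    /** Proposed: if the attribute is indexed (non-null), use the index size. */
    static long estimateAfterFix(LongSupplier indexSize) {
        return indexSize == null ? Long.MAX_VALUE : indexSize.getAsLong();
    }

    /** Assumed planner rule (illustration only): an unbounded estimate means a full scan. */
    static String plan(long estimate) {
        return estimate == Long.MAX_VALUE
                ? "full scan"
                : "walk index (" + estimate + " keys)";
    }

    public static void main(String[] args) {
        LongSupplier cnIndex = () -> 4000L; // hypothetical cn index holding 4000 keys
        System.out.println(plan(estimateBeforeFix()));       // full scan
        System.out.println(plan(estimateAfterFix(cnIndex))); // walk index (4000 keys)
        System.out.println(plan(estimateAfterFix(null)));    // full scan: attribute not indexed
    }
}
```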

I will commit this fix.

On the other hand, I was able to get a 10x speedup by setting the
ads-partitioncachesize property to a value that is high enough relative
to the total size of the partition (effectively keeping many entries in
memory to avoid deserialization).
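To illustrate why the cache size matters, here is a hypothetical LRU cache of deserialized entries built on java.util.LinkedHashMap in access order. It is not how ApacheDS implements its partition cache; EntryCacheSketch and scanLoads are illustrative names, and the loader function stands in for reading and deserializing an entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative LRU cache of "deserialized" entries (not ApacheDS code). */
public class EntryCacheSketch<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> loader; // stands in for read + deserialize
    private int loads = 0;               // how many times the loader ran

    public EntryCacheSketch(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true gives LRU ordering; evict the eldest when over capacity
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key); // cache miss: pay the deserialization cost
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    public int loads() {
        return loads;
    }

    /** Simulate `passes` full scans over `entries` entries; return loader calls. */
    static int scanLoads(int capacity, int entries, int passes) {
        EntryCacheSketch<Integer, String> c =
                new EntryCacheSketch<>(capacity, id -> "entry-" + id);
        for (int p = 0; p < passes; p++) {
            for (int id = 0; id < entries; id++) {
                c.get(id);
            }
        }
        return c.loads();
    }

    public static void main(String[] args) {
        // 4000 entries scanned 3 times
        System.out.println(scanLoads(100, 4000, 3));  // 12000: small cache, every read misses
        System.out.println(scanLoads(4000, 4000, 3)); // 4000: each entry loaded only once
    }
}
```

With a sequential full scan, an LRU cache smaller than the data set never gets a hit, because each entry is evicted before it is read again; in this sketch that is why the cache has to be sized against the whole data set to help.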

On Mon, Mar 24, 2014 at 11:14 PM, Ernst Bech <Er...@caseris.de> wrote:

> Ok I just did :)
>
> DIRSERVER-1965
>
> I hope I didn't mess it up too badly ;)
>
> With regards
>
> Ernst Bech


-- 
Kiran Ayyagari
http://keydap.com

Antwort: Re: Antwort: Re: [ApacheDS] Slow search for a string starting with "*"

Posted by Ernst Bech <Er...@caseris.de>.
Ok I just did :)

DIRSERVER-1965

I hope I didn't mess it up too badly ;)

With regards

Ernst Bech



From:    Emmanuel Lécharny <el...@gmail.com>
To:      users@directory.apache.org
Date:    24.03.2014 17:45
Subject: Re: Antwort: Re: [ApacheDS] Slow search for a string starting with "*"



This is definitely possible, and I would say it's easy to implement:
it's just a matter of creating an index of the reversed values.

How does it work? Let's say you add the value MY-TEST for the CN
attribute. The normal CN index will contain MY-TEST, while the reverse
index will contain TSET-YM. Now, looking for *TEST is just a matter of
searching the reverse index for TSET*.

We have discussed this feature at length over the years, but have never
decided to implement it so far.

It would be great to create a JIRA issue proposing this feature.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 



Re: Antwort: Re: [ApacheDS] Slow search for a string starting with "*"

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 3/24/14 4:26 PM, Ernst Bech wrote:
> I did a search on an unindexed attribute, another search on an
> unindexed attribute starting with a "*", and a search on an indexed
> attribute starting with a "*".
> All three took about 5 seconds on my hardware.
>
> So these test results seem to prove that the index is not used at all, as
> you pointed out. Is there any chance this will be improved later?
> OpenLDAP can configure an index as a substring index, which also
> drastically improves searches starting (not only ending) with a "*".
> Is this possible with ApacheDS too?

This is definitely possible, and I would say it's easy to implement:
it's just a matter of creating an index of the reversed values.

How does it work? Let's say you add the value MY-TEST for the CN
attribute. The normal CN index will contain MY-TEST, while the reverse
index will contain TSET-YM. Now, looking for *TEST is just a matter of
searching the reverse index for TSET*.
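A minimal in-memory sketch of the reverse index idea in Java, assuming a sorted map from reversed value to entry IDs. ReverseIndexSketch, add and searchSuffix are hypothetical names, and the sketch ignores the value normalization (case folding, whitespace handling) that a real cn index would apply.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical reverse substring index: each value is stored reversed, so a
 * suffix search (*TEST) becomes a prefix search (TSET*). Not ApacheDS code.
 */
public class ReverseIndexSketch {
    // Sorted map: reversed value -> IDs of the entries holding that value
    private final TreeMap<String, Set<Long>> byReversedValue = new TreeMap<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Index a value; e.g. "MY-TEST" is stored under the key "TSET-YM". */
    public void add(String value, long entryId) {
        byReversedValue.computeIfAbsent(reverse(value), k -> new TreeSet<>()).add(entryId);
    }

    /** Answer (attr=*suffix) with a prefix range scan over the reversed keys. */
    public Set<Long> searchSuffix(String suffix) {
        String prefix = reverse(suffix);
        Set<Long> result = new TreeSet<>();
        // tailMap starts at the first key >= prefix; keys are sorted, so stop at the first non-match
        for (Map.Entry<String, Set<Long>> e : byReversedValue.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ReverseIndexSketch idx = new ReverseIndexSketch();
        idx.add("MY-TEST", 1L);
        idx.add("OTHER-TEST", 2L);
        idx.add("TESTING", 3L);
        System.out.println(idx.searchSuffix("TEST")); // prints [1, 2]
    }
}
```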

We have discussed this feature at length over the years, but have never
decided to implement it so far.

It would be great to create a JIRA issue proposing this feature.


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Antwort: Re: [ApacheDS] Slow search for a string starting with "*"

Posted by Ernst Bech <Er...@caseris.de>.
I did a search on an unindexed attribute, another search on an
unindexed attribute starting with a "*", and a search on an indexed
attribute starting with a "*".
All three took about 5 seconds on my hardware.

So these test results seem to prove that the index is not used at all, as
you pointed out. Is there any chance this will be improved later?
OpenLDAP can configure an index as a substring index, which also
drastically improves searches starting (not only ending) with a "*".
Is this possible with ApacheDS too?

I found a commit that could be related:
http://svn.apache.org/viewvc?view=revision&revision=1398737
Did it only speed up searches ending with "*"? Those are fast.

With regards

Ernst Bech



From:    Emmanuel Lécharny <el...@gmail.com>
To:      users@directory.apache.org
Date:    21.03.2014 17:50
Subject: Re: [ApacheDS] Slow search for a string starting with "*"



This is a known issue. However, it should not take seconds.

The reason is that substring searches are done with a full scan
(i.e., each entry is read, and its cn attribute is compared to
the filter).

With 4000 entries in your base, it should take less than a second.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 



Re: [ApacheDS] Slow search for a string starting with "*"

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 3/21/14 5:18 PM, Ernst Bech wrote:
> I'm using ApacheDS 2.0.0-M16.
>
> I made "cn" an indexed attribute, imported my own schema, and then imported
> a bunch of data (4000 entries) based partly on that schema.
>
> After the import finished, I ran a number of searches. The searches for
> "(cn=foo)" and "(cn=foo*)" were very fast (milliseconds) as expected, but
> the search for "(cn=*foo)" was very slow (seconds).
>
> Is this a known issue or can it be improved?
>
This is a known issue. However, it should not take seconds.

The reason is that substring searches are done with a full scan
(i.e., each entry is read, and its cn attribute is compared to
the filter).

With 4000 entries in your base, it should take less than a second.
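The difference between (cn=foo*) and (cn=*foo) can be sketched in Java: a sorted index can serve a prefix with a range scan that stops at the first non-matching key, while a leading "*" with no reverse index forces a test of every value. The names (SubstringScanSketch, prefixSearch, suffixScan, sampleData) are hypothetical, and the sketch works on plain strings, so "reading an entry" here costs nothing like a real ApacheDS deserialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Illustrative contrast between prefix and suffix substring searches (not ApacheDS code). */
public class SubstringScanSketch {

    /** Matching values plus how many values had to be examined to find them. */
    static final class Result {
        final List<String> hits = new ArrayList<>();
        int examined = 0;
    }

    /** (cn=prefix*): range scan on a sorted index, stopping at the first non-matching key. */
    static Result prefixSearch(NavigableSet<String> cnIndex, String prefix) {
        Result r = new Result();
        for (String key : cnIndex.tailSet(prefix, true)) {
            r.examined++;
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            r.hits.add(key);
        }
        return r;
    }

    /** (cn=*suffix) with no usable index: read every value and test it (full scan). */
    static Result suffixScan(List<String> allCnValues, String suffix) {
        Result r = new Result();
        for (String cn : allCnValues) {
            r.examined++;
            if (cn.endsWith(suffix)) {
                r.hits.add(cn);
            }
        }
        return r;
    }

    /** 4000 hypothetical cn values plus two that contain "foo". */
    static List<String> sampleData() {
        List<String> cns = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            cns.add("user" + i);
        }
        cns.add("foobar");
        cns.add("barfoo");
        return cns;
    }

    public static void main(String[] args) {
        List<String> cns = sampleData();
        NavigableSet<String> cnIndex = new TreeSet<>(cns);

        Result prefix = prefixSearch(cnIndex, "foo");
        Result suffix = suffixScan(cns, "foo");
        System.out.println(prefix.hits + " examined " + prefix.examined); // [foobar] examined 2
        System.out.println(suffix.hits + " examined " + suffix.examined); // [barfoo] examined 4002
    }
}
```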

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com