Posted to users@directory.apache.org by Ernst Bech <Er...@caseris.de> on 2014/03/21 17:18:44 UTC
[ApacheDS] Slow search for a string starting with "*"
I'm using ApacheDS 2.0.0-M16.
I added an index on "cn", imported my own schema and then imported a bunch of
data (4000 entries) based partially on that schema.
After importing was finished I did a number of searches. The searches for
"(cn=foo)" and "(cn=foo*)" were very fast (msecs) as expected but the
search for "(cn=*foo)" was very slow (seconds).
Is this a known issue or can it be improved?
Re: Re: Antwort: Re: [ApacheDS] Slow search for a string starting
with "*"
Posted by Kiran Ayyagari <ka...@apache.org>.
I have just run into this issue in practice and found that the
DefaultOptimizer is returning Long.MAX_VALUE for substring searches
starting with "*".
Ideally, when the given attribute has an index, we should just consider
the size/count of the index; this gives better performance than using
Long.MAX_VALUE.
I will commit this fix.
On the other hand, I was able to gain a 10x speedup by setting the
ads-partitioncachesize property to a sufficiently high value relative to
the total size of the partition (effectively keeping many entries
in memory to avoid deserialization).
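To illustrate the estimate described above, here is a minimal sketch in Java. It is not the actual DefaultOptimizer code: the class and method names are hypothetical, and it assumes the search engine prefers the filter branch with the lowest estimated count.

```java
import java.util.Map;

/** Hypothetical scan-count estimate for a substring filter starting with "*". */
final class ScanCountSketch {

    /**
     * @param hasIndex   whether the filtered attribute is indexed
     * @param indexCount number of entries recorded in that index
     * @return estimated number of candidates the filter has to examine
     */
    static long substringScanCount(boolean hasIndex, long indexCount) {
        // With an index, at most indexCount values need to be examined.
        // Without one the cost is unknown, so report the largest possible value.
        return hasIndex ? indexCount : Long.MAX_VALUE;
    }

    /** Pick the filter branch with the lowest estimate (assumed to drive the search). */
    static String cheapest(Map<String, Long> estimates) {
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : estimates.entrySet()) {
            if (best == null || e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

With a finite estimate, a branch such as "(cn=*foo)" can be chosen over branches whose cost is unknown, instead of always looking like the worst option.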
On Mon, Mar 24, 2014 at 11:14 PM, Ernst Bech <Er...@caseris.de> wrote:
> Ok I just did :)
>
> DIRSERVER-1965
>
> I hope I didn't mess up too badly ;)
>
> With regards
>
> Ernst Bech
--
Kiran Ayyagari
http://keydap.com
Antwort: Re: Antwort: Re: [ApacheDS] Slow search for a string starting with
"*"
Posted by Ernst Bech <Er...@caseris.de>.
Ok I just did :)
DIRSERVER-1965
I hope I didn't mess up too badly ;)
With regards
Ernst Bech
From: Emmanuel Lécharny <el...@gmail.com>
To: users@directory.apache.org,
Date: 24.03.2014 17:45
Subject: Re: Antwort: Re: [ApacheDS] Slow search for a string
starting with "*"
On 3/24/14 4:26 PM, Ernst Bech wrote:
> I have done a search for an unindexed attribute, another search for an
> unindexed attribute starting with a "*" and a search for an indexed
> attribute starting with a "*".
> All three took about 5 secs on my hardware.
>
> So these test results seem to prove that the index is not used at all, as
> you pointed out. Will this have a chance to be improved later?
> OpenLDAP has the ability to configure an index as a substring index, which
> also drastically improves searches starting (not only ending) with a "*".
> Is this possible with ApacheDS too?
This is definitely possible, and I would say it's easy to implement:
it's just a matter of creating an index of the reversed value.
How does it work? Let's say you add a value MY-TEST for the CN
attribute. The normal CN index will refer to MY-TEST, while the reverse
index will refer to TSET-YM. Now, looking for *TEST is just searching
the reversed index for TSET*.
We have discussed this feature at length for years, but never decided to
implement it up to now.
It would be cool to create a JIRA proposing to add this feature.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
Re: Antwort: Re: [ApacheDS] Slow search for a string starting with
"*"
Posted by Emmanuel Lécharny <el...@gmail.com>.
On 3/24/14 4:26 PM, Ernst Bech wrote:
> I have done a search for an unindexed attribute, another search for an
> unindexed attribute starting with a "*" and a search for an indexed
> attribute starting with a "*".
> All three took about 5 secs on my hardware.
>
> So these test results seem to prove that the index is not used at all, as
> you pointed out. Will this have a chance to be improved later?
> OpenLDAP has the ability to configure an index as a substring index, which
> also drastically improves searches starting (not only ending) with a "*".
> Is this possible with ApacheDS too?
This is definitely possible, and I would say it's easy to implement:
it's just a matter of creating an index of the reversed value.
How does it work? Let's say you add a value MY-TEST for the CN
attribute. The normal CN index will refer to MY-TEST, while the reverse
index will refer to TSET-YM. Now, looking for *TEST is just searching
the reversed index for TSET*.
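The idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not ApacheDS code: ReverseIndexSketch and its methods are hypothetical names, the index is a plain in-memory TreeMap, and values are assumed to be already normalized (for example, case-folded) before they are added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a reversed-value index. */
final class ReverseIndexSketch {
    // Sorted map from the reversed attribute value to the ids of entries holding it.
    private final TreeMap<String, List<Long>> reversed = new TreeMap<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Index one attribute value under its reversed form. */
    void add(String value, long entryId) {
        reversed.computeIfAbsent(reverse(value), k -> new ArrayList<>()).add(entryId);
    }

    /** Answer a "*suffix" filter as a prefix range scan over the reversed keys. */
    List<Long> findEndingWith(String suffix) {
        String prefix = reverse(suffix);
        List<Long> ids = new ArrayList<>();
        // tailMap starts at the first key >= prefix; stop once keys stop sharing it.
        for (Map.Entry<String, List<Long>> e : reversed.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            ids.addAll(e.getValue());
        }
        return ids;
    }

    public static void main(String[] args) {
        ReverseIndexSketch index = new ReverseIndexSketch();
        index.add("MY-TEST", 1L);  // stored under "TSET-YM"
        index.add("TESTING", 2L);  // stored under "GNITSET"
        System.out.println(index.findEndingWith("TEST"));
    }
}
```

The trade-off is a second index to store and maintain on every write, in exchange for turning a "*TEST" search into the same kind of range scan that already makes "TEST*" fast.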
We have discussed this feature at length for years, but never decided to
implement it up to now.
It would be cool to create a JIRA proposing to add this feature.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
Antwort: Re: [ApacheDS] Slow search for a string starting with "*"
Posted by Ernst Bech <Er...@caseris.de>.
I have done a search for an unindexed attribute, another search for an
unindexed attribute starting with a "*" and a search for an indexed
attribute starting with a "*".
All three took about 5 secs on my hardware.
So these test results seem to prove that the index is not used at all, as
you pointed out. Will this have a chance to be improved later?
OpenLDAP has the ability to configure an index as a substring index, which
also drastically improves searches starting (not only ending) with a "*".
Is this possible with ApacheDS too?
I found a checkin which could be related.
http://svn.apache.org/viewvc?view=revision&revision=1398737
Did this only speed up searches ending with "*"? Those are already
fast.
With regards
Ernst Bech
From: Emmanuel Lécharny <el...@gmail.com>
To: users@directory.apache.org,
Date: 21.03.2014 17:50
Subject: Re: [ApacheDS] Slow search for a string starting with "*"
On 3/21/14 5:18 PM, Ernst Bech wrote:
> I'm using ApacheDS 2.0.0-M16.
>
> I added an index on "cn", imported my own schema and then imported a bunch of
> data (4000 entries) based partially on that schema.
>
> After importing was finished I did a number of searches. The searches for
> "(cn=foo)" and "(cn=foo*)" were very fast (msecs) as expected but the
> search for "(cn=*foo)" was very slow (seconds).
>
> Is this a known issue or can it be improved?
>
This is a known issue. However, it should not take seconds.
The rationale is that substring searches are done by doing a full scan
(i.e., each entry is read, and the cn attribute is compared to
the filter).
With 4000 entries in your base, it should take less than a second.
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com
Re: [ApacheDS] Slow search for a string starting with "*"
Posted by Emmanuel Lécharny <el...@gmail.com>.
On 3/21/14 5:18 PM, Ernst Bech wrote:
> I'm using ApacheDS 2.0.0-M16.
>
> I added an index on "cn", imported my own schema and then imported a bunch of
> data (4000 entries) based partially on that schema.
>
> After importing was finished I did a number of searches. The searches for
> "(cn=foo)" and "(cn=foo*)" were very fast (msecs) as expected but the
> search for "(cn=*foo)" was very slow (seconds).
>
> Is this a known issue or can it be improved?
>
This is a known issue. However, it should not take seconds.
The rationale is that substring searches are done by doing a full scan
(i.e., each entry is read, and the cn attribute is compared to
the filter).
With 4000 entries in your base, it should take less than a second.
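As a rough illustration of what a full scan means here, the following Java sketch evaluates a "(cn=*foo)" style filter by visiting every entry. It is a simplification and not the server's code: FullScanSketch is a hypothetical name, entries are reduced to a map from entry id to an already normalized cn value, and no deserialization cost is modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a full scan for a filter ending in a fixed suffix. */
final class FullScanSketch {

    /** Evaluate "(cn=*suffix)" by reading every entry and comparing its cn value. */
    static List<Long> scanEndingWith(Map<Long, String> cnByEntryId, String suffix) {
        List<Long> hits = new ArrayList<>();
        for (Map.Entry<Long, String> e : cnByEntryId.entrySet()) {
            // One comparison per entry: the work grows linearly with the number of entries.
            if (e.getValue().endsWith(suffix)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

The comparison itself is cheap, so for a few thousand entries the time is likely dominated by reading the entries rather than by matching the filter.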
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com