You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@directory.apache.org by Alex Karasulu <ao...@bellsouth.net> on 2006/09/02 08:05:16 UTC

[ApacheDS] [Performance] Using indices to boost search performance

Hi all,

I've been testing the search performance boost gained from indexing
attributes before starting development on an optimization to improve
index performance and in memory size.  I thought I'd share these
dramatic results of my pre-optimization tests with you since they
clearly show the benefits of indices.

Before I do this let me list the characteristics of the hardware used
and my configuration settings:


-------------
Machine Setup
-------------

CPU: Dual Athlon MP 1900
OS: Linux 2.6.15
Mem: 2GB 266Mhz


--------------
ApacheDS Setup
--------------

ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
   - Using 1024MB Memory
   - Indexed st and initials


----------
Data Setup
----------

Wrote a simple tool to generate random values for descent sized entries.
  The data sort of looks like this for a user entry:

	dn: uid=user.1,ou=users,dc=example,dc=com
	uid: user.1
	initials: yq
	description: cFocJATNuhlXisDCqGtY
	pager: FYyimqyZRW
	cn: HSGMzajYKmicUTe
	postalcode: WiXXA
	st: xy	
	street: kpCCqmrsCzkpdtHXWMfY
	l: KqmAXFYTrI
	objectclass: person
	objectclass: organizationalPerson
	objectclass: inetOrgPerson
	sn: nuymgOwpm
	homephone: PERamkCtsv
	mobile: vkIviOGNTC
	telephonenumber: 7248889026
	mail: pYvEoOjSnEymcWD
	givenname: IVHJZB
	postaladdress: crObexKoUTIFdzNHcZMr
	employeenumber: 1
	userpassword:: cGFzc3dvcmQ=

I started loading a partition up with these entries 100,000 of them at a
time then performing the following searches for all entries with
initials aa:

(1) index on initials but no cached entries
(2) index on initials with cached entries
(3) no index without cached entries

Here are the results at the various capacities:

---------------
100,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        3.30
(2)    yes      yes        0.72
(3)    no       no         30.63

search results = 153 entries


---------------
200,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        6.04
(2)    yes      yes        1.44
(3)    no       no         82

search results = 302 entries


---------------
300,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        7.54
(2)    yes      yes        1.95
(3)    no       no         146

search results = 451 entries


---------------
400,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        9.24
(2)    yes      yes        3.80
(3)    no       no         196

search results = 586 entries


---------------
500,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        11.96
(2)    yes      yes        3.21
(3)    no       no         224

search results = 748 entries


Alex



Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Emmanuel Lecharny <el...@gmail.com>.
Quanah Gibson-Mount a écrit :

>
>
> --On Saturday, September 02, 2006 2:05 AM -0400 Alex Karasulu 
> <ao...@bellsouth.net> wrote:
>
>> Hi all,
>> ----------
>> Data Setup
>> ----------
>>
>> Wrote a simple tool to generate random values for descent sized entries.
>>   The data sort of looks like this for a user entry:
>
>
> slamd has a really nice tool called "MakeLDIF" that'll do this for 
> you, too.  Plus then how you made your LDIF is publishable to others 
> who may also want to run the same benchmark, since all you have to do 
> is provide the template for MakeLDIF to use. ;)

Yeah, we know that. We have used it many times, and the file you see is 
directly generated by MakeLdif, but with just a very slight difference, 
because we wanted to test a very specific performance issue.

Anyway, thanks for the pointer, because I'm sure that many people 
weren't aware of the existence of this tool :)

Emmanuel.


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Quanah Gibson-Mount <qu...@stanford.edu>.

--On Saturday, September 02, 2006 2:05 AM -0400 Alex Karasulu 
<ao...@bellsouth.net> wrote:

> Hi all,
> ----------
> Data Setup
> ----------
>
> Wrote a simple tool to generate random values for descent sized entries.
>   The data sort of looks like this for a user entry:

slamd has a really nice tool called "MakeLDIF" that'll do this for you, 
too.  Plus then how you made your LDIF is publishable to others who may 
also want to run the same benchmark, since all you have to do is provide 
the template for MakeLDIF to use. ;)

--Quanah


--
Quanah Gibson-Mount
Principal Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Ajay Upadhyaya <aj...@gmail.com>.
Thanks Alex. I'd also look forward to the enh related to reindexing feature.

-Ajay

On 9/3/06, Alex Karasulu <ao...@bellsouth.net> wrote:
>
> Ajay Upadhyaya wrote:
> > Could you share the steps needed to do the indexing, or point to any
> > existing doc... I'm relatively new to ADS,
>
> Sure no problem.  I wrote up a little documentation here on adding
> indices.  I just added this page just now very rapidly so it might be
> sparse.
>
> http://docs.safehaus.org/display/APACHEDS/Performance+Tuning
>
> Alex
>
> > Thanks,
> > Ajay
> >
> > On 9/1/06, *Alex Karasulu * <aok123@bellsouth.net
> > <ma...@bellsouth.net>> wrote:
> >
> >     Hi all,
> >
> >     I've been testing the search performance boost gained from indexing
> >     attributes before starting development on an optimization to improve
> >     index performance and in memory size.  I thought I'd share these
> >     dramatic results of my pre-optimization tests with you since they
> >     clearly show the benefits of indices.
> >
> >     Before I do this let me list the characteristics of the hardware
> used
> >     and my configuration settings:
> >
> >
> >     -------------
> >     Machine Setup
> >     -------------
> >
> >     CPU: Dual Athlon MP 1900
> >     OS: Linux 2.6.15
> >     Mem: 2GB 266Mhz
> >
> >
> >     --------------
> >     ApacheDS Setup
> >     --------------
> >
> >     ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
> >        - Using 1024MB Memory
> >        - Indexed st and initials
> >
> >
> >     ----------
> >     Data Setup
> >     ----------
> >
> >     Wrote a simple tool to generate random values for descent sized
> entries.
> >       The data sort of looks like this for a user entry:
> >
> >             dn: uid=user.1,ou=users,dc=example,dc=com
> >             uid: user.1
> >             initials: yq
> >             description: cFocJATNuhlXisDCqGtY
> >             pager: FYyimqyZRW
> >             cn: HSGMzajYKmicUTe
> >             postalcode: WiXXA
> >             st: xy
> >             street: kpCCqmrsCzkpdtHXWMfY
> >             l: KqmAXFYTrI
> >             objectclass: person
> >             objectclass: organizationalPerson
> >             objectclass: inetOrgPerson
> >             sn: nuymgOwpm
> >             homephone: PERamkCtsv
> >             mobile: vkIviOGNTC
> >             telephonenumber: 7248889026
> >             mail: pYvEoOjSnEymcWD
> >             givenname: IVHJZB
> >             postaladdress: crObexKoUTIFdzNHcZMr
> >             employeenumber: 1
> >             userpassword:: cGFzc3dvcmQ=
> >
> >     I started loading a partition up with these entries 100,000 of them
> at a
> >     time then performing the following searches for all entries with
> >     initials aa:
> >
> >     (1) index on initials but no cached entries
> >     (2) index on initials with cached entries
> >     (3) no index without cached entries
> >
> >     Here are the results at the various capacities:
> >
> >     ---------------
> >     100,000 Entries
> >     ---------------
> >
> >          [cached] [indexed] [time (seconds)]
> >
> >     (1)    no       yes         3.30
> >     (2)    yes      yes        0.72
> >     (3)    no       no         30.63
> >
> >     search results = 153 entries
> >
> >
> >     ---------------
> >     200,000 Entries
> >     ---------------
> >
> >          [cached] [indexed] [time (seconds)]
> >
> >     (1)    no       yes        6.04
> >     (2)    yes      yes        1.44
> >     (3)    no       no         82
> >
> >     search results = 302 entries
> >
> >
> >     ---------------
> >     300,000 Entries
> >     ---------------
> >
> >          [cached] [indexed] [time (seconds)]
> >
> >     (1)    no       yes        7.54
> >     (2)    yes      yes        1.95
> >     (3)    no       no         146
> >
> >     search results = 451 entries
> >
> >
> >     ---------------
> >     400,000 Entries
> >     ---------------
> >
> >          [cached] [indexed] [time (seconds)]
> >
> >     (1)    no       yes        9.24
> >     (2)    yes      yes        3.80
> >     (3)    no       no         196
> >
> >     search results = 586 entries
> >
> >
> >     ---------------
> >     500,000 Entries
> >     ---------------
> >
> >          [cached] [indexed] [time (seconds)]
> >
> >     (1)    no       yes        11.96
> >     (2)    yes      yes        3.21
> >     (3)    no       no         224
> >
> >     search results = 748 entries
> >
> >
> >     Alex
> >
> >
> >
>
>

Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Alex Karasulu <ao...@bellsouth.net>.
Ajay Upadhyaya wrote:
> Could you share the steps needed to do the indexing, or point to any 
> existing doc... I'm relatively new to ADS,

Sure no problem.  I wrote up a little documentation here on adding 
indices.  I just added this page just now very rapidly so it might be 
sparse.

http://docs.safehaus.org/display/APACHEDS/Performance+Tuning

Alex

> Thanks,
> Ajay
> 
> On 9/1/06, *Alex Karasulu * <aok123@bellsouth.net 
> <ma...@bellsouth.net>> wrote:
> 
>     Hi all,
> 
>     I've been testing the search performance boost gained from indexing
>     attributes before starting development on an optimization to improve
>     index performance and in memory size.  I thought I'd share these
>     dramatic results of my pre-optimization tests with you since they
>     clearly show the benefits of indices.
> 
>     Before I do this let me list the characteristics of the hardware used
>     and my configuration settings:
> 
> 
>     -------------
>     Machine Setup
>     -------------
> 
>     CPU: Dual Athlon MP 1900
>     OS: Linux 2.6.15
>     Mem: 2GB 266Mhz
> 
> 
>     --------------
>     ApacheDS Setup
>     --------------
> 
>     ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
>        - Using 1024MB Memory
>        - Indexed st and initials
> 
> 
>     ----------
>     Data Setup
>     ----------
> 
>     Wrote a simple tool to generate random values for descent sized entries.
>       The data sort of looks like this for a user entry:
> 
>             dn: uid=user.1,ou=users,dc=example,dc=com
>             uid: user.1
>             initials: yq
>             description: cFocJATNuhlXisDCqGtY
>             pager: FYyimqyZRW
>             cn: HSGMzajYKmicUTe
>             postalcode: WiXXA
>             st: xy
>             street: kpCCqmrsCzkpdtHXWMfY
>             l: KqmAXFYTrI
>             objectclass: person
>             objectclass: organizationalPerson
>             objectclass: inetOrgPerson
>             sn: nuymgOwpm
>             homephone: PERamkCtsv
>             mobile: vkIviOGNTC
>             telephonenumber: 7248889026
>             mail: pYvEoOjSnEymcWD
>             givenname: IVHJZB
>             postaladdress: crObexKoUTIFdzNHcZMr
>             employeenumber: 1
>             userpassword:: cGFzc3dvcmQ=
> 
>     I started loading a partition up with these entries 100,000 of them at a
>     time then performing the following searches for all entries with
>     initials aa:
> 
>     (1) index on initials but no cached entries
>     (2) index on initials with cached entries
>     (3) no index without cached entries
> 
>     Here are the results at the various capacities:
> 
>     ---------------
>     100,000 Entries
>     ---------------
> 
>          [cached] [indexed] [time (seconds)]
> 
>     (1)    no       yes         3.30
>     (2)    yes      yes        0.72
>     (3)    no       no         30.63
> 
>     search results = 153 entries
> 
> 
>     ---------------
>     200,000 Entries
>     ---------------
> 
>          [cached] [indexed] [time (seconds)]
> 
>     (1)    no       yes        6.04
>     (2)    yes      yes        1.44
>     (3)    no       no         82
> 
>     search results = 302 entries
> 
> 
>     ---------------
>     300,000 Entries
>     ---------------
> 
>          [cached] [indexed] [time (seconds)]
> 
>     (1)    no       yes        7.54
>     (2)    yes      yes        1.95
>     (3)    no       no         146
> 
>     search results = 451 entries
> 
> 
>     ---------------
>     400,000 Entries
>     ---------------
> 
>          [cached] [indexed] [time (seconds)]
> 
>     (1)    no       yes        9.24
>     (2)    yes      yes        3.80
>     (3)    no       no         196
> 
>     search results = 586 entries
> 
> 
>     ---------------
>     500,000 Entries
>     ---------------
> 
>          [cached] [indexed] [time (seconds)]
> 
>     (1)    no       yes        11.96
>     (2)    yes      yes        3.21
>     (3)    no       no         224
> 
>     search results = 748 entries
> 
> 
>     Alex
> 
> 
> 


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Ajay Upadhyaya <aj...@gmail.com>.
Could you share the steps needed to do the indexing, or point to any
existing doc... I'm relatively new to ADS,

Thanks,
Ajay

On 9/1/06, Alex Karasulu <ao...@bellsouth.net> wrote:
>
> Hi all,
>
> I've been testing the search performance boost gained from indexing
> attributes before starting development on an optimization to improve
> index performance and in memory size.  I thought I'd share these
> dramatic results of my pre-optimization tests with you since they
> clearly show the benefits of indices.
>
> Before I do this let me list the characteristics of the hardware used
> and my configuration settings:
>
>
> -------------
> Machine Setup
> -------------
>
> CPU: Dual Athlon MP 1900
> OS: Linux 2.6.15
> Mem: 2GB 266Mhz
>
>
> --------------
> ApacheDS Setup
> --------------
>
> ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
>    - Using 1024MB Memory
>    - Indexed st and initials
>
>
> ----------
> Data Setup
> ----------
>
> Wrote a simple tool to generate random values for descent sized entries.
>   The data sort of looks like this for a user entry:
>
>         dn: uid=user.1,ou=users,dc=example,dc=com
>         uid: user.1
>         initials: yq
>         description: cFocJATNuhlXisDCqGtY
>         pager: FYyimqyZRW
>         cn: HSGMzajYKmicUTe
>         postalcode: WiXXA
>         st: xy
>         street: kpCCqmrsCzkpdtHXWMfY
>         l: KqmAXFYTrI
>         objectclass: person
>         objectclass: organizationalPerson
>         objectclass: inetOrgPerson
>         sn: nuymgOwpm
>         homephone: PERamkCtsv
>         mobile: vkIviOGNTC
>         telephonenumber: 7248889026
>         mail: pYvEoOjSnEymcWD
>         givenname: IVHJZB
>         postaladdress: crObexKoUTIFdzNHcZMr
>         employeenumber: 1
>         userpassword:: cGFzc3dvcmQ=
>
> I started loading a partition up with these entries 100,000 of them at a
> time then performing the following searches for all entries with
> initials aa:
>
> (1) index on initials but no cached entries
> (2) index on initials with cached entries
> (3) no index without cached entries
>
> Here are the results at the various capacities:
>
> ---------------
> 100,000 Entries
> ---------------
>
>      [cached] [indexed] [time (seconds)]
>
> (1)    no       yes        3.30
> (2)    yes      yes        0.72
> (3)    no       no         30.63
>
> search results = 153 entries
>
>
> ---------------
> 200,000 Entries
> ---------------
>
>      [cached] [indexed] [time (seconds)]
>
> (1)    no       yes        6.04
> (2)    yes      yes        1.44
> (3)    no       no         82
>
> search results = 302 entries
>
>
> ---------------
> 300,000 Entries
> ---------------
>
>      [cached] [indexed] [time (seconds)]
>
> (1)    no       yes        7.54
> (2)    yes      yes        1.95
> (3)    no       no         146
>
> search results = 451 entries
>
>
> ---------------
> 400,000 Entries
> ---------------
>
>      [cached] [indexed] [time (seconds)]
>
> (1)    no       yes        9.24
> (2)    yes      yes        3.80
> (3)    no       no         196
>
> search results = 586 entries
>
>
> ---------------
> 500,000 Entries
> ---------------
>
>      [cached] [indexed] [time (seconds)]
>
> (1)    no       yes        11.96
> (2)    yes      yes        3.21
> (3)    no       no         224
>
> search results = 748 entries
>
>
> Alex
>
>
>

Ready for GA? (Was Re: [ApacheDS] [Performance] Using indices to boost search performance)

Posted by Alex Karasulu <ao...@bellsouth.net>.
I think with this last optimization for greater capacity the server is 
ready to handle serious datasets and can be made generally available.

Thoughts?

Alex


Alex Karasulu wrote:
> Hi all,
> 
> 
> As promised here are some more statistics as we fill up a single 
> partition up to 2 million entries.
> 
> 
> ---------------
> 750,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        7.26
> (2)    yes      yes        5.04
> (3)    no       no         565 (9m25s)
> 
> #1-#2 = 2.22
> 
> search results = 1039 entries
> 
> NOTE:
> -----
> I started changing settings a little here on.  I did this because the 
> number of entries returned with cache was way more than what we had 
> allocated.  I changed the memory settings and cache setting from the 
> defaults.  Here's what I did:
> 
>  o up'ed memory from 384 MB to 1024 MB
>  o increased initials cache to 5000 from 100
>  o increased entry cache from 100 to 5000
>  o increased system indices except alias indices from 100 to 1000
> 
> I turned things up a bit so I can load the database as fast as possible 
> since we're increasing the amount added per run to 250K entries now.
> 
> Let's do the 750K again:
> 
> ---------------
> 750,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        8.22
> (2)    yes      yes        2.61
> (3)    no       no         stopped
> 
> #1-#2 = 5.61
> 
> search results = 1039 entries
> 
> 
> -----------------
> 1,000,000 Entries
> -----------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        10.22
> (2)    yes      yes         3.59
> 
> #1-#2 = 6.63
> search results = 1373 entries
> 
> 
> 
> -----------------
> 1,250,000 Entries
> -----------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        12.71
> (2)    yes      yes         4.50
> 
> #1-#2 = 8.21
> search results = 1738 entries
> 
> 
> -----------------
> 1,500,000 Entries
> -----------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        14.74
> (2)    yes      yes        11.84
> 
> 
> #1-#2 = 2.9
> search results = 2123 entries
> 
> Now let's search this 1.5 million entry partition by limiting the number 
> of returned entries to 1000.
> 
> no cache: 8.10
> w/ cache: 3.04
> 
> #1-#2 = 5.06
> 
> Very similar to the values we got for 0.75 million entries at 1039 
> entries returned.
> 
> For the last value I'll load the partition up to 2 million entries and 
> run the same tests again:
> 
> -----------------
> 2,000,000 Entries
> -----------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        18.58
> (2)    yes      yes        15.38
> 
> 
> #1-#2 = 3.1
> search results = 2851 entries
> 
> Now let's search this 2 million entry partition by limiting the number 
> of returned entries to 1000.
> 
> no cache:   8.04
> w/ cache:   2.55
> #1-#2 = 5.49
> 
> Wow performance for returning 1000 entries is not diminishing as we 
> scale from 1 million to 2 million entries.  Both for cached and 
> non-cached returns.
> 
> Looking at these results I think we can easily accommodate 10 million 
> entries per partition.  Perhaps with a capacity limit on the order of 
> 100 million entries per partition.  Note that you can have as many 
> partitions as you like in a single server.
> 
> Alex
> 
> 
> Alex Karasulu wrote:
>> Hi again,
>>
>> I've completed my optimization on indices under high partition 
>> capacities.  The results are in line and side by side to the old 
>> results.  For those not interested in the exact optimization that took 
>> place skip down to the results.
>>
>>
>> The problem:
>> ------------
>>
>> Up to now duplicate keys (keys with many values) were stored using a 
>> TreeSet within the B+Trees of indices.  Indices usually map some 
>> attribute value to an ID into the master table for an entry.  The ID 
>> is a sequentially unique value assigned to the entry.
>>
>> So if 100 people have initials AA then there would be 100 values in 
>> the TreeSet for the key 'AA' in the intials index.  This would be 
>> serialized and deserialized to and from disk.  At these low numbers 
>> there's really very little impact.  However certain indices were 
>> seriously impacted like the objectClass index or the hierarchy based 
>> system index.  Just imagine having 1 million entries of 
>> inetOrgPersons.  The objectClass index would then have 1 million 
>> values in the TreeSet for the key 'inetOrgPerson'.  This would 
>> seriously hurt performance from both a space and time perspective.  It 
>> would essentially obfuscate the benefits of using BTree's in the first 
>> place with large numbers of entries.
>>
>>
>> Solution:
>> ---------
>>
>> What I did was add a configurable threshold parameter when instead of 
>> using a TreeSet to store duplicates another B+Tree was created and 
>> used within the same index database file.  Right now the default value 
>> for this which these tests were conducted with is 1000.  So after 
>> having 1000 duplicate values the server switches from using in memory 
>> TreeSets to on disk B+Trees for only storing values for that key.
>>
>>
>> Alex Karasulu wrote:
>>> Hi all,
>>>
>>> I've been testing the search performance boost gained from indexing
>>> attributes before starting development on an optimization to improve
>>> index performance and in memory size.  I thought I'd share these
>>> dramatic results of my pre-optimization tests with you since they
>>> clearly show the benefits of indices.
>>>
>>> Before I do this let me list the characteristics of the hardware used
>>> and my configuration settings:
>>>
>>>
>>> -------------
>>> Machine Setup
>>> -------------
>>>
>>> CPU: Dual Athlon MP 1900
>>> OS: Linux 2.6.15
>>> Mem: 2GB 266Mhz
>>>
>>>
>>> --------------
>>> ApacheDS Setup
>>> --------------
>>>
>>> ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
>>>   - Using 1024MB Memory
>>>   - Indexed st and initials
>>>
>>>
>>> ----------
>>> Data Setup
>>> ----------
>>>
>>> Wrote a simple tool to generate random values for descent sized entries.
>>>  The data sort of looks like this for a user entry:
>>>
>>>     dn: uid=user.1,ou=users,dc=example,dc=com
>>>     uid: user.1
>>>     initials: yq
>>>     description: cFocJATNuhlXisDCqGtY
>>>     pager: FYyimqyZRW
>>>     cn: HSGMzajYKmicUTe
>>>     postalcode: WiXXA
>>>     st: xy       street: kpCCqmrsCzkpdtHXWMfY
>>>     l: KqmAXFYTrI
>>>     objectclass: person
>>>     objectclass: organizationalPerson
>>>     objectclass: inetOrgPerson
>>>     sn: nuymgOwpm
>>>     homephone: PERamkCtsv
>>>     mobile: vkIviOGNTC
>>>     telephonenumber: 7248889026
>>>     mail: pYvEoOjSnEymcWD
>>>     givenname: IVHJZB
>>>     postaladdress: crObexKoUTIFdzNHcZMr
>>>     employeenumber: 1
>>>     userpassword:: cGFzc3dvcmQ=
>>>
>>> I started loading a partition up with these entries 100,000 of them at a
>>> time then performing the following searches for all entries with
>>> initials aa:
>>>
>>> (1) index on initials but no cached entries
>>> (2) index on initials with cached entries
>>> (3) no index without cached entries
>>>
>>> Here are the results at the various capacities:
>>>
>>> ---------------
>>> 100,000 Entries
>>> ---------------
>>>
>>>     [cached] [indexed] [time (seconds)]
>>>
>>> (1)    no       yes        3.30
>>                              2.37
>>> (2)    yes      yes        0.72
>>                              0.76
>>> (3)    no       no         30.63
>>                              42.08
>>
>>> search results = 153 entries
>>                    146
>>
>> Notice that if #2 is total time for cached entries we can conclude 
>> that #1 - #2 is a descent measure of pulling from disk.  So let's see 
>> what that number looks like:
>>
>> Old: 2.58
>> New: 1.61
>>
>>>
>>>
>>> ---------------
>>> 200,000 Entries
>>> ---------------
>>>
>>>     [cached] [indexed] [time (seconds)]
>>>
>>> (1)    no       yes        6.04
>>                              3.29
>>> (2)    yes      yes        1.44
>>                              1.49
>>> (3)    no       no         82
>>                              83
>>>
>>> search results = 302 entries
>>                    297
>> #1 - #2 values:
>>
>> Old: 4.60
>> New: 1.80
>>
>>>
>>>
>>> ---------------
>>> 300,000 Entries
>>> ---------------
>>>
>>>     [cached] [indexed] [time (seconds)]
>>>
>>> (1)    no       yes        7.54
>>                              4.11
>>> (2)    yes      yes        1.95
>>                              2.22
>>> (3)    no       no         146
>>                              128
>>>
>>> search results = 451 entries
>>                    435
>>
>> #1 - #2 values:
>>
>> Old: 5.59
>> New: 1.89
>>
>> Before the optimization there seems to be a linear growth to access 
>> time whereas after the optimization the growth rate is almost 
>> unmeasurable above the noise.
>>
>> -Alex
>>
>>
> 
> 
>>>
>>>
>>> ---------------
>>> 400,000 Entries
>>> ---------------
>>>
>>>     [cached] [indexed] [time (seconds)]
>>>
>>> (1)    no       yes        9.24
>>                              4.87
>>> (2)    yes      yes        3.80
>>                              2.69
>>> (3)    no       no         196
>>                              168
>>>
>>> search results = 586 entries
>>                    577
>>
>> #1 - #2 values:
>>
>> Old: 5.44
>> New: 2.18
>>
>>>
>>>
>>> ---------------
>>> 500,000 Entries
>>> ---------------
>>>
>>>     [cached] [indexed] [time (seconds)]
>>>
>>> (1)    no       yes        11.96
>>                               5.06
>>> (2)    yes      yes        3.21
>>                              3.21
>>> (3)    no       no         224
>>                              209
>>>
>>> search results = 748 entries
>>                    706
>>
>> #1 - #2 values:
>>
>> Old: 8.75
>> New: 1.85
>>
>>
>> Besides these numbers I've noticed that adds are now faster and total 
>> garbage collection is lower.  These were eyeballed figures.  For 
>> example the adds would grow in time up to 28 minutes per 100K entries 
>> for those between 400K-500K.  This has dropped down to 18 minutes.
>>
>> Now it would be interesting to see what happens with these trends once 
>> the intials index converts from a TreeSet to a BTree for the values of 
>> the key 'aa'.  So just to see we keep loading up the partition until 
>> there are over 1000 entries with initials 'aa'.
>>
>> I'll report on this tomorrow as the server loads these entries.
>>
>> I guess the conclusion here is that the optimization had a significant 
>> effect and is worth the additional complexity it introduced into the 
>> JDBM partition implementation.  Here's the access times (#1-#2) as a
>> function of the number of entries:
>>
>>
>> Retrieval Times | 100K | 200K | 300K | 400K | 500K
>> --------------------------------------------------
>> Before:         | 2.58 | 4.60 | 5.59 | 5.44 | 8.75
>> After:          | 1.61 | 1.80 | 1.89 | 2.18 | 1.85
>>
>>
>> Before the optimization there seems to be a linear growth to access 
>> time whereas after the optimization the growth rate is almost 
>> unmeasurable above the noise.
>>
>> -Alex
>>
>>
> 
> 


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Alex Karasulu <ao...@bellsouth.net>.
Hi all,


As promised here are some more statistics as we fill up a single 
partition up to 2 million entries.


---------------
750,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        7.26
(2)    yes      yes        5.04
(3)    no       no         565 (9m25s)

#1-#2 = 2.22

search results = 1039 entries

NOTE:
-----
I started changing settings a little here on.  I did this because the 
number of entries returned with cache was way more than what we had 
allocated.  I changed the memory settings and cache setting from the 
defaults.  Here's what I did:

  o up'ed memory from 384 MB to 1024 MB
  o increased initials cache to 5000 from 100
  o increased entry cache from 100 to 5000
  o increased system indices except alias indices from 100 to 1000

I turned things up a bit so I can load the database as fast as possible 
since we're increasing the amount added per run to 250K entries now.

Let's do the 750K again:

---------------
750,000 Entries
---------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        8.22
(2)    yes      yes        2.61
(3)    no       no         stopped

#1-#2 = 5.61

search results = 1039 entries


-----------------
1,000,000 Entries
-----------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        10.22
(2)    yes      yes         3.59

#1-#2 = 6.63
search results = 1373 entries



-----------------
1,250,000 Entries
-----------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        12.71
(2)    yes      yes         4.50

#1-#2 = 8.21
search results = 1738 entries


-----------------
1,500,000 Entries
-----------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        14.74
(2)    yes      yes        11.84


#1-#2 = 2.9
search results = 2123 entries

Now let's search this 1.5 million entry partition by limiting the number 
of returned entries to 1000.

no cache: 8.10
w/ cache: 3.04

#1-#2 = 5.06

Very similar to the values we got for 0.75 million entries at 1039 
entries returned.

For the last value I'll load the partition up to 2 million entries and 
run the same tests again:

-----------------
2,000,000 Entries
-----------------

     [cached] [indexed] [time (seconds)]

(1)    no       yes        18.58
(2)    yes      yes        15.38


#1-#2 = 3.1
search results = 2851 entries

Now let's search this 2 million entry partition by limiting the number 
of returned entries to 1000.

no cache:   8.04
w/ cache:   2.55
#1-#2 = 5.49

Wow performance for returning 1000 entries is not diminishing as we 
scale from 1 million to 2 million entries.  Both for cached and 
non-cached returns.

Looking at these results I think we can easily accommodate 10 million 
entries per partition.  Perhaps with a capacity limit on the order of 
100 million entries per partition.  Note that you can have as many 
partitions as you like in a single server.

Alex


Alex Karasulu wrote:
> Hi again,
> 
> I've completed my optimization on indices under high partition 
> capacities.  The results are in line and side by side to the old 
> results.  For those not interested in the exact optimization that took 
> place skip down to the results.
> 
> 
> The problem:
> ------------
> 
> Up to now duplicate keys (keys with many values) were stored using a 
> TreeSet within the B+Trees of indices.  Indices usually map some 
> attribute value to an ID into the master table for an entry.  The ID is 
> a sequentially unique value assigned to the entry.
> 
> So if 100 people have initials AA then there would be 100 values in the 
> TreeSet for the key 'AA' in the intials index.  This would be serialized 
> and deserialized to and from disk.  At these low numbers there's really 
> very little impact.  However certain indices were seriously impacted 
> like the objectClass index or the hierarchy based system index.  Just 
> imagine having 1 million entries of inetOrgPersons.  The objectClass 
> index would then have 1 million values in the TreeSet for the key 
> 'inetOrgPerson'.  This would seriously hurt performance from both a 
> space and time perspective.  It would essentially obfuscate the benefits 
> of using BTree's in the first place with large numbers of entries.
> 
> 
> Solution:
> ---------
> 
> What I did was add a configurable threshold parameter when instead of 
> using a TreeSet to store duplicates another B+Tree was created and used 
> within the same index database file.  Right now the default value for 
> this which these tests were conducted with is 1000.  So after having 
> 1000 duplicate values the server switches from using in memory TreeSets 
> to on disk B+Trees for only storing values for that key.
> 
> 
> Alex Karasulu wrote:
>> Hi all,
>>
>> I've been testing the search performance boost gained from indexing
>> attributes before starting development on an optimization to improve
>> index performance and in memory size.  I thought I'd share these
>> dramatic results of my pre-optimization tests with you since they
>> clearly show the benefits of indices.
>>
>> Before I do this let me list the characteristics of the hardware used
>> and my configuration settings:
>>
>>
>> -------------
>> Machine Setup
>> -------------
>>
>> CPU: Dual Athlon MP 1900
>> OS: Linux 2.6.15
>> Mem: 2GB 266Mhz
>>
>>
>> --------------
>> ApacheDS Setup
>> --------------
>>
>> ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
>>   - Using 1024MB Memory
>>   - Indexed st and initials
>>
>>
>> ----------
>> Data Setup
>> ----------
>>
>> Wrote a simple tool to generate random values for descent sized entries.
>>  The data sort of looks like this for a user entry:
>>
>>     dn: uid=user.1,ou=users,dc=example,dc=com
>>     uid: user.1
>>     initials: yq
>>     description: cFocJATNuhlXisDCqGtY
>>     pager: FYyimqyZRW
>>     cn: HSGMzajYKmicUTe
>>     postalcode: WiXXA
>>     st: xy       street: kpCCqmrsCzkpdtHXWMfY
>>     l: KqmAXFYTrI
>>     objectclass: person
>>     objectclass: organizationalPerson
>>     objectclass: inetOrgPerson
>>     sn: nuymgOwpm
>>     homephone: PERamkCtsv
>>     mobile: vkIviOGNTC
>>     telephonenumber: 7248889026
>>     mail: pYvEoOjSnEymcWD
>>     givenname: IVHJZB
>>     postaladdress: crObexKoUTIFdzNHcZMr
>>     employeenumber: 1
>>     userpassword:: cGFzc3dvcmQ=
>>
>> I started loading a partition up with these entries 100,000 of them at a
>> time then performing the following searches for all entries with
>> initials aa:
>>
>> (1) index on initials but no cached entries
>> (2) index on initials with cached entries
>> (3) no index without cached entries
>>
>> Here are the results at the various capacities:
>>
>> ---------------
>> 100,000 Entries
>> ---------------
>>
>>     [cached] [indexed] [time (seconds)]
>>
>> (1)    no       yes        3.30
>                              2.37
>> (2)    yes      yes        0.72
>                              0.76
>> (3)    no       no         30.63
>                              42.08
> 
>> search results = 153 entries
>                    146
> 
> Notice that if #2 is total time for cached entries we can conclude that 
> #1 - #2 is a descent measure of pulling from disk.  So let's see what 
> that number looks like:
> 
> Old: 2.58
> New: 1.61
> 
>>
>>
>> ---------------
>> 200,000 Entries
>> ---------------
>>
>>     [cached] [indexed] [time (seconds)]
>>
>> (1)    no       yes        6.04
>                              3.29
>> (2)    yes      yes        1.44
>                              1.49
>> (3)    no       no         82
>                              83
>>
>> search results = 302 entries
>                    297
> #1 - #2 values:
> 
> Old: 4.60
> New: 1.80
> 
>>
>>
>> ---------------
>> 300,000 Entries
>> ---------------
>>
>>     [cached] [indexed] [time (seconds)]
>>
>> (1)    no       yes        7.54
>                              4.11
>> (2)    yes      yes        1.95
>                              2.22
>> (3)    no       no         146
>                              128
>>
>> search results = 451 entries
>                    435
> 
> #1 - #2 values:
> 
> Old: 5.59
> New: 1.89
> 
> Before the optimization there seems to be a linear growth to access time 
> whereas after the optimization the growth rate is almost unmeasurable 
> above the noise.
> 
> -Alex
> 
> 


>>
>>
>> ---------------
>> 400,000 Entries
>> ---------------
>>
>>     [cached] [indexed] [time (seconds)]
>>
>> (1)    no       yes        9.24
>                              4.87
>> (2)    yes      yes        3.80
>                              2.69
>> (3)    no       no         196
>                              168
>>
>> search results = 586 entries
>                    577
> 
> #1 - #2 values:
> 
> Old: 5.44
> New: 2.18
> 
>>
>>
>> ---------------
>> 500,000 Entries
>> ---------------
>>
>>     [cached] [indexed] [time (seconds)]
>>
>> (1)    no       yes        11.96
>                               5.06
>> (2)    yes      yes        3.21
>                              3.21
>> (3)    no       no         224
>                              209
>>
>> search results = 748 entries
>                    706
> 
> #1 - #2 values:
> 
> Old: 8.75
> New: 1.85
> 
> 
> Besides these numbers I've noticed that adds are now faster and total 
> garbage collection is lower.  These were eyeballed figures.  For example 
> the adds would grow in time up to 28 minutes per 100K entries for those 
> between 400K-500K.  This has dropped down to 18 minutes.
> 
> Now it would be interesting to see what happens with these trends once 
> the intials index converts from a TreeSet to a BTree for the values of 
> the key 'aa'.  So just to see we keep loading up the partition until 
> there are over 1000 entries with initials 'aa'.
> 
> I'll report on this tomorrow as the server loads these entries.
> 
> I guess the conclusion here is that the optimization had a significant 
> effect and is worth the additional complexity it introduced into the 
> JDBM partition implementation.  Here's the access times (#1-#2) as a
> function of the number of entries:
> 
> 
> Retrieval Times | 100K | 200K | 300K | 400K | 500K
> --------------------------------------------------
> Before:         | 2.58 | 4.60 | 5.59 | 5.44 | 8.75
> After:          | 1.61 | 1.80 | 1.89 | 2.18 | 1.85
> 
> 
> Before the optimization there seems to be a linear growth to access time 
> whereas after the optimization the growth rate is almost unmeasurable 
> above the noise.
> 
> -Alex
> 
> 


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Alex Karasulu <ao...@bellsouth.net>.
Hi again,

I've completed my optimization on indices under high partition 
capacities.  The results are in line and side by side to the old 
results.  For those not interested in the exact optimization that took 
place skip down to the results.


The problem:
------------

Up to now duplicate keys (keys with many values) were stored using a 
TreeSet within the B+Trees of indices.  Indices usually map some 
attribute value to an ID into the master table for an entry.  The ID is 
a sequentially unique value assigned to the entry.

So if 100 people have initials AA then there would be 100 values in the 
TreeSet for the key 'AA' in the intials index.  This would be serialized 
and deserialized to and from disk.  At these low numbers there's really 
very little impact.  However certain indices were seriously impacted 
like the objectClass index or the hierarchy based system index.  Just 
imagine having 1 million entries of inetOrgPersons.  The objectClass 
index would then have 1 million values in the TreeSet for the key 
'inetOrgPerson'.  This would seriously hurt performance from both a 
space and time perspective.  It would essentially obfuscate the benefits 
of using BTree's in the first place with large numbers of entries.


Solution:
---------

What I did was add a configurable threshold parameter when instead of 
using a TreeSet to store duplicates another B+Tree was created and used 
within the same index database file.  Right now the default value for 
this which these tests were conducted with is 1000.  So after having 
1000 duplicate values the server switches from using in memory TreeSets 
to on disk B+Trees for only storing values for that key.


Alex Karasulu wrote:
> Hi all,
> 
> I've been testing the search performance boost gained from indexing
> attributes before starting development on an optimization to improve
> index performance and in memory size.  I thought I'd share these
> dramatic results of my pre-optimization tests with you since they
> clearly show the benefits of indices.
> 
> Before I do this let me list the characteristics of the hardware used
> and my configuration settings:
> 
> 
> -------------
> Machine Setup
> -------------
> 
> CPU: Dual Athlon MP 1900
> OS: Linux 2.6.15
> Mem: 2GB 266Mhz
> 
> 
> --------------
> ApacheDS Setup
> --------------
> 
> ApacheDS: Stock RC4 (to be released pre-image) w/ modifications
>   - Using 1024MB Memory
>   - Indexed st and initials
> 
> 
> ----------
> Data Setup
> ----------
> 
> Wrote a simple tool to generate random values for descent sized entries.
>  The data sort of looks like this for a user entry:
> 
>     dn: uid=user.1,ou=users,dc=example,dc=com
>     uid: user.1
>     initials: yq
>     description: cFocJATNuhlXisDCqGtY
>     pager: FYyimqyZRW
>     cn: HSGMzajYKmicUTe
>     postalcode: WiXXA
>     st: xy   
>     street: kpCCqmrsCzkpdtHXWMfY
>     l: KqmAXFYTrI
>     objectclass: person
>     objectclass: organizationalPerson
>     objectclass: inetOrgPerson
>     sn: nuymgOwpm
>     homephone: PERamkCtsv
>     mobile: vkIviOGNTC
>     telephonenumber: 7248889026
>     mail: pYvEoOjSnEymcWD
>     givenname: IVHJZB
>     postaladdress: crObexKoUTIFdzNHcZMr
>     employeenumber: 1
>     userpassword:: cGFzc3dvcmQ=
> 
> I started loading a partition up with these entries 100,000 of them at a
> time then performing the following searches for all entries with
> initials aa:
> 
> (1) index on initials but no cached entries
> (2) index on initials with cached entries
> (3) no index without cached entries
> 
> Here are the results at the various capacities:
> 
> ---------------
> 100,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        3.30
                              2.37
> (2)    yes      yes        0.72
                              0.76
> (3)    no       no         30.63
                              42.08

> search results = 153 entries
                    146

Notice that if #2 is total time for cached entries we can conclude that 
#1 - #2 is a descent measure of pulling from disk.  So let's see what 
that number looks like:

Old: 2.58
New: 1.61

> 
> 
> ---------------
> 200,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        6.04
                              3.29
> (2)    yes      yes        1.44
                              1.49
> (3)    no       no         82
                              83
> 
> search results = 302 entries
                    297
#1 - #2 values:

Old: 4.60
New: 1.80

> 
> 
> ---------------
> 300,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        7.54
                              4.11
> (2)    yes      yes        1.95
                              2.22
> (3)    no       no         146
                              128
> 
> search results = 451 entries
                    435

#1 - #2 values:

Old: 5.59
New: 1.89

> 
> 
> ---------------
> 400,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        9.24
                              4.87
> (2)    yes      yes        3.80
                              2.69
> (3)    no       no         196
                              168
> 
> search results = 586 entries
                    577

#1 - #2 values:

Old: 5.44
New: 2.18

> 
> 
> ---------------
> 500,000 Entries
> ---------------
> 
>     [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        11.96
                               5.06
> (2)    yes      yes        3.21
                              3.21
> (3)    no       no         224
                              209
> 
> search results = 748 entries
                    706

#1 - #2 values:

Old: 8.75
New: 1.85


Besides these numbers I've noticed that adds are now faster and total 
garbage collection is lower.  These were eyeballed figures.  For example 
the adds would grow in time up to 28 minutes per 100K entries for those 
between 400K-500K.  This has dropped down to 18 minutes.

Now it would be interesting to see what happens with these trends once 
the intials index converts from a TreeSet to a BTree for the values of 
the key 'aa'.  So just to see we keep loading up the partition until 
there are over 1000 entries with initials 'aa'.

I'll report on this tomorrow as the server loads these entries.

I guess the conclusion here is that the optimization had a significant 
effect and is worth the additional complexity it introduced into the 
JDBM partition implementation.  Here's the access times (#1-#2) as a
function of the number of entries:


Retrieval Times | 100K | 200K | 300K | 400K | 500K
--------------------------------------------------
Before:         | 2.58 | 4.60 | 5.59 | 5.44 | 8.75
After:          | 1.61 | 1.80 | 1.89 | 2.18 | 1.85


Before the optimization there seems to be a linear growth to access time 
whereas after the optimization the growth rate is almost unmeasurable 
above the noise.

-Alex


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Ole Ersoy <ol...@yahoo.com>.
Oh - Sorry for the confusion - I meant that as a
compliment - But nice insight addition,

- Ole

--- Emmanuel Lecharny <el...@gmail.com> wrote:

> Ole Ersoy a écrit :
> 
> >I'm going to read this email as soon as the smoke
> >clears...
> >  
> >
> Nothing really complicated :)
> 
> The idea is to do a performance measure before doing
> some optimisation 
> on some part of the code, so Alex will have a clear
> vision of what kind 
> of performance improvment he will obtain.
> 
> Here, what we can say from those tests, is that the
> indices are not 
> really optimal : it seems that the time to grab
> entries grow linearly 
> with the growth of the database. For instance, from
> 300 000 entries to 
> 400 000 entries, we have an increase in time to
> search for some entries 
> which is similar than from 200 000 to 300 000 : 6.04
> sec for 200K, 7.14 
> for 300K, 9.24 for 400K. We would excpect the growth
> to be logarithmic.
> 
> Alex is working on that currently.
> 
> There is a page on confluence where we have put some
> thought about 
> backend, and it would be cool to add any comment,
> thought, improvment, 
> ideas :
> http://docs.safehaus.org/display/APACHEDS/Backend
> 
> Emmanuel.
> 
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Emmanuel Lecharny <el...@gmail.com>.
Ole Ersoy a écrit :

>I'm going to read this email as soon as the smoke
>clears...
>  
>
Nothing really complicated :)

The idea is to do a performance measure before doing some optimisation 
on some part of the code, so Alex will have a clear vision of what kind 
of performance improvment he will obtain.

Here, what we can say from those tests, is that the indices are not 
really optimal : it seems that the time to grab entries grow linearly 
with the growth of the database. For instance, from 300 000 entries to 
400 000 entries, we have an increase in time to search for some entries 
which is similar than from 200 000 to 300 000 : 6.04 sec for 200K, 7.14 
for 300K, 9.24 for 400K. We would excpect the growth to be logarithmic.

Alex is working on that currently.

There is a page on confluence where we have put some thought about 
backend, and it would be cool to add any comment, thought, improvment, 
ideas :
http://docs.safehaus.org/display/APACHEDS/Backend

Emmanuel.


Re: [ApacheDS] [Performance] Using indices to boost search performance

Posted by Ole Ersoy <ol...@yahoo.com>.
I'm going to read this email as soon as the smoke
clears...

--- Alex Karasulu <ao...@bellsouth.net> wrote:

> Hi all,
> 
> I've been testing the search performance boost
> gained from indexing
> attributes before starting development on an
> optimization to improve
> index performance and in memory size.  I thought I'd
> share these
> dramatic results of my pre-optimization tests with
> you since they
> clearly show the benefits of indices.
> 
> Before I do this let me list the characteristics of
> the hardware used
> and my configuration settings:
> 
> 
> -------------
> Machine Setup
> -------------
> 
> CPU: Dual Athlon MP 1900
> OS: Linux 2.6.15
> Mem: 2GB 266Mhz
> 
> 
> --------------
> ApacheDS Setup
> --------------
> 
> ApacheDS: Stock RC4 (to be released pre-image) w/
> modifications
>    - Using 1024MB Memory
>    - Indexed st and initials
> 
> 
> ----------
> Data Setup
> ----------
> 
> Wrote a simple tool to generate random values for
> descent sized entries.
>   The data sort of looks like this for a user entry:
> 
> 	dn: uid=user.1,ou=users,dc=example,dc=com
> 	uid: user.1
> 	initials: yq
> 	description: cFocJATNuhlXisDCqGtY
> 	pager: FYyimqyZRW
> 	cn: HSGMzajYKmicUTe
> 	postalcode: WiXXA
> 	st: xy	
> 	street: kpCCqmrsCzkpdtHXWMfY
> 	l: KqmAXFYTrI
> 	objectclass: person
> 	objectclass: organizationalPerson
> 	objectclass: inetOrgPerson
> 	sn: nuymgOwpm
> 	homephone: PERamkCtsv
> 	mobile: vkIviOGNTC
> 	telephonenumber: 7248889026
> 	mail: pYvEoOjSnEymcWD
> 	givenname: IVHJZB
> 	postaladdress: crObexKoUTIFdzNHcZMr
> 	employeenumber: 1
> 	userpassword:: cGFzc3dvcmQ=
> 
> I started loading a partition up with these entries
> 100,000 of them at a
> time then performing the following searches for all
> entries with
> initials aa:
> 
> (1) index on initials but no cached entries
> (2) index on initials with cached entries
> (3) no index without cached entries
> 
> Here are the results at the various capacities:
> 
> ---------------
> 100,000 Entries
> ---------------
> 
>      [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        3.30
> (2)    yes      yes        0.72
> (3)    no       no         30.63
> 
> search results = 153 entries
> 
> 
> ---------------
> 200,000 Entries
> ---------------
> 
>      [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        6.04
> (2)    yes      yes        1.44
> (3)    no       no         82
> 
> search results = 302 entries
> 
> 
> ---------------
> 300,000 Entries
> ---------------
> 
>      [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        7.54
> (2)    yes      yes        1.95
> (3)    no       no         146
> 
> search results = 451 entries
> 
> 
> ---------------
> 400,000 Entries
> ---------------
> 
>      [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        9.24
> (2)    yes      yes        3.80
> (3)    no       no         196
> 
> search results = 586 entries
> 
> 
> ---------------
> 500,000 Entries
> ---------------
> 
>      [cached] [indexed] [time (seconds)]
> 
> (1)    no       yes        11.96
> (2)    yes      yes        3.21
> (3)    no       no         224
> 
> search results = 748 entries
> 
> 
> Alex
> 
> 
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com