Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@gmail.com> on 2013/11/11 10:01:22 UTC

Mavibot vs JDBM results

Hi,

now that the Mavibot partition is working, here is a quick benchmark I ran
this morning:

Addition of 10 000 entries, then 400 000 random searches:

JDBM   
10000 : Add = 74/s, Search : 4 729/s

Mavibot
10000 : Add = 120/s, Search : 11 056/s

As we can see, Mavibot is 2.1x faster for additions and 2.33x faster
for searches... I will run a test with 100 000 entries.

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Re: Mavibot vs JDBM results

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 11/11/13 9:49 PM, Alex Karasulu wrote:
> On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu <hy...@symas.com> wrote:
>
>> Alex Karasulu wrote:
>>
>>>     Now, this is an approach where we use plain keys (i.e., keys can have
>>>     various sizes), which is not really efficient, as we may have to
>>>     allocate more pages than necessary to store nodes and leaves.
>>>     OpenLDAP uses another approach, which is smarter: they use the hash
>>>     value of each key to retrieve the element. Obviously, this means
>>>     comparing the keys once we reach the leaf, as we may have more than
>>>     one key with the same hash value, and it also destroys the ordering
>>>     (one can't compare two hash values, as their ordering differs from
>>>     the keys' ordering), but in most cases it's not really a big deal.
>>>     The main advantage of such an approach is that suddenly, nodes have a
>>>     fixed size (a hash can be stored as an int, and the references to a
>>>     page as longs), so in a fixed page size we can store a fixed number
>>>     of elements. Assuming that a node needs at least 28 bytes to store
>>>     its header and PageIO, a 512-byte page leaves 512 - 28 = 484 bytes
>>>     for (nbKeys + 1) x (8 + 8) + nbKeys x 4 bytes of content, so 16 keys
>>>     (64 bytes) and 17 values (272 bytes) fit, with 148 bytes remaining
>>>     in this case.
>>>     At the moment, we store 16 elements per node, which requires many
>>>     physical pages, i.e., many disk accesses.
>>>
>>>     This is something worth investigating in the near future.
>>>
>>>
>>> Sounds like we need a minimal perfect order preserving hash function.
>>>
>> These are a pain to try to use for this purpose. All of the perfect order
>> preserving hashes I've found only work because you generate the hash
>> function based on knowing the entire set of data in advance. When you add
>> new records you need to create a new hash function, and thus the hash
>> values of every key change.
>>
>> I.e., these are only useful for static data sets.
>>
> That's lame ... no silver bullets I guess.
>
Well, still: we rarely need an ordered index. Ordered indexes are only
good to have when some filters use <= or >=. One good tradeoff might
be to create such special indexes only in those specific cases (i.e.,
not always). Most of the time, we will be fine with standard indexes
using hashed keys.

We would just need to ask for those specific indexes to be created in
the configuration.

That is at least something to consider.
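
To make the tradeoff concrete, here is a tiny sketch with plain JDK maps
(not the ApacheDS or Mavibot index API): an ordered structure answers
<= / >= filters directly, while a hash-keyed one only answers equality
lookups.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class IndexTradeoff {
    public static void main(String[] args) {
        // Ordered index: keys kept in natural order, so a range filter
        // such as (uidNumber >= 1002) is a simple tail view.
        TreeMap<Integer, String> ordered = new TreeMap<>();
        ordered.put(1001, "cn=alice");
        ordered.put(1002, "cn=bob");
        ordered.put(1003, "cn=carol");
        System.out.println(">= 1002 : " + ordered.tailMap(1002, true).values());

        // Hash-keyed index: perfect for equality filters, but there is no
        // meaningful order to iterate over for <= / >= filters.
        Map<Integer, String> hashed = new HashMap<>(ordered);
        System.out.println("=  1002 : " + hashed.get(1002));
    }
}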

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Re: Mavibot vs JDBM results

Posted by Alex Karasulu <ak...@apache.org>.
On Mon, Nov 11, 2013 at 10:46 PM, Howard Chu <hy...@symas.com> wrote:

> Alex Karasulu wrote:
>
>>     Now, this is an approach where we use plain keys (i.e., keys can have
>>     various sizes), which is not really efficient, as we may have to
>>     allocate more pages than necessary to store nodes and leaves.
>>     OpenLDAP uses another approach, which is smarter: they use the hash
>>     value of each key to retrieve the element. Obviously, this means
>>     comparing the keys once we reach the leaf, as we may have more than
>>     one key with the same hash value, and it also destroys the ordering
>>     (one can't compare two hash values, as their ordering differs from
>>     the keys' ordering), but in most cases it's not really a big deal.
>>     The main advantage of such an approach is that suddenly, nodes have a
>>     fixed size (a hash can be stored as an int, and the references to a
>>     page as longs), so in a fixed page size we can store a fixed number
>>     of elements. Assuming that a node needs at least 28 bytes to store
>>     its header and PageIO, a 512-byte page leaves 512 - 28 = 484 bytes
>>     for (nbKeys + 1) x (8 + 8) + nbKeys x 4 bytes of content, so 16 keys
>>     (64 bytes) and 17 values (272 bytes) fit, with 148 bytes remaining
>>     in this case.
>>     At the moment, we store 16 elements per node, which requires many
>>     physical pages, i.e., many disk accesses.
>>
>>     This is something worth investigating in the near future.
>>
>>
>> Sounds like we need a minimal perfect order preserving hash function.
>>
>
> These are a pain to try to use for this purpose. All of the perfect order
> preserving hashes I've found only work because you generate the hash
> function based on knowing the entire set of data in advance. When you add
> new records you need to create a new hash function, and thus the hash
> values of every key change.
>
> I.e., these are only useful for static data sets.
>

That's lame ... no silver bullets I guess.

-- 
Best Regards,
-- Alex

Re: Mavibot vs JDBM results

Posted by Howard Chu <hy...@symas.com>.
Alex Karasulu wrote:
>     Now, this is an approach where we use plain keys (i.e., keys can have
>     various sizes), which is not really efficient, as we may have to
>     allocate more pages than necessary to store nodes and leaves.
>     OpenLDAP uses another approach, which is smarter: they use the hash
>     value of each key to retrieve the element. Obviously, this means
>     comparing the keys once we reach the leaf, as we may have more than
>     one key with the same hash value, and it also destroys the ordering
>     (one can't compare two hash values, as their ordering differs from
>     the keys' ordering), but in most cases it's not really a big deal.
>     The main advantage of such an approach is that suddenly, nodes have a
>     fixed size (a hash can be stored as an int, and the references to a
>     page as longs), so in a fixed page size we can store a fixed number
>     of elements. Assuming that a node needs at least 28 bytes to store
>     its header and PageIO, a 512-byte page leaves 512 - 28 = 484 bytes
>     for (nbKeys + 1) x (8 + 8) + nbKeys x 4 bytes of content, so 16 keys
>     (64 bytes) and 17 values (272 bytes) fit, with 148 bytes remaining
>     in this case.
>     At the moment, we store 16 elements per node, which requires many
>     physical pages, i.e., many disk accesses.
>
>     This is something worth investigating in the near future.
>
>
> Sounds like we need a minimal perfect order preserving hash function.

These are a pain to try to use for this purpose. All of the perfect order 
preserving hashes I've found only work because you generate the hash function 
based on knowing the entire set of data in advance. When you add new records 
you need to create a new hash function, and thus the hash values of every key 
change.

I.e., these are only useful for static data sets.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Re: Mavibot vs JDBM results

Posted by Alex Karasulu <ak...@apache.org>.
Hi Emmanuel,

On Mon, Nov 11, 2013 at 4:09 PM, Emmanuel Lecharny <el...@apache.org> wrote:

> On Mon, Nov 11, 2013 at 2:31 PM, Alex Karasulu <ak...@apache.org>
> wrote:
> >
> > On Mon, Nov 11, 2013 at 11:01 AM, Emmanuel Lécharny <elecharny@gmail.com
> >
> > wrote:
> >>
> >> Hi,
> >>
> >> now that the Mavibot partition is working, here is a quick benchmark I
> >> ran this morning:
> >>
> >> Addition of 10 000 entries, then 400 000 random searches :
> >>
> >> JDBM
> >> 10000 : Add = 74/s, Search : 4 729/s
> >>
> >> Mavibot
> >> 10000 : Add = 120/s, Search : 11 056/s
> >>
> >> As we can see, Mavibot is 2.1x faster for additions and 2.33x faster
> >> for searches... I will run a test with 100 000 entries.
> >
> >
> > Does this benchmark involve the entire network ADS stack?
>
> Yes, except the network layer (which is irrelevant in this test). Also
> note that they are run on my poor laptop.
>
>
That's really cool. If I remember correctly, you have not bought a new Mac
in a while. Now you'll have to keep this machine as the reference
configuration for all historical metrics comparisons :).


> The latest results I got are the following:
>
>
> JDBM (1Gb)
> 1000 : Add = 56/s, Search = 14 845/s
> Mavibot (1Gb)
> 1000 : Add = 111/s, Search = 17 586/s
>
> JDBM (1Gb)
> 10000 : Add = 57/s, Search = 4 729/s
> Mavibot (1Gb)
> 10000 : Add = 120/s, Search = 11 056/s
>
> JDBM (2Gb)
> 50000 : Add = 51/s, Search = 3 515/s
> Mavibot (2Gb)
> 50000 : Add = 134/s, Search = 10 335/s
>
>
Impressive! These are by far the best numbers we've seen in all of ADS
history.


>
> Note that if we hit the disk (i.e., the cache and memory are not big
> enough), then performance drops immediately:
>
>
> JDBM (2Gb)
> 100000 : Add = 44/s, Search = 2 957/s
> Mavibot (2Gb)
> 100000 : Add = 100/s, Search = 3 308/s
>
>
> This is even more visible for Mavibot than for JDBM, most certainly
> due to the cache we are using in Mavibot (Ehcache), which is probably
> overkill compared to the very basic but efficient LRU cache used by JDBM.
>
>
Yeah, JDBM's cache was uber simple; perhaps a similar KISS cache may be
right for Mavibot, ideally tunable to various common access scenarios, or
even one that is adaptive.
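
As an illustration, that kind of KISS LRU takes only a few lines on top of
LinkedHashMap. This is just a sketch of the idea, not the JDBM or Mavibot
cache code:

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal access-ordered LRU: the eldest entry is evicted once the cache
// grows past maxEntries. Not thread-safe; synchronize or wrap as needed.
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public SimpleLruCache(int maxEntries) {
        super(16, 0.75f, true);  // true = access order, i.e. LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}

Tuning such a cache then mostly boils down to choosing maxEntries per
access pattern.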


>
> Suffice it to say that, given enough memory, Mavibot in its current
> state (i.e., we are still using locks everywhere, as revisions are not
> yet implemented) is already more than 2x faster for additions and 3x
> faster for searches...
>
> This is not the end of the story though. There are many possible
> optimizations left in Mavibot:
> - first of all, remove the locks that block concurrent access during
> searches (but that requires the handling of revisions in Mavibot,
> which is just a matter of implementing the free page collection)
> - second, we are doing way too many findPos calls (probably twice as
> many as needed). We can get rid of these extra calls.
>
>
Looking forward to seeing those stats when these changes take place. I'd
love to see us at least come close to the C-based servers out there.


>
> Now, this is an approach where we use plain keys (i.e., keys can have
> various sizes), which is not really efficient, as we may have to
> allocate more pages than necessary to store nodes and leaves.
> OpenLDAP uses another approach, which is smarter: they use the hash
> value of each key to retrieve the element. Obviously, this means
> comparing the keys once we reach the leaf, as we may have more than
> one key with the same hash value, and it also destroys the ordering
> (one can't compare two hash values, as their ordering differs from
> the keys' ordering), but in most cases it's not really a big deal.
> The main advantage of such an approach is that suddenly, nodes have a
> fixed size (a hash can be stored as an int, and the references to a
> page as longs), so in a fixed page size we can store a fixed number
> of elements. Assuming that a node needs at least 28 bytes to store
> its header and PageIO, a 512-byte page leaves 512 - 28 = 484 bytes
> for (nbKeys + 1) x (8 + 8) + nbKeys x 4 bytes of content, so 16 keys
> (64 bytes) and 17 values (272 bytes) fit, with 148 bytes remaining
> in this case.
> At the moment, we store 16 elements per node, which requires many
> physical pages, i.e., many disk accesses.
>
> This is something worth investigating in the near future.
>
>
Sounds like we need a minimal perfect order preserving hash function.

-- 
Best Regards,
-- Alex

Re: Mavibot vs JDBM results

Posted by Emmanuel Lecharny <el...@apache.org>.
On Mon, Nov 11, 2013 at 2:31 PM, Alex Karasulu <ak...@apache.org> wrote:
>
> On Mon, Nov 11, 2013 at 11:01 AM, Emmanuel Lécharny <el...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> now that the Mavibot partition is working, here is a quick benchmark I
>> ran this morning:
>>
>> Addition of 10 000 entries, then 400 000 random searches :
>>
>> JDBM
>> 10000 : Add = 74/s, Search : 4 729/s
>>
>> Mavibot
>> 10000 : Add = 120/s, Search : 11 056/s
>>
>> As we can see, Mavibot is 2.1x faster for additions and 2.33x faster
>> for searches... I will run a test with 100 000 entries.
>
>
> Does this benchmark involve the entire network ADS stack?

Yes, except the network layer (which is irrelevant in this test). Also
note that they are run on my poor laptop.

The latest results I got are the following:


JDBM (1Gb)
1000 : Add = 56/s, Search = 14 845/s
Mavibot (1Gb)
1000 : Add = 111/s, Search = 17 586/s

JDBM (1Gb)
10000 : Add = 57/s, Search = 4 729/s
Mavibot (1Gb)
10000 : Add = 120/s, Search = 11 056/s

JDBM (2Gb)
50000 : Add = 51/s, Search = 3 515/s
Mavibot (2Gb)
50000 : Add = 134/s, Search = 10 335/s
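
For the 50 000-entry run, the speedups work out to roughly 2.6x for
additions and 2.9x for searches. A trivial check of the ratios (not part
of the benchmark harness, just the figures above):

public class Speedup {
    public static void main(String[] args) {
        // Figures taken from the 50 000-entry results above.
        System.out.printf("add    : %.1fx%n", 134.0 / 51.0);       // ~2.6x
        System.out.printf("search : %.1fx%n", 10335.0 / 3515.0);   // ~2.9x
    }
}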


Note that if we hit the disk (i.e., the cache and memory are not big
enough), then performance drops immediately:


JDBM (2Gb)
100000 : Add = 44/s, Search = 2 957/s
Mavibot (2Gb)
100000 : Add = 100/s, Search = 3 308/s


This is even more visible for Mavibot than for JDBM, most certainly
due to the cache we are using in Mavibot (Ehcache), which is probably
overkill compared to the very basic but efficient LRU cache used by JDBM.


Suffice it to say that, given enough memory, Mavibot in its current
state (i.e., we are still using locks everywhere, as revisions are not
yet implemented) is already more than 2x faster for additions and 3x
faster for searches...

This is not the end of the story though. There are many possible
optimizations left in Mavibot:
- first of all, remove the locks that block concurrent access during
searches (but that requires the handling of revisions in Mavibot,
which is just a matter of implementing the free page collection); a
rough sketch of the idea follows after this list
- second, we are doing way too many findPos calls (probably twice as
many as needed). We can get rid of these extra calls.
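
Here is a very rough sketch of the revision idea: each write produces a
new, immutable snapshot, and readers simply use whatever snapshot is
current, without taking any lock. This only illustrates the principle,
not the actual Mavibot implementation; a real B-tree would copy only the
modified pages, and the free page collection is what lets old snapshots
be reclaimed.

import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

class RevisionedMap<K extends Comparable<K>, V> {

    // One immutable snapshot per revision. A TreeMap stands in here for
    // a B-tree root page.
    private static final class Revision<K, V> {
        final long number;
        final TreeMap<K, V> snapshot;

        Revision(long number, TreeMap<K, V> snapshot) {
            this.number = number;
            this.snapshot = snapshot;
        }
    }

    private final AtomicReference<Revision<K, V>> current =
            new AtomicReference<>(new Revision<K, V>(0L, new TreeMap<K, V>()));

    // Readers: lock-free, they just use the snapshot that is current now.
    public V get(K key) {
        return current.get().snapshot.get(key);
    }

    // Writers: serialized among themselves, but they never block readers.
    public synchronized void put(K key, V value) {
        Revision<K, V> old = current.get();
        TreeMap<K, V> copy = new TreeMap<K, V>(old.snapshot);
        copy.put(key, value);
        current.set(new Revision<K, V>(old.number + 1, copy));
    }
}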


Now, this is an approach where we use plain keys (i.e., keys can have
various sizes), which is not really efficient, as we may have to
allocate more pages than necessary to store nodes and leaves.
OpenLDAP uses another approach, which is smarter: they use the hash
value of each key to retrieve the element. Obviously, this means
comparing the keys once we reach the leaf, as we may have more than
one key with the same hash value, and it also destroys the ordering
(one can't compare two hash values, as their ordering differs from
the keys' ordering), but in most cases it's not really a big deal.
The main advantage of such an approach is that suddenly, nodes have a
fixed size (a hash can be stored as an int, and the references to a
page as longs), so in a fixed page size we can store a fixed number
of elements. Assuming that a node needs at least 28 bytes to store
its header and PageIO, a 512-byte page leaves 512 - 28 = 484 bytes
for (nbKeys + 1) x (8 + 8) + nbKeys x 4 bytes of content, so 16 keys
(64 bytes) and 17 values (272 bytes) fit, with 148 bytes remaining in
this case.
At the moment, we store 16 elements per node, which requires many
physical pages, i.e., many disk accesses.

This is something worth investigating in the near future.
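
To make the arithmetic concrete, here is a small back-of-the-envelope
check in plain Java, restating the assumptions above (512-byte page,
28 bytes of header and PageIO, a 4-byte hashed key, and two longs per
page reference):

public class NodePageBudget {
    public static void main(String[] args) {
        final int pageSize   = 512;
        final int headerSize = 28;       // node header + PageIO
        final int keySize    = 4;        // a key stored as an int hash
        final int refSize    = 8 + 8;    // a page reference stored as two longs

        int nbKeys = 16;
        int nbRefs = nbKeys + 1;         // a node with N keys has N + 1 children

        int used = headerSize + nbKeys * keySize + nbRefs * refSize;
        System.out.println("bytes used      : " + used);               // 364
        System.out.println("bytes remaining : " + (pageSize - used));  // 148
    }
}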


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com

Re: Mavibot vs JDBM results

Posted by Alex Karasulu <ak...@apache.org>.
On Mon, Nov 11, 2013 at 11:01 AM, Emmanuel Lécharny <el...@gmail.com> wrote:

> Hi,
>
> now that the Mavibot partition is working, here is a quick benchmark I
> ran this morning:
>
> Addition of 10 000 entries, then 400 000 random searches :
>
> JDBM
> 10000 : Add = 74/s, Search : 4 729/s
>
> Mavibot
> 10000 : Add = 120/s, Search : 11 056/s
>
> As we can see, Mavibot is 2.1x faster for additions and 2.33x faster
> for searches... I will run a test with 100 000 entries.


Does this benchmark involve the entire network ADS stack?

This is definitely something to be very proud of. Kudos!

-- 
Best Regards,
-- Alex