Posted to dev@directory.apache.org by Emmanuel Lécharny <el...@symas.com> on 2013/09/10 17:40:49 UTC

Mavibot cache experiment

Hi guys,

yesterday and today, I was implementing a cache in Mavibot to replace the
WeakReferences we were using previously. The rationale was that with the
WeakReferences, we were unable to inject more than 50 000 entries in the
server (at that point, the GC just goes crazy trying to free some
WeakReferences, and it slows down the server to a point where it's unusable...)

So after a day fighting with an EhCache implementation in Mavibot, I was
able to load 100K entries in the server. So far, so good, except that
the performance is anything but good. I can add 26 entries per second,
and fetch 555 entries per second. Worse than JDBM...

Why is it so slow, especially for a search operation ? There are three
reasons :
- first, I configured the cache to store only 1000 elements (mainly nodes)
- second, when we try to update a leaf, we mostly have to load it from
disk, as we rarely have it in memory
- third, a leaf contains from 8 to 16 entries, and every time we fetch a
leaf from disk, we have to deserialize the Entries, which is extremely
costly

Fixing this third problem would save us a lot of time, and it's a matter
of adding one level of indirection (the entries would be kept as byte[],
and deserialized only when needed).
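As a rough sketch of that indirection (this is not Mavibot's actual API, and a String entry is just a stand-in for the real entry type), a leaf slot could hold the raw bytes and deserialize on first access only :

```java
import java.nio.charset.StandardCharsets;

// A leaf slot that keeps the raw serialized form and only pays the
// deserialization cost when the entry is actually requested.
class LazyEntry {
    private final byte[] raw;   // serialized form, as read from disk
    private String entry;       // deserialized lazily, then kept

    LazyEntry(byte[] raw) {
        this.raw = raw;
    }

    // Deserialization happens at most once, on first access.
    String get() {
        if (entry == null) {
            entry = new String(raw, StandardCharsets.UTF_8);
        }
        return entry;
    }

    boolean isDeserialized() {
        return entry != null;
    }
}
```

Reading a leaf of 16 slots from disk would then cost 16 byte[] copies, and zero deserializations until an entry is actually returned.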

If anyone has a better idea...

Mavibot cache experiment, some results

Posted by Emmanuel Lécharny <el...@gmail.com>.
Hi guys,

I have run some performance tests with Mavibot and the cache, and the
numbers are not that good, but at least I know why.

First, if I use 1000 as the cache size and inject 10 000 entries, I can
get around 13 000 searches/s, pretty similar to what I get with the
previous Mavibot version (which uses weak references). That's good.

Second, I can now add 100 000 entries into the server, but it takes more
or less one hour.

Third, searching in the 100 000 entry base takes forever, i.e. I can't
get more than 700 searches/s, which is 20 times slower than when the cache
is big enough to keep all the entries in memory.

Now, I profiled the server with a small cache and I found that we spend
60% of the time deserializing entries. All in all, for one single
search, we deserialize 12 entries. The reason is that when we have to
read the data from disk - something likely to happen when the cache is
small - we pay the cost of reading the data from the disk, plus an
extra penalty for deserializing *all* the entries stored in the leaf we
just read from disk (around 12 entries per leaf, if the leaf has 16
available slots).

There is plenty of room for improvement here...


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Re: Mavibot cache experiment

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 9/11/13 4:44 AM, Kiran Ayyagari wrote :
> On Tue, Sep 10, 2013 at 9:35 PM, Emmanuel Lécharny <el...@gmail.com> wrote:
>
>> On 9/10/13 5:40 PM, Emmanuel Lécharny wrote :
>>> [...]
>> indeed : the problem is that we serialize/deserialize too late. It would be
>>
> I think you mean 'too _early_'

No, we should serialize the entry (or the values in general) before we
store it into the Leaf. Leaves should only store binary values.

That has 2 consequences :
- we should never have to compare 2 values
- we should add a cache to avoid fetching values from the btree, for
better performance

In other words, the Managed BTree cache will cache binary values, not
plain java objects.

Fetching an element from a BTree will then follow this algorithm :
- first, check if the value is present in the cache, using its key
- if not, fetch the value from the BTree. If the leaf containing this
entry is in the cache, deserialize the value and return it.
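That lookup order could be sketched like this (class and method names are made up for illustration, and a HashMap stands in for the on-disk BTree; none of this is Mavibot's real API) :

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Cache-first lookup over binary values: the cache holds byte[], not
// plain java objects, and only the requested value is ever deserialized.
class BinaryValueCache {
    private final Map<Long, byte[]> cache = new HashMap<>(); // binary values, not objects
    private final Map<Long, byte[]> btree = new HashMap<>(); // stand-in for the on-disk BTree
    int deserializations = 0;                                // exposed for measurement

    void store(long key, byte[] serialized) {
        btree.put(key, serialized);
    }

    String fetch(long key) {
        // 1) check if the value is present in the cache, using its key
        byte[] raw = cache.get(key);
        if (raw == null) {
            // 2) miss : fetch the serialized value from the BTree, and cache it
            raw = btree.get(key);
            if (raw == null) {
                return null;
            }
            cache.put(key, raw);
        }
        // 3) deserialize only this value, on the way out to the user
        deserializations++;
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

The point of keeping the cache binary is that a leaf read never drags its ~12 neighbour entries through deserialization.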


-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com 


Re: Mavibot cache experiment

Posted by Kiran Ayyagari <ka...@apache.org>.
On Tue, Sep 10, 2013 at 9:35 PM, Emmanuel Lécharny <el...@gmail.com> wrote:

> On 9/10/13 5:40 PM, Emmanuel Lécharny wrote :
> > [...]
> indeed : the problem is that we serialize/deserialize too late. It would be
>
I think you mean 'too _early_'

> way better if we processed serialized values up to the point where we return
> the result to the user :
>
> - addition : we receive an object, we immediately serialize it into a
> byte[] and process the addition using this byte[]. We no longer have
> to deserialize all the values from the page where we will add the new value,
> as they are all byte[]
> - search : same thing, we don't deserialize the values until we return
> them to the user.
>
+1

> The gain will be huge !
>
> --
> Regards,
> Cordialement,
> Emmanuel Lécharny
> www.iktek.com
>
>


-- 
Kiran Ayyagari
http://keydap.com

Re: Mavibot cache experiment

Posted by Emmanuel Lécharny <el...@gmail.com>.
On 9/10/13 5:40 PM, Emmanuel Lécharny wrote :
> [...]
> If anyone has a better idea...
indeed : the problem is that we serialize/deserialize too late. It would be
way better if we processed serialized values up to the point where we return
the result to the user :

- addition : we receive an object, we immediately serialize it into a
byte[] and process the addition using this byte[]. We no longer have
to deserialize all the values from the page where we will add the new value,
as they are all byte[]
- search : same thing, we don't deserialize the values until we return
them to the user.

The gain will be huge !
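The two paths can be sketched as follows (hypothetical names, with a sorted list standing in for a B-tree page and String standing in for the stored object - not Mavibot's real code) :

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// A page that only ever holds byte[] slots: the value is serialized once
// on the way in, and deserialized once on the way out.
class BinaryPage {
    private final List<Long> keys = new ArrayList<>();
    private final List<byte[]> values = new ArrayList<>();

    // Addition : serialize immediately, then insert the byte[].
    // Only keys are compared to find the slot, so the neighbouring
    // byte[] values are never deserialized.
    void add(long key, String value) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        while (pos < keys.size() && keys.get(pos) < key) {
            pos++;
        }
        keys.add(pos, key);
        values.add(pos, raw);
    }

    // Search : deserialize only when returning the result to the user.
    String search(long key) {
        int pos = keys.indexOf(key);
        return pos < 0 ? null : new String(values.get(pos), StandardCharsets.UTF_8);
    }
}
```

Note that this only works for types where ordering can be decided on the key alone; comparing two values would force a deserialization and defeat the purpose.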

-- 
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com