Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2009/04/23 12:32:13 UTC

Tokyo Cabinet as a BayesStore

http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/

highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
BerkeleyDB's 340 (via python bindings), over 4 times faster.  There's been
a lot of good press about it.... possibly a candidate for a future plugin?

--j.

Re: Tokyo Cabinet as a BayesStore

Posted by Justin Mason <jm...@jmason.org>.
On Thu, Apr 23, 2009 at 15:56, Matt Sergeant <ms...@messagelabs.com> wrote:
> On Thu, 23 Apr 2009 10:23:30 -0400, Matt Sergeant wrote:
>> On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
>>>
>>> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>>>
>>> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
>>> BerkeleyDB's 340 (via python bindings), over 4 times faster.  There's been
>>> a lot of good press about it.... possibly a candidate for a future plugin?
>>
>> The times look really bizarre to me.
>>
>> An in memory store can only do 2700 "tweets/sec" (whatever that
>> means)??? That's INCREDIBLY low.
>>
>> I suspect BerkeleyDB there is about as fast as you might get without
>> turning off fsync to disk. Tokyo Cabinet is probably faster because it
>> doesn't fsync. I imagine that's about all there is to it.
>>
>> Would love to be proven wrong though.
>
> Ah, Tokyo Cabinet is just the successor to QDBM. I wrote a CPAN module
> for that a while back :)
>
> I seem to recall I had some corruption issues with it under high load.
> But perhaps Tokyo Cabinet is better in that regard.

It's been receiving good reviews on the reliability/performance side
recently: http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores
So fingers crossed.

--j.

Re: Tokyo Cabinet as a BayesStore

Posted by Matt Sergeant <ms...@messagelabs.com>.
On Thu, 23 Apr 2009 10:23:30 -0400, Matt Sergeant wrote:
> On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
>>
>> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>>
>> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
>> BerkeleyDB's 340 (via python bindings), over 4 times faster.  There's been
>> a lot of good press about it.... possibly a candidate for a future plugin?
> 
> The times look really bizarre to me.
> 
> An in memory store can only do 2700 "tweets/sec" (whatever that 
> means)??? That's INCREDIBLY low.
> 
> I suspect BerkeleyDB there is about as fast as you might get without
> turning off fsync to disk. Tokyo Cabinet is probably faster because it 
> doesn't fsync. I imagine that's about all there is to it.
> 
> Would love to be proven wrong though.

Ah, Tokyo Cabinet is just the successor to QDBM. I wrote a CPAN module 
for that a while back :)

I seem to recall I had some corruption issues with it under high load. 
But perhaps Tokyo Cabinet is better in that regard.

Matt.

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
______________________________________________________________________

Re: Tokyo Cabinet as a BayesStore

Posted by Matt Sergeant <ms...@messagelabs.com>.
On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
>
> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>
> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
> BerkeleyDB's 340 (via python bindings), over 4 times faster.  There's been
> a lot of good press about it.... possibly a candidate for a future plugin?

The times look really bizarre to me.

An in-memory store can only do 2700 "tweets/sec" (whatever that
means)??? That's INCREDIBLY low.

I suspect BerkeleyDB there is about as fast as you might get without
turning off fsync to disk. Tokyo Cabinet is probably faster because it
doesn't fsync. I imagine that's about all there is to it.

Would love to be proven wrong though.
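
To make the fsync point concrete, here is a rough sketch of how one might
measure what per-write fsync costs on a given machine. It uses plain file
appends and does not touch Tokyo Cabinet or BerkeleyDB, so it says nothing
about what either library actually did in the linked benchmark; the record
format and the function names (write_records, compare) are made up for
illustration.

```python
import os
import tempfile
import time


def write_records(path, count, sync_each_write):
    """Append `count` small records; optionally fsync after every one.

    Returns the observed throughput in writes per second.
    """
    start = time.perf_counter()
    with open(path, "wb") as fh:
        for i in range(count):
            # Hypothetical 16-byte "token<TAB>count" record.
            fh.write(b"token%08d\t1\n" % i)
            if sync_each_write:
                fh.flush()             # push Python's buffer to the OS
                os.fsync(fh.fileno())  # ask the OS to push it to disk
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def compare(count=500):
    """Return (ops/sec with fsync, ops/sec without) for the same workload."""
    with tempfile.TemporaryDirectory() as tmp:
        synced = write_records(os.path.join(tmp, "synced.db"), count, True)
        unsynced = write_records(os.path.join(tmp, "unsynced.db"), count, False)
    return synced, unsynced


if __name__ == "__main__":
    synced, unsynced = compare()
    print(f"fsync per write: {synced:10.0f} ops/sec")
    print(f"no fsync:        {unsynced:10.0f} ops/sec")
```

The size of the gap depends heavily on the disk and filesystem (on a
RAM-backed temp directory it may nearly vanish), so any benchmark comparing
stores should state whether each one syncs on write.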

Matt.