Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2009/04/23 12:32:13 UTC
Tokyo Cabinet as a BayesStore
http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
BerkeleyDB's 340 (via Python bindings), over 4 times faster. There's been
a lot of good press about it.... possibly a candidate for a future plugin?
--j.
Re: Tokyo Cabinet as a BayesStore
Posted by Justin Mason <jm...@jmason.org>.
On Thu, Apr 23, 2009 at 15:56, Matt Sergeant <ms...@messagelabs.com> wrote:
> On Thu, 23 Apr 2009 10:23:30 -0400, Matt Sergeant wrote:
>> On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
>>> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>>>
>>> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
>>> BerkeleyDB's 340 (via Python bindings), over 4 times faster. There's been
>>> a lot of good press about it.... possibly a candidate for a future plugin?
>>
>> The times look really bizarre to me.
>>
>> An in-memory store can only do 2700 "tweets/sec" (whatever that
>> means)??? That's INCREDIBLY low.
>>
>> I suspect BerkeleyDB there is about as fast as you might get without
>> turning off fsync to disk. Tokyo Cabinet is probably faster because it
>> doesn't fsync. I imagine that's about all there is to it.
>>
>> Would love to be proven wrong though.
>
> Ah, Tokyo Cabinet is just the successor to QDBM. I wrote a CPAN module
> for that a while back :)
>
> I seem to recall I had some corruption issues with it under high load.
> But perhaps Tokyo Cabinet is better in that regard.
It's been receiving good reviews on the reliability/performance side
recently (http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores),
so fingers crossed.
--j.
Re: Tokyo Cabinet as a BayesStore
Posted by Matt Sergeant <ms...@messagelabs.com>.
On Thu, 23 Apr 2009 10:23:30 -0400, Matt Sergeant wrote:
> On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
>> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>>
>> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
>> BerkeleyDB's 340 (via Python bindings), over 4 times faster. There's been
>> a lot of good press about it.... possibly a candidate for a future plugin?
>
> The times look really bizarre to me.
>
> An in-memory store can only do 2700 "tweets/sec" (whatever that
> means)??? That's INCREDIBLY low.
>
> I suspect BerkeleyDB there is about as fast as you might get without
> turning off fsync to disk. Tokyo Cabinet is probably faster because it
> doesn't fsync. I imagine that's about all there is to it.
>
> Would love to be proven wrong though.
Ah, Tokyo Cabinet is just the successor to QDBM. I wrote a CPAN module
for that a while back :)
I seem to recall I had some corruption issues with it under high load.
But perhaps Tokyo Cabinet is better in that regard.
Matt.
______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________
Re: Tokyo Cabinet as a BayesStore
Posted by Matt Sergeant <ms...@messagelabs.com>.
On Thu, 23 Apr 2009 10:32:13 +0000, Justin Mason wrote:
> http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/
>
> highlight: a Tokyo Cabinet hashtable performed at 1400 ops/sec compared to
> BerkeleyDB's 340 (via Python bindings), over 4 times faster. There's been
> a lot of good press about it.... possibly a candidate for a future plugin?
The times look really bizarre to me.
An in-memory store can only do 2700 "tweets/sec" (whatever that
means)??? That's INCREDIBLY low.
I suspect BerkeleyDB there is about as fast as you might get without
turning off fsync to disk. Tokyo Cabinet is probably faster because it
doesn't fsync. I imagine that's about all there is to it.
Would love to be proven wrong though.
Matt.
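For anyone who wants to check the fsync theory above on their own hardware,
here is a rough sketch in Python. It is not a benchmark of Tokyo Cabinet or
BerkeleyDB: it only appends small records to a plain temp file, once with an
fsync after every write and once without. The function name
write_ops_per_sec and the record format are made up for illustration.

```python
import os
import tempfile
import time


def write_ops_per_sec(n_ops, sync_each_write):
    """Append n_ops small records to a temp file.

    Returns (ops per second, bytes written). Hypothetical helper, not
    part of any key/value store's API.
    """
    record = b"token\t1\n"
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_ops):
            os.write(fd, record)
            if sync_each_write:
                # Ask the OS to flush this record to stable storage.
                os.fsync(fd)
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    # Guard against a zero elapsed time on very coarse clocks.
    return n_ops / max(elapsed, 1e-9), size


if __name__ == "__main__":
    n = 200
    synced, _ = write_ops_per_sec(n, sync_each_write=True)
    unsynced, _ = write_ops_per_sec(n, sync_each_write=False)
    print(f"fsync per write: {synced:.0f} ops/sec")
    print(f"no fsync:        {unsynced:.0f} ops/sec")
```

The numbers will vary a lot with the disk, filesystem and OS, so treat any
gap between the two rates as a hint about how much syncing alone can cost,
not as evidence about either library.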