Posted to users@spamassassin.apache.org by Larry Nedry <sp...@bluestreak.net> on 2008/05/27 23:22:00 UTC

MySQL and Size Of bayes_expiry_max_db_size

Greetings,

This weekend I created a MySQL db to store my bayes tokens.  It seems to be
working well but I'm a little puzzled by the default size of
bayes_expiry_max_db_size.  I understand that the default size is 150,000
which seems very low as it took only one day to reach 100,000 tokens.

Was the default size set that low because of the performance of the default db?
Is it reasonable to set it to a much higher number considering that I am
using a SQL db?

Thanks for any help!

Nedry

Re: MySQL and Size Of bayes_expiry_max_db_size

Posted by Kris Deugau <kd...@vianet.ca>.
Larry Nedry wrote:
> Of course.  But how would I figure out what works best?  How can I tell if
> it is working poorly or very well?

Results.  <g>  Customer/user complaints are always useful (if perhaps 
not really desirable);  customer/user *feedback* is critical on 
anything bigger than a trivial personal or very-small-business system. 
You have to feed in a variety of legitimate email - finding spam to feed 
in shouldn't be a problem.

> I'm looking for a way to calculate or experimentally find the sweet spot
> for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
> What happens if the size is too high?

I've found 600,000 works pretty well on a smallish filter server (about 
the same hardware class as your system, AKA "overkill" <g>);  for the 
larger cluster serving between high single-digit and low double-digit 
thousands of accounts, plus filtering outbound mail, I've been playing 
with various settings on and off for several months now.  I still 
haven't found a happy balance.

(Side note - This question in various forms has been asked 3 or 4 times 
in the past month or so - could someone who really knows the Bayes 
innards please speak up?  As noted near the beginning of this thread, 
the default number of tokens is too small for anything much bigger than 
purely personal/per-user Bayes.)

Benny Pedersen's reply a few messages back includes a few points that 
made my own experiments become a lot more coherent;  I'll be doing 
further tuning based on that.  At the moment, for my usage, I'm looking 
at ~2M tokens as a floor.

-kgd

Re: MySQL and Size Of bayes_expiry_max_db_size

Posted by Benny Pedersen <me...@junc.org>.
On Wed, May 28, 2008 00:04, Larry Nedry wrote:

> I'm looking for a way to calculate or experimentally find the sweet spot
> for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
> What happens if the size is too high?

What happens is that the bigger the size, the more ham/spam training is
needed to have an effect on Bayes.

The smaller the Bayes size, the faster the learning, but it is also a bit
unstable.

To sum up:

1: if you want to train manually, keep the size low
2: otherwise, raise the Bayes size to compensate for the lack of manual training

Always monitor Bayes anyway; that will show whether it works or not. For
Bayes autolearn, you can also make the range bigger to get more static
learning, so if Bayes updates take a lot of time per message, this is how to
make it quieter.

Most important is that Bayes gets it right, e.g. only gives BAYES_99 to spam
and BAYES_00 to ham.

Last but not least, make sure an equal number of ham and spam signatures has
been learned.
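
A rough way to do the monitoring described above (how full the token table
is, and whether learned ham and spam are balanced) is to read the counters
that `sa-learn --dump magic` prints. The Python sketch below is only an
illustration: the sample text is made up, the column layout is assumed from
typical output of that command, and the function names parse_magic and
summarize are hypothetical helpers, not part of SpamAssassin.

```python
import re

# Made-up sample; the column layout is an assumption based on typical
# `sa-learn --dump magic` output. Replace with the real command output.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0       5200          0  non-token data: nspam
0.000          0       4100          0  non-token data: nham
0.000          0     100000          0  non-token data: ntokens
"""

# Third numeric column holds the value; the label follows "non-token data:".
LINE_RE = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$")


def parse_magic(text):
    """Return {label: value} for every 'non-token data' line."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def summarize(stats, max_db_size):
    """Report the spam/ham balance and how full the token table is."""
    nspam, nham = stats["nspam"], stats["nham"]
    return {
        "spam_to_ham": nspam / nham if nham else float("inf"),
        "fill": stats["ntokens"] / max_db_size,
    }


stats = parse_magic(SAMPLE)
# 150000 is the default bayes_expiry_max_db_size discussed in this thread.
report = summarize(stats, max_db_size=150000)
print(report)
```

A spam_to_ham value far from 1 suggests the training is lopsided, and a fill
value close to 1 means expiry is about to start discarding tokens.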



Benny Pedersen


Re: MySQL and Size Of bayes_expiry_max_db_size

Posted by Michael Monnerie <mi...@it-management.at>.
On Wednesday, 28 May 2008 Larry Nedry wrote:
> But how would I figure out what works best?  How can I tell if
> it is working poorly or very well?

We use bayes_expiry_max_db_size 2123456, and Bayes is absolutely accurate 
for us. I don't think you can really calculate it: it depends on how 
many different spams/hams you get, and therefore on how many tokens you 
need for it to be good enough. Over time we experimented with the value 
a bit, but more than 2 million tokens doesn't help us any further: Bayes 
is 100% correct now. We do train it well, though.
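
To put the sizes mentioned in this thread side by side, here is a small
Python sketch of how many tokens survive an expiry run. It assumes the rule
as I read it in the Mail::SpamAssassin::Conf documentation for
bayes_expiry_max_db_size: expiry keeps either 75% of the maximum or 100,000
tokens, whichever is larger. The function name is hypothetical; check the
documentation for your SpamAssassin version before relying on the numbers.

```python
def tokens_kept_after_expiry(max_db_size):
    """Tokens kept by an expiry run, under the assumed documented rule:
    the larger of 75% of bayes_expiry_max_db_size and 100,000."""
    return max(max_db_size * 3 // 4, 100_000)


# The default, plus the two values reported in this thread.
for size in (150_000, 600_000, 2_123_456):
    print(size, tokens_kept_after_expiry(size))
```

With the default of 150,000 that leaves 112,500 tokens after each expiry,
which is roughly what the server described at the start of this thread
collected in a single day.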

Best regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0676/846 914 666                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net                   Key-ID: 1C1209B4

Re: MySQL and Size Of bayes_expiry_max_db_size

Posted by Larry Nedry <sp...@bluestreak.net>.
On 5/27/08 at 4:33 PM -0500 Michael Parker wrote:
>You should adjust it for whatever works best for your user base and
>the resources you have available on your database.

Of course.  But how would I figure out what works best?  How can I tell if
it is working poorly or very well?

I'm looking for a way to calculate or experimentally find the sweet spot
for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
What happens if the size is too high?

The server in question has a dual core processor with 2 GB of RAM.  There
are currently about 150 users on this box and growing.  SpamAssassin is
version 3.2.4.

Any suggestions?

Nedry

Re: MySQL and Size Of bayes_expiry_max_db_size

Posted by Michael Parker <pa...@pobox.com>.
On May 27, 2008, at 4:22 PM, Larry Nedry wrote:

> Greetings,
>
> This weekend I created a MySQL db to store my bayes tokens.  It seems to be
> working well but I'm a little puzzled by the default size of
> bayes_expiry_max_db_size.  I understand that the default size is 150,000
> which seems very low as it took only one day to reach 100,000 tokens.
>
> Was the default size set that low because of the performance of the default db?
> Is it reasonable to set it to a much higher number considering that I am
> using a SQL db?
>

You should adjust it for whatever works best for your user base and  
the resources you have available on your database.  The default value  
is best suited for single users so I wouldn't be surprised if it was  
too low.

Michael


> Thanks for any help!
>
> Nedry
>