Posted to users@spamassassin.apache.org by Lars Ringh <la...@bahnhof.net> on 2006/04/03 14:34:29 UTC

Moving bayes from bdb to MySQL

I'm about to move my bayes and auto-whitelist data from local db-files 
on each server to a common MySQL-db.

I have 2+2 load-balanced servers scanning mail with amavisd-new for two 
different kinds of customers, home and corporate users respectively, and 
I was planning to keep their respective data in two separate databases, 
since the two kinds of mail seem to be quite different.

Now, since in each case the source data can come from two different 
servers scanning the same kind of mail, should I try to merge the 
Bayes data from servers home1 and home2 into the same MySQL db, and 
then merge the data from corp1 and corp2 into the other MySQL db, or 
should I take my starting source data from only one server in each pair? 
Would SpamAssassin benefit from having the larger data set to look at, or 
would I only be adding near-identical data that would simply be expired 
again soon after the effort of merging it?
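Whether the two servers' data really is "close-to-identical" could be checked before deciding. Below is a minimal Python sketch, assuming each server's Bayes tokens have already been exported into a dict mapping token to (spam count, ham count, last access time); that export step, the `TokenTable` type and both function names are hypothetical and not part of SpamAssassin. It measures how much the two vocabularies overlap and shows what a naive merge (sum the counts, keep the newest access time) would look like. A real Bayes database also stores global spam/ham message totals, which this sketch ignores.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of one server's Bayes token table:
# token -> (spam_count, ham_count, last_access_time)
TokenTable = Dict[str, Tuple[int, int, int]]


def overlap_ratio(a: TokenTable, b: TokenTable) -> float:
    """Jaccard overlap of the two token sets: 1.0 means identical vocabularies."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    return len(a.keys() & b.keys()) / len(union)


def merge_tables(a: TokenTable, b: TokenTable) -> TokenTable:
    """Naive merge: sum spam/ham counts per token, keep the most recent atime.

    Only meaningful if the two servers never learned the same message;
    otherwise any shared message is counted twice.
    """
    merged: TokenTable = dict(a)
    for token, (spam, ham, atime) in b.items():
        if token in merged:
            s, h, t = merged[token]
            merged[token] = (s + spam, h + ham, max(t, atime))
        else:
            merged[token] = (spam, ham, atime)
    return merged


if __name__ == "__main__":
    home1 = {"viagra": (40, 0, 100), "meeting": (1, 30, 90), "invoice": (5, 5, 80)}
    home2 = {"viagra": (35, 1, 110), "meeting": (0, 25, 95), "lottery": (12, 0, 105)}
    print(round(overlap_ratio(home1, home2), 2))  # 0.5
    print(merge_tables(home1, home2)["viagra"])   # (75, 1, 110)
```

A high overlap ratio would suggest the second server adds little new vocabulary; the double-counting caveat in `merge_tables` is one more reason to be cautious about mixing databases.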

And out of curiosity: the "home servers" have about 165MB of Bayes data 
and 335MB of auto-whitelist data, while the "corporate servers" have it 
the other way around, 335MB in the Bayes db and 165MB in the 
auto-whitelist. Could anyone briefly enlighten me as to why? Is it as 
simple as the "home servers" having fewer senders/recipients but more 
varied e-mails, and the "corporate servers" having more 
senders/recipients but less varied e-mails, or is it something else?

//maccall

-- 

lars-dot-ringh-at-bahnhof-dot-net

Re: Moving bayes from bdb to MySQL

Posted by Lars Ringh <la...@bahnhof.net>.
Michael Monnerie wrote:
> On Monday, 3 April 2006 14:34, Lars Ringh wrote:
> 
>>Now, since in each case the source data can come from two different
>>servers scanning the same kind of mails, should I try to merge the
>>Bayes data from servers home1 and home2 into the same MySQL db
>>and then merge the data from corp1 and corp2 into the other MySQL db,
>>or should I pick my starting source data from only one server in each
>>pair? Would spamassassin benefit from having the greater source to
>>look at, or would I only be adding close-to-identical data which
>>would then only be expired faster than it was to merge them?
> 
> 
> I believe you should *not* mix two different Bayes DBs. Use just one, 
> and the rest will fill up as new spam comes in...


Yes, I've done some more thinking myself and this must be the only 
reasonable approach.


>>165MB...335MB
> 
> 
> Did you not enable bayes_auto_expire?


I was under the impression that I did, but now that I've imported some 
of the data into the MySQL databases (where it is easier to examine than 
in the BDB files), I must say it doesn't look that way...

A bit strange, though, since the files reach this size from scratch in 
quite a short time and then stay at that size; they don't grow any 
bigger. That's why I thought auto-expiry was doing its job... One might 
suspect that I've set bayes_expiry_max_db_size to some really odd value, 
but that's not the case either...
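As a rough sanity check on those sizes: according to the SpamAssassin documentation, auto-expiry runs once the token count exceeds bayes_expiry_max_db_size (default 150,000 tokens) and then keeps either 75% of that value or 100,000 tokens, whichever is larger; the same docs put 150,000 tokens at roughly an 8 MB database file. The Python sketch below encodes those figures. The helper names `expiry_target`, `trim_oldest` and `estimated_tokens` are hypothetical, the oldest-first trim is a simplification (SpamAssassin's own expiry chooses an access-time cutoff that approximates the target rather than trimming to an exact count), and file size is at best a rough proxy for token count.

```python
from typing import Dict

# Figures taken from the SpamAssassin docs for bayes_expiry_max_db_size.
DEFAULT_MAX_DB_SIZE = 150_000           # default maximum number of tokens
MIN_KEEP = 100_000                      # expiry never trims below this many tokens
BYTES_PER_DEFAULT_DB = 8 * 1024 * 1024  # "roughly 8 MB" for 150,000 tokens (rule of thumb)


def expiry_target(max_db_size: int = DEFAULT_MAX_DB_SIZE, min_keep: int = MIN_KEEP) -> int:
    """Tokens kept after expiry: 75% of the maximum or min_keep, whichever is larger."""
    return max(max_db_size * 3 // 4, min_keep)


def trim_oldest(atimes: Dict[str, int], max_db_size: int = DEFAULT_MAX_DB_SIZE,
                min_keep: int = MIN_KEEP) -> Dict[str, int]:
    """Simplified expiry over a token -> last-access-time map.

    Nothing happens until the table exceeds max_db_size; then only the most
    recently accessed tokens are kept, up to expiry_target().
    """
    if len(atimes) <= max_db_size:
        return dict(atimes)
    keep = expiry_target(max_db_size, min_keep)
    newest_first = sorted(atimes.items(), key=lambda item: item[1], reverse=True)
    return dict(newest_first[:keep])


def estimated_tokens(file_bytes: int) -> int:
    """Back-of-envelope token count from a file size, using the ~8 MB rule of thumb."""
    return round(file_bytes / BYTES_PER_DEFAULT_DB * DEFAULT_MAX_DB_SIZE)


if __name__ == "__main__":
    print(expiry_target())  # 112500
    for size_mb in (165, 335):
        print(f"{size_mb} MB -> ~{estimated_tokens(size_mb * 1024 * 1024):,} tokens")
```

If the rule of thumb holds even loosely, a 165 MB Bayes file would correspond to around 3 million tokens, far more than the roughly 112,500 a default expiry run keeps, which is consistent with the suspicion that expiry was not actually running.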

Well, anyway, thanks for your input.

//maccall


Re: Moving bayes from bdb to MySQL

Posted by Michael Monnerie <m....@zmi.at>.
On Monday, 3 April 2006 14:34, Lars Ringh wrote:
> Now, since in each case the source data can come from two different
> servers scanning the same kind of mails, should I try to merge the
> Bayes data from servers home1 and home2 into the same MySQL db
> and then merge the data from corp1 and corp2 into the other MySQL db,
> or should I pick my starting source data from only one server in each
> pair? Would spamassassin benefit from having the greater source to
> look at, or would I only be adding close-to-identical data which
> would then only be expired faster than it was to merge them?

I believe you should *not* mix two different Bayes DBs. Use just one, 
and the rest will fill up as new spam comes in...

> 165MB...335MB

Did you not enable bayes_auto_expire?

Kind regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at           Tel: 0660/4156531          Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net                 Key-ID: 0x70545879