Posted to users@spamassassin.apache.org by Lukas Garberg <lu...@spritelink.net> on 2007/10/28 22:51:51 UTC

Synchronize bayes databases

Dear list,

I'm developing a spam filter solution in which the load is distributed
across a number of machines running SpamAssassin (together with
MailScanner and Postfix).
We currently use the Bayes self-learning feature and would like to
continue doing so.

However, since each machine is fed a different set of mail, their
Bayes databases will differ quite a bit. It would be great if all the
self-learned tokens, as well as the manually learned ones, could be
distributed from each server to all the others.

What is the preferred way to synchronize the databases between
the servers?

I have considered letting all the servers use a common database
server via the Bayes SQL storage module, but I'd like to avoid the
single point of failure that solution introduces.

Making all the servers members of a MySQL cluster is another
alternative, but I'd like to avoid that as well to keep the complexity
of the system low.

Is it possible to simply sum the token counters from each of the servers
to merge the databases?
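To make the summing idea concrete, here is a minimal Python sketch. It is not SpamAssassin's storage format or API: `BayesCounts` and `merge` are hypothetical names, and the data model (per-token spam/ham counts plus total spam/ham message counts) is a simplified assumption. Because Bayes-style classifiers typically weigh a token's counts against the total numbers of spam and ham messages learned, the sketch sums those totals as well as the per-token counts. Summing is only sound if every message was learned on exactly one server; a message learned on two servers, or counts re-merged after an earlier sync, would be counted twice.

```python
from dataclasses import dataclass, field


@dataclass
class BayesCounts:
    """Hypothetical, simplified view of one server's Bayes data (not SA's real format)."""
    nspam: int = 0  # total messages learned as spam
    nham: int = 0   # total messages learned as ham
    # token -> (spam_count, ham_count)
    tokens: dict[str, tuple[int, int]] = field(default_factory=dict)


def merge(dbs: list[BayesCounts]) -> BayesCounts:
    """Sum message totals and per-token counts across servers.

    Assumes each message was learned on exactly one server; otherwise
    the same evidence is counted more than once.
    """
    merged = BayesCounts()
    for db in dbs:
        merged.nspam += db.nspam
        merged.nham += db.nham
        for token, (spam, ham) in db.tokens.items():
            s, h = merged.tokens.get(token, (0, 0))
            merged.tokens[token] = (s + spam, h + ham)
    return merged


if __name__ == "__main__":
    a = BayesCounts(nspam=2, nham=1, tokens={"viagra": (2, 0), "meeting": (0, 1)})
    b = BayesCounts(nspam=1, nham=3, tokens={"viagra": (1, 0), "meeting": (1, 2)})
    m = merge([a, b])
    print(m.nspam, m.nham, m.tokens["viagra"], m.tokens["meeting"])
    # prints: 3 4 (3, 0) (1, 3)
```

Repeated syncs would therefore need to merge only the counts accumulated since the last sync. A real merge would also have to reconcile token access times and SpamAssassin's record of which messages it has already learned, both of which this sketch ignores.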

Thank you in advance,
Lukas Garberg

Re: Synchronize bayes databases

Posted by sa...@nationalnet.com.
On Sun, 28 Oct 2007 22:51:51 +0100
Lukas Garberg <lu...@spritelink.net> wrote:

> Dear list,
> 
> I'm developing a spam filter solution in which the load is distributed
> across a number of machines running SpamAssassin (together with
> MailScanner and Postfix).
> We currently use the Bayes self-learning feature and would like to
> continue doing so.
> 
> However, since each machine is fed a different set of mail, their
> Bayes databases will differ quite a bit. It would be great if all the
> self-learned tokens, as well as the manually learned ones, could be
> distributed from each server to all the others.
> 
> What is the preferred way to synchronize the databases between
> the servers?

I would recommend using the Bayes SQL storage module so that a single, global Bayes database is shared between your machines. That would eliminate the need to synchronize Bayes data between them.
> 
> I have considered letting all the servers use a common database
> server via the Bayes SQL storage module, but I'd like to avoid the
> single point of failure that solution introduces.

You could always use MySQL replication to avoid that single point of failure. There are several how-tos on setting up master-master replication available via Google.
> 
> Making all the servers members of a MySQL cluster is another
> alternative, but I'd like to avoid that as well to keep the complexity
> of the system low.
> 
> Is it possible to simply sum the token counters from each of the servers
> to merge the databases?
> 
> Thank you in advance,
> Lukas Garberg

Thanks,
Majied Najjar