Posted to users@spamassassin.apache.org by Paolo Cravero as2594 <pc...@as2594.net> on 2004/07/14 12:50:15 UTC

Re: Clustering spamassassin

Joshua Cornejo wrote:

> We're currently evaluating how to cluster spamassassin without having
> to have different heuristic databases (shared knowledge). I guess the
[...]
> it's not worth and is rather better to have independent machines with
> different databases and different knowledge caused by the differences
> in email processed by each server. Any views/urls ?

We chose not to depend on a shared filesystem for our hardware 
load-balanced instances of postfix+spamassassin in order to maintain 
high-availability of the inbound SMTP service.

If any of the instances crashes for whatever reason, the others take 
over all of the traffic. If a disk runs out of space (because the AWL 
grows and grows and you have to clean it up manually), only that one 
instance ends up with corrupted AWL and Bayes DB files.

Shared knowledge is not a big issue, since you can be statistically sure 
that in the long run all SA instances will see the same mix of traffic.
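
To illustrate the statistical argument, here is a rough simulation sketch. 
It assumes the load balancer picks an instance independently of message 
content and that 60% of inbound mail is spam. Both are assumptions made 
for the example, not figures from our setup, and the function name and 
parameters are hypothetical.

```python
import random


def simulate(num_instances=3, num_messages=300_000, spam_rate=0.6, seed=42):
    """Return the spam fraction observed by each instance.

    Assumption: the balancer chooses an instance uniformly at random,
    independently of whether the message is spam.
    """
    rng = random.Random(seed)
    seen = [0] * num_instances   # messages delivered to each instance
    spam = [0] * num_instances   # spam messages delivered to each instance
    for _ in range(num_messages):
        i = rng.randrange(num_instances)   # balancer picks an instance
        seen[i] += 1
        if rng.random() < spam_rate:       # message happens to be spam
            spam[i] += 1
    return [s / n for s, n in zip(spam, seen)]


fractions = simulate()
for idx, frac in enumerate(fractions):
    print(f"instance {idx}: spam fraction {frac:.3f}")
```

With enough messages each instance's spam fraction should settle close to 
the overall rate, which is why independently trained databases tend to 
converge on similar knowledge.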

OTOH training Bayes DB must be done on all instances, as well as 
searching through the MTA logs if some mail "disappears"...
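
A minimal sketch of how the same training message could be fed to every 
instance. The host names are hypothetical, and it assumes passwordless ssh 
to each instance and that sa-learn reads the message from standard input 
when no file is given. This is not the script we actually run.

```python
import subprocess

# Hypothetical instance names; replace with your own hosts.
HOSTS = ["mx1.example.net", "mx2.example.net"]


def build_learn_command(host, kind):
    """Build the ssh command that trains one instance with one message."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # Assumption: sa-learn takes the message on stdin when no file is named.
    return ["ssh", host, "sa-learn", f"--{kind}"]


def train_everywhere(message, kind, hosts=HOSTS, run=subprocess.run):
    """Send the raw message bytes to sa-learn on every host.

    Returns a dict mapping host name to the exit status, so that a failed
    instance can be retrained later.
    """
    results = {}
    for host in hosts:
        proc = run(build_learn_command(host, kind), input=message)
        results[host] = proc.returncode
    return results
```

Usage would be something like `train_everywhere(open("missed.eml", "rb").read(), "spam")`, 
where `missed.eml` is a placeholder file name.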

It is possible to store the Bayes data in a MySQL database, but that 
would introduce another point of failure in our architecture, and we 
can't replicate that one as well!

Hope my reply was not too far from what you expected,
Paolo