Posted to users@spamassassin.apache.org by Paolo Cravero as2594 <pc...@as2594.net> on 2004/07/14 12:50:15 UTC
Re: Clustering spamassassin
Joshua Cornejo wrote:
> We're currently evaluating how to cluster spamassassin without having
> to have different heuristic databases (shared knowledge). I guess the
[...]
> it's not worth and is rather better to have independent machines with
> different databases and different knowledge caused by the differences
> in email processed by each server. Any views/urls ?
We chose not to depend on a shared filesystem for our hardware
load-balanced instances of postfix+spamassassin in order to maintain
high-availability of the inbound SMTP service.
If any instance crashes for whatever reason, the others take over all
of the traffic. If a disk runs out of space (because the AWL keeps
growing and has to be cleaned up manually), only that one instance ends
up with corrupted AWL and Bayes DB files.
Shared knowledge is not a big issue, since you can be statistically
confident that in the long run all SA instances will see the same kind
of traffic.
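To illustrate the statistical argument, here is a rough simulation sketch.
It assumes the load balancer picks an instance uniformly at random for each
message and that a fixed fraction of mail is spam; both are simplifying
assumptions, not a description of our balancer, and the function name and
numbers are made up for the example.

```python
import random

def simulate_spam_fractions(num_instances, num_messages, spam_rate, seed=0):
    """Return the fraction of spam seen by each instance.

    Assumes uniform random balancing and a constant spam rate
    (hypothetical model, chosen only to illustrate the point).
    """
    rng = random.Random(seed)
    totals = [0] * num_instances
    spam = [0] * num_instances
    for _ in range(num_messages):
        instance = rng.randrange(num_instances)  # balancer picks an instance
        totals[instance] += 1
        if rng.random() < spam_rate:             # message happens to be spam
            spam[instance] += 1
    return [s / t for s, t in zip(spam, totals)]

if __name__ == "__main__":
    for i, frac in enumerate(simulate_spam_fractions(3, 30000, 0.6)):
        print(f"instance {i}: {frac:.3f} of its mail was spam")
```

With enough messages the per-instance fractions end up close to each other,
which is why independent Bayes databases tend to converge on similar
knowledge.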
OTOH, Bayes training must be repeated on every instance, and you have
to search the MTA logs of every instance if some mail "disappears"...
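As a sketch of what "training on all instances" can look like, the helper
below only builds the commands that would run sa-learn on each host over
ssh. The host names and the mail path are hypothetical, and it assumes the
training mail already exists at the same path on every host and that ssh
access is set up.

```python
import shlex

def build_training_commands(hosts, mail_path, is_spam):
    """Build one ssh command per host that runs sa-learn there.

    hosts and mail_path are placeholders for illustration; the mail
    is assumed to be present at mail_path on each host.
    """
    flag = "--spam" if is_spam else "--ham"
    # Quote each part so the remote shell sees the path as one argument.
    remote = " ".join(shlex.quote(part) for part in ["sa-learn", flag, mail_path])
    return [["ssh", host, remote] for host in hosts]

if __name__ == "__main__":
    hosts = ["mx1.example.net", "mx2.example.net"]  # hypothetical instances
    for cmd in build_training_commands(hosts, "/var/spool/train/spam", True):
        print(" ".join(cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

The point is simply that every instance has to be fed the same training
mail, since none of them shares its Bayes DB with the others.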
It is possible to store Bayes in a MySQL database, but that would
introduce another point of failure in our architecture, and that is
one we can't replicate as well!
Hope my reply was not too far from what you expected,
Paolo