Posted to users@spamassassin.apache.org by Alex Handle <al...@funky.at> on 2006/12/08 21:44:04 UTC

bayes db site wide or per user

Hi to all,

A month ago we implemented a mail cluster based on
postfix/mysql/nfs/amavisd-new/spamassassin, and now we
would like to add Bayesian filtering to the system.
Our cluster is designed to scale to about 100,000 mailboxes.

Users should submit spam and ham to sa-learn by forwarding
the messages as attachments to a specific address:

user+spam372d38d2839@example.com

or

user+ham348383d48484@example.com
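
A minimal sketch of how such training addresses might be decoded before the message is handed to sa-learn. The address layout (user, then "+", then "spam" or "ham", then a hex token) is inferred from the two examples above; the helper names, the hex-only token rule, and the "user@domain" username form are assumptions for illustration. Extracting the attached message and validating the token are not shown.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout: <user>+<spam|ham><hex token>@<domain>
# (inferred from the two example addresses; not a documented format)
_TRAIN_ADDR = re.compile(
    r"^(?P<user>[^+@]+)\+(?P<kind>spam|ham)(?P<token>[0-9a-f]+)@(?P<domain>[^@]+)$",
    re.IGNORECASE,
)


class TrainingRequest(NamedTuple):
    user: str
    domain: str
    kind: str   # "spam" or "ham"
    token: str  # per-user token; checking it is left to the caller


def parse_training_address(rcpt: str) -> Optional[TrainingRequest]:
    """Return the decoded request, or None if rcpt is not a training address."""
    m = _TRAIN_ADDR.match(rcpt.strip())
    if m is None:
        return None
    return TrainingRequest(
        user=m.group("user").lower(),
        domain=m.group("domain").lower(),
        kind=m.group("kind").lower(),
        token=m.group("token").lower(),
    )


def sa_learn_argv(req: TrainingRequest, site_wide: bool) -> List[str]:
    """Build an sa-learn command line; the message itself would go to stdin."""
    argv = ["sa-learn", "--" + req.kind]  # --spam or --ham
    if not site_wide:
        # Per-user database: train under the mailbox owner's name.
        # Using user@domain as the username is an assumption.
        argv += ["--username", req.user + "@" + req.domain]
    return argv
```

With a site-wide database every submission trains the same store, so the username is omitted; with per-user databases the same submission only affects that one mailbox.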


Is it a bad idea to use a site-wide Bayes database, or is it better
to use a per-user database in this scenario?
How resistant is a site-wide setup with a lot of mailboxes to
poisoning?

Thanks!

Alex

Re: bayes db site wide or per user

Posted by Theo Van Dinter <fe...@apache.org>.
On Sat, Dec 09, 2006 at 01:48:51PM +0100, Alex Handle wrote:
> I could disable the spam checks in amavisd-new and invoke SA through
> maildrop.
> But would a per-user database scale to 100,000 mailboxes?

IMO, Bayes will likely be OK if you use SQL (though your DB will be quite
a bit larger).  I think the issue is going to be CPU: more expiry runs,
scanning the same mail once per recipient when it is delivered to multiple
people, etc.

Generally speaking, I believe installations with many users go site-wide.

-- 
Randomly Selected Tagline:
Leela: "He's crude and gross and he treats me like a slave." 
 Fry: "Then dump his one-eyed ass." 

Re: bayes db site wide or per user

Posted by Alex Handle <al...@funky.at>.
Theo Van Dinter wrote:
> On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote:
>> postfix/mysql/nfs/amavisd-new/spamassassin and now we
>>
>> Is it a bad idea to use a site-wide Bayes database, or is it better
>> to use a per-user database in this scenario?
> 
> Per-user DBs will give you better results, but since you're running from
> the MTA, your only choice is site-wide.
> 

I could disable the spam checks in amavisd-new and invoke SA through
maildrop.
But would a per-user database scale to 100,000 mailboxes?



Re: bayes db site wide or per user

Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, Dec 08, 2006 at 09:44:04PM +0100, Alex Handle wrote:
> postfix/mysql/nfs/amavisd-new/spamassassin and now we
> 
> Is it a bad idea to use a site-wide Bayes database, or is it better
> to use a per-user database in this scenario?

Per-user DBs will give you better results, but since you're running from
the MTA, your only choice is site-wide.

-- 
Randomly Selected Tagline:
"Wheee! ...ow, I bit my tongue!"
 
 	--Ralph Wiggum
 	  Bart's Inner Child (Episode 1F05)