Posted to users@spamassassin.apache.org by Michael Scheidell <sc...@secnap.net> on 2009/12/15 17:55:58 UTC

Re: Site-wide Bayes

On 12/15/09 11:49 AM, Charles Gregory wrote:
> On Tue, 15 Dec 2009, Matt Garretson wrote:
>> Heartily agreed. Site-wide bayes here (single database for 2000+ 
>> users) catches 40% of the spam here.
>
> But what is the FP rate? Is it safe for an ISP with a widely varied 
> user base to use site-wide Bayes?
>
I find that you should reduce scores on the high and low end (bayes_00 
and bayes_95) and the 'meta rules' that might combine them also.

(So yes: for an ISP, or for our hosted clients, we have modified the Bayes 
scores. If one client is a plastic surgeon, one is a stock broker, 
and one is a mortgage broker, each will be getting wildly different ham.)

Setting up a 'per domain' Bayes might work, but it might be tricky, 
especially if an inbound email is going to several domains, and it only 
makes sense if you are doing B2B (commercial clients).

-- 
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
SECNAP Network Security Corporation

    * Certified SNORT Integrator
    * 2008-9 Hot Company Award Winner, World Executive Alliance
    * Five-Star Partner Program 2009, VARBusiness
    * Best Anti-Spam Product 2008, Network Products Guide
    * King of Spam Filters, SC Magazine 2008

_________________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
_________________________________________________________________________

Re: Site-wide Bayes

Posted by Thomas Harold <th...@nybeta.com>.

On 12/17/2009 10:30 AM, RW wrote:
> On Wed, 16 Dec 2009 09:36:12 -0500
> Michael Scheidell<sc...@secnap.net>  wrote:
>
>> On 12/16/09 9:27 AM, Thomas Harold wrote:
>>> I'm guessing that you'd also want to change the autolearn
>>> thresholds to be stricter?  Like only auto-learning if it scores
>>> below -2 or above +10?
>>>
>>> (That might be an amavisd-new feature.)
>> I still use 0, but have the high score at +15.
>
> The default is 0.1 IIRC, and I wouldn't recommend setting it lower
> without negative-scoring custom rules - it's set positive for good
> reasons.
>
> BAYES and "userconf" whitelisting rules don't count for autolearning, so
> if you set a negative threshold with the default rules, you rely on
> DNS whitelisting to define ham - the likes of Habeas.
>
> Setting it at exactly 0.0 is also problematical since the decision to
> learn is commonly going to be determined by nominally scored rules that
> score 0.001 and -0.001.

Looking at the wiki...

http://wiki.apache.org/spamassassin/BasicConfiguration

We're not using "userconf" whitelisting; our whitelisting is done by 
amavisd-new mappings (where we score specific domains/addresses with a 
small -2 to -5 score).

The wiki, as it currently stands, makes it sound like the +0.1 default for 
ham auto-learn is not conservative enough, and that the +6.0 default 
for auto-learning spam is too risky.

(We run with -0.5 and +9.5 as our boundaries for auto-learning.)

Re: Site-wide Bayes

Posted by RW <rw...@googlemail.com>.

On Wed, 16 Dec 2009 09:36:12 -0500
Michael Scheidell <sc...@secnap.net> wrote:

> On 12/16/09 9:27 AM, Thomas Harold wrote:
> > I'm guessing that you'd also want to change the autolearn
> > thresholds to be stricter?  Like only auto-learning if it scores
> > below -2 or above +10?
> >
> > (That might be an amavisd-new feature.)
> I still use 0, but have the high score at +15.

The default is 0.1 IIRC, and I wouldn't recommend setting it lower
without negative-scoring custom rules - it's set positive for good
reasons. 

BAYES and "userconf" whitelisting rules don't count for autolearning, so
if you set a negative threshold with the default rules, you rely on
DNS whitelisting to define ham - the likes of Habeas.

Setting it at exactly 0.0 is also problematical since the decision to
learn is commonly going to be determined by nominally scored rules that
score 0.001 and -0.001.
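
To illustrate the point about thresholds near zero, here is a minimal Python
sketch of a simplified autolearn decision. It is not SpamAssassin's actual
code: the function names, the comparison directions, and the rule name
NOMINAL_RULE are assumptions made for illustration, and the real plugin
applies further checks before learning a message as spam.

```python
# Simplified model of the autolearn decision discussed in this thread.
# Assumption: BAYES_* hits and "userconf" whitelist hits are left out of the
# score that is compared against the autolearn thresholds.
USERCONF_RULES = {"USER_IN_WHITELIST"}


def autolearn_score(hits):
    """Sum the scores of all hits that are allowed to count for autolearning."""
    return sum(
        score
        for name, score in hits
        if not name.startswith("BAYES_") and name not in USERCONF_RULES
    )


def autolearn_decision(hits, ham_below, spam_at_or_above):
    """Return 'ham', 'spam', or None (do not learn) for a list of (rule, score)."""
    score = autolearn_score(hits)
    if score < ham_below:
        return "ham"
    if score >= spam_at_or_above:
        return "spam"
    return None


# Two otherwise identical messages that differ only in the sign of one
# nominally scored rule (NOMINAL_RULE is a hypothetical name).
base = [("BAYES_00", -1.9), ("USER_IN_WHITELIST", -100.0)]
plus = base + [("NOMINAL_RULE", 0.001)]
minus = base + [("NOMINAL_RULE", -0.001)]

for ham_threshold in (0.0, 0.1):
    print(
        ham_threshold,
        autolearn_decision(plus, ham_threshold, 15.0),
        autolearn_decision(minus, ham_threshold, 15.0),
    )
```

In this model, a ham threshold of exactly 0.0 lets a single nominal rule
decide whether the message is learned, while a threshold of 0.1 treats both
messages the same way.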

Re: Site-wide Bayes

Posted by Michael Scheidell <sc...@secnap.net>.

On 12/16/09 9:27 AM, Thomas Harold wrote:
> I'm guessing that you'd also want to change the autolearn thresholds 
> to be stricter?  Like only auto-learning if it scores below -2 or 
> above +10?
>
> (That might be an amavisd-new feature.)
I still use 0, but have the high score at +15.

Watch the output of 'sa-learn --dump magic'.

If you can keep the learned spam/ham ratio close to your site's actual 
spam vs. ham ratio, you should be OK.
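
A rough Python sketch of that ratio check follows. It assumes the output of
`sa-learn --dump magic` contains "non-token data: nspam" and "non-token data:
nham" lines with the count in the third column; the sample text and its
numbers are made up, and the helper names are hypothetical.

```python
import re

# Made-up sample shaped like `sa-learn --dump magic` output (an assumption
# about the layout; the counts are invented for illustration).
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      80000          0  non-token data: nspam
0.000          0      20000          0  non-token data: nham
0.000          0     150000          0  non-token data: ntokens
"""

# Third whitespace-separated column holds the value; the label follows.
MAGIC_LINE = re.compile(
    r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$"
)


def learned_counts(dump_text):
    """Extract the learned spam and ham message counts from the dump text."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def spam_fraction(counts):
    """Fraction of learned messages that were spam, or None if nothing learned."""
    total = counts["nspam"] + counts["nham"]
    return counts["nspam"] / total if total else None


counts = learned_counts(SAMPLE)
print(counts, spam_fraction(counts))
```

Comparing that fraction with the share of spam your site actually receives
gives a quick indication of whether autolearning is drifting to one side.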

Re: Site-wide Bayes

Posted by Thomas Harold <th...@nybeta.com>.

On 12/15/2009 11:55 AM, Michael Scheidell wrote:
> On 12/15/09 11:49 AM, Charles Gregory wrote:
>> On Tue, 15 Dec 2009, Matt Garretson wrote:
>>> Heartily agreed. Site-wide bayes here (single database for 2000+
>>> users) catches 40% of the spam here.
>>
>> But what is the FP rate? Is it safe for an ISP with a widely varied
>> user base to use site-wide Bayes?
>>
> I find that you should reduce scores on the high and low end (bayes_00
> and bayes_95) and the 'meta rules' that might combine them also.
>
> (so, yes, an ISP, or for our hosted clients, we have modified the bayes
> scores. . if one client is a plastic surgeon, one is a stock broker, and
> one is a mortgage broker, each will be getting wildly different ham)
>
> setting up a 'per domain' bayes might work, might be tricky, especially
> if an inbound email is going to several domains, and only if you are
> doing B2B (commercial clients)
>

I'm guessing that you'd also want to change the autolearn thresholds to 
be stricter?  Like only auto-learning if it scores below -2 or above +10?

(That might be an amavisd-new feature.)