Posted to users@spamassassin.apache.org by Jake Colman <co...@ppllc.com> on 2005/05/26 16:08:36 UTC

Is Bayes Really Necessary?

Given the rather complete set of rules that ship with SA and which can be
expanded with SARE, does bayes learning really help?  Won't the rules catch
pretty much everything anyway?

-- 
Jake Colman
Sr. Applications Developer
Principia Partners LLC
Harborside Financial Center
1001 Plaza Two
Jersey City, NJ 07311
(201) 209-2467
www.principiapartners.com


Re: Is Bayes Really Necessary?

Posted by jdow <jd...@earthlink.net>.
From: "David B Funk" <db...@engineering.uiowa.edu>

> As spammers are constantly mutating and adapting, having a dynamic,
> adaptive component of SA is a must to avoid the "saw-tooth" effect.
> (a fresh SA install works great, gradually loses effectiveness until a
> new update install, and so on).

Um, yeah, you make a fresh install with no SARE rules and it's REALLY
bad. It saw-tooths upwards as you break down and install more SARE rules.
Then a periodic update keeps you up there quite nicely.

Seriously, I was AMAZED at how bad a raw 3.02 install was here until I
put in the SARE rules, even after I got the Bayes trained. (Did that
right away off my saved ham and spam database.)

{^_-}


Re: Is Bayes Really Necessary?

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Thu, 26 May 2005, Thomas Cameron wrote:

> On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote:
> > Given the rather complete set of rules that ship with SA and which can
> > expanded with SARE, does bayes learning really help?  Won't the rules catch
> > pretty much everything anyway?
>
> I have used SA with Bayes and it took quite a bit of administrative
> overhead.  It worked amazingly well, though.
>
> I now run SA with DCC, Razor, Pyzor and network checks and without Bayes
> and it still Just Works(TM).  Seriously - I have customers who slather

You could make the argument that Razor, Pyzor, etc perform a similar
function to Bayes (analyze a message, generate some kind of 'collapsed'
representation, compare it with a database of known messages
and come up with a "spammyness" value).

As spammers are constantly mutating and adapting, having a dynamic,
adaptive component of SA is a must to avoid the "saw-tooth" effect.
(a fresh SA install works great, gradually loses effectiveness until a
new update install, and so on).

Bayes has the advantage that it's local, has no network overhead, and can
be trained to 'know' your specific kinds of messages.

Bayes has the disadvantage that it's your local responsibility to
see that it's trained properly.
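
As a rough illustration of what "trained locally" means, here is a small naive Bayes sketch in Python. It is a simplified textbook version, not SpamAssassin's actual Bayes implementation (which uses its own tokenizer and probability combining). The class name TinyBayes, the whitespace tokenizer, the smoothing, and the assumption of equal priors are all choices made only for this example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-based classifier trained on local ham and spam (hypothetical)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        """Train on one message; label must be 'spam' or 'ham'."""
        self.messages[label] += 1
        # Count each token once per message.
        self.tokens[label].update(set(text.lower().split()))

    def spam_probability(self, text: str) -> float:
        """Combine per-token evidence into a probability, assuming equal priors."""
        log_odds = 0.0
        for token in set(text.lower().split()):
            # Add-one smoothing so unseen tokens do not give zero probabilities.
            p_spam = (self.tokens["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills online", "spam")
bayes.learn("cheap mortgage offer", "spam")
bayes.learn("meeting agenda attached", "ham")
bayes.learn("lunch meeting tomorrow", "ham")

print(bayes.spam_probability("cheap pills"))       # leans toward spam
print(bayes.spam_probability("meeting tomorrow"))  # leans toward ham
```

The sketch also shows the disadvantage: the output is only as good as the ham and spam it was fed, so mislabeled training mail skews every later verdict.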


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Is Bayes Really Necessary?

Posted by Thomas Cameron <th...@camerontech.com>.
On Thu, 2005-05-26 at 10:08 -0400, Jake Colman wrote:
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules catch
> pretty much everything anyway?

I have used SA with Bayes and it took quite a bit of administrative
overhead.  It worked amazingly well, though.  

I now run SA with DCC, Razor, Pyzor and network checks and without Bayes
and it still Just Works(TM).  Seriously - I have customers who slather
their e-mail addresses all over Usenet, message boards, on their web
pages, etc.  They might as well put a big sign up that says SPAM ME
PLEASE!!!  

But they don't get any spam - SA and spamass-milter reject all of it.
It is really amazing - I've got clients who went from hundreds of spams
per day down to one or two that slip through per week.  Of course, when
one gets through, my phone rings!

I guess my experience is that either way, SA Just Works(TM).

Cheers,
Thomas


Re: Is Bayes Really Necessary?

Posted by Dimitri Yioulos <dy...@firstbhph.com>.
On Thursday May 26 2005 1:13 pm, Loren Wilton wrote:
> > Given the rather complete set of rules that ship with SA and which can
> > expanded with SARE, does bayes learning really help?  Won't the rules
> > catch pretty much everything anyway?
>
> Um, maybe, maybe not.
>
> Bayes *necessary*?  No, especially if you run net tests.
> Bayes *highly desirable*?  Yup.  An additional 4 points can really help
> when a new spam shows up that you don't have a lot of rules for.
>
>         Loren

Loren's point well taken.  I think it's the use of bayes in conjunction with 
other rules that tends to work best. At least, that's my experience.

Dimitri

Re: Is Bayes Really Necessary?

Posted by Loren Wilton <lw...@earthlink.net>.
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules
> catch pretty much everything anyway?

Um, maybe, maybe not.

Bayes *necessary*?  No, especially if you run net tests.
Bayes *highly desirable*?  Yup.  An additional 4 points can really help when
a new spam shows up that you don't have a lot of rules for.

        Loren
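
The "additional 4 points" argument can be sketched in a few lines of Python. SA adds up the scores of every rule that hits and compares the total with a threshold; the 5.0 used here is the threshold mentioned elsewhere in this thread. The rule names and the individual scores below are hypothetical, and the 4.0 for the Bayes hit simply takes the figure from the post above at face value.

```python
REQUIRED_SCORE = 5.0  # threshold quoted in this thread; assumed here, not read from a config


def total_score(hits: dict) -> float:
    """Sum the scores of all rules that matched a message."""
    return sum(hits.values())


def is_spam(hits: dict, required: float = REQUIRED_SCORE) -> bool:
    """Flag the message once the summed score reaches the threshold."""
    return total_score(hits) >= required


# A new spam variant that only trips a couple of weak static rules (hypothetical names).
static_hits = {"HYPOTHETICAL_RULE_A": 1.5, "HYPOTHETICAL_RULE_B": 0.5}
print(total_score(static_hits), is_spam(static_hits))

# The same message when a trained Bayes database also contributes about 4 points.
with_bayes = dict(static_hits, HYPOTHETICAL_BAYES_HIT=4.0)
print(total_score(with_bayes), is_spam(with_bayes))
```

With static rules alone the message stays under the threshold; the Bayes contribution is what pushes it over, which is the case for calling Bayes "highly desirable" rather than "necessary".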


Re: Is Bayes Really Necessary?

Posted by "Eric A. Hall" <eh...@ehsco.com>.
On 5/26/2005 10:08 AM, Jake Colman wrote:
> Given the rather complete set of rules that ship with SA and which can
> expanded with SARE, does bayes learning really help?  Won't the rules catch
> pretty much everything anyway?

The base SA install is insufficient, but if you tweak the scores and add
some additional tests, you can get by without bayes just fine. I use a
select set of RBLs, Razor, rulesets from rulesemporium, and my own
LDAP-based weighting plugin, and my highest spam only gets an average of
one spam per day, and even those are over the 5.0 threshold (so they are
auto-filed into the Junk Email folder).

Bayes is great for per-user stuff, but unless you are willing to manage
the per-user databases (which I'm not), it is easier to just tweak the
system scores and rules. Less management overhead, less CPU, etc.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/