Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/05/27 08:30:07 UTC

Re[4]: Is Bayes Really Necessary?

Hello List,

Thursday, May 26, 2005, 11:01:23 PM, you wrote:

LMU> P.S. I know the account says "List Mail User", but why is this the only
LMU> mailing list that almost uniformly references me that way?  Though, I do
LMU> get called by the sobriquet "Administrative User" when I use accounts
LMU> which are labeled like that.  Maybe, it's just this list's user base is
LMU> ingrained in using the header label instead of the signature!?  Anyway,
LMU> I kind of like the "LMU" :)

Don't know.  Me, I kind of like responding to the list.  :-)

LMU> 	A quick check of the last couple of days shows 72.96% at BAYES_00
LMU> and 10% at BAYES_99 and 11.29% at BAYES_50.  I suspect the results are less
LMU> extreme for you, but maybe not (that would be good to hear).  Note: I have
LMU> a lot of MTA level rejection, pre-filtering before SA that takes out most
LMU> of the remaining spam and almost all mailing lists are set to use the
LMU> "bayes_ignore_to" directive - so my results posted above are highly skewed
LMU> by all these factors (e.g. > 40% of valid email does not run through bayes,
LMU> and things like nightly server reports generated internally do - I don't
LMU> even trust my own firewall machines' reports).

Interesting stats.

Last month's ham (110,735):
th - 00 - 110173 = 99.5%
th - 01 - 4
th - 05 - 191
th - 20 - 164
th - 30 - 0
th - 40 - 144
th - 44 - 1
th - 50 - 6
th - 60 - 20
th - 80 - 8
th - 95 - 1
th - 99 - 23     = 0.02%

Last month's spam (79,749):
ts - 00 - 16346  = 20.5%
ts - 01 - 1
ts - 05 - 877    =  1.1%
ts - 20 - 1283   =  1.6%
ts - 30 - 2
ts - 40 - 1607   =  2.0%
ts - 44 - 8
ts - 50 - 415
ts - 60 - 3588   =  4.5%
ts - 80 - 3695   =  4.6%
ts - 95 - 2596   =  3.3%
ts - 99 - 49331  = 61.9%

Obviously Bayes does a whole lot better with ham than it does with
spam here.
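
For anyone who wants to run the same comparison on their own counts, here is
a minimal sketch in Python that recomputes the percentages from the bucket
counts in the two tables above. The names (ham, spam, shares, report) are
made up for illustration; this is not the script that produced the tables.

```python
# Bucket counts copied from the tables above (BAYES_nn -> messages).
ham = {"00": 110173, "01": 4, "05": 191, "20": 164, "30": 0, "40": 144,
       "44": 1, "50": 6, "60": 20, "80": 8, "95": 1, "99": 23}
spam = {"00": 16346, "01": 1, "05": 877, "20": 1283, "30": 2, "40": 1607,
        "44": 8, "50": 415, "60": 3588, "80": 3695, "95": 2596, "99": 49331}


def shares(counts):
    """Return each bucket's share of the total, as a percentage."""
    total = sum(counts.values())
    return {bucket: 100.0 * n / total for bucket, n in counts.items()}


def report(label, counts):
    """Print one line per bucket: count and percentage of the total."""
    print(f"{label}: {sum(counts.values()):,} messages")
    for bucket, pct in shares(counts).items():
        print(f"  BAYES_{bucket}: {counts[bucket]:>7}  {pct:6.2f}%")


report("ham", ham)
report("spam", spam)
```

The bucket totals add up to the message counts given in the table headings,
so the percentages are taken over all ham and all spam respectively.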

Many of the spam that hit BAYES_00 are outscatter. I've identified at
least 3,000 of those during the last month's work on the new obfu
rules. Now that those obfu rules are in place, I suspect those
percentages will shift nicely, but we'll probably continue to get 10%
of spam at BAYES_00.

Yes, you're right -- we do have a lot of other tricks in use here to
get them flagged as spam.   :-)

I hadn't realized that as many as 23 ham had hit BAYES_99. I would
have guessed it was only 5 or 6. We do have a lot of negative scoring
rules which pulled those down as well.  All of them were valid ham
marketing emails from the likes of United Airlines and Staples, which
are now covered by SARE's whitelist.cf.

We did have 15 FPs during this period of time, none of which will
repeat because of whitelist.cf.

Bob Menschel