Posted to users@spamassassin.apache.org by Matt Kettler <mk...@comcast.net> on 2006/04/29 21:02:33 UTC

Re: Those "Re: good obfupills" spams (bayes scores)

Bart Schaefer wrote:
> On 4/29/06, Matt Kettler <mk...@comcast.net> wrote:
>> Besides.. If you want to make a mathematics based argument against me,
>> start by explaining how the perceptron mathematically is flawed. It
>> assigned the original score based on real-world data.
>
> Did it?  I thought the BAYES_* scores have been fixed values for a
> while now, to force the perceptron to adapt the other scores to fit.
>
Actually, you're right... I'm shocked and floored, but you're right.

 In SA 3.1.0 they did force-fix the scores of the bayes rules,
particularly the high-end. The perceptron assigned BAYES_99 a score of
1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.

That does make me wonder if:
    1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
due to the ham corpus being polluted with spam. This forces the
perceptron to attempt to compensate. (Pollution is always a problem,
since nobody is perfect, but it occurs to differing degrees.)
   -or-
    2) The perceptron is out of whack. (I highly doubt this, because the
perceptron generated the scores for 3.0.x and they were fine.)
  -or-
    3) The real-world FPs of BAYES_99 really do tend to be cascades
with other rules in the 3.1.x ruleset, and the perceptron is correctly
capping the score. This could differ from 3.0.x due to changes in rules,
or changes in ham patterns over time.
  -or-
    4) One of the corpus submitters has a poorly trained bayes db.
(Possible, but I doubt it.)

Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight
increase in ham-hits for BAYES_99 and a slight decrease in spam hits.
3.0.x:
OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE   NAME
 43.515   89.3888   0.0335   1.000    0.83    1.89   BAYES_99
3.1.x:
OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE   NAME
 60.712   86.7351   0.0396   1.000    0.90    3.50   BAYES_99

Also worth considering: set3 of 3.0.x was much closer to a 50/50
spam/nonspam mix (48.7/51.3) than 3.1.0's set3 was (nearly 70/30).
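The S/O column in those reports can be reconstructed (approximately) from the SPAM%/HAM% figures plus the corpus mix quoted above. A minimal sketch, assuming the quoted mix percentages are close to the true message counts:

```python
# Approximate reconstruction of mass-check's S/O column: the fraction of
# a rule's total hits that were spam. Corpus-mix figures are the rough
# spam fractions quoted in the post, so results are approximate.

def s_over_o(spam_pct, ham_pct, spam_frac):
    """spam_pct/ham_pct: rule hit rates within each class (percent).
    spam_frac: fraction of the corpus that is spam."""
    spam_hits = spam_frac * spam_pct / 100.0
    ham_hits = (1.0 - spam_frac) * ham_pct / 100.0
    return spam_hits / (spam_hits + ham_hits)

so_30 = s_over_o(89.3888, 0.0335, 0.487)  # 3.0.x set3: ~48.7% spam corpus
so_31 = s_over_o(86.7351, 0.0396, 0.70)   # 3.1.x set3: ~70% spam corpus
# both come out around 0.9996-0.9998, which the report rounds to 1.000
```

This is why BAYES_99 can show S/O 1.000 in both tables even though its ham hit rate roughly doubled: the absolute ham numbers are tiny either way.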


Re: Those "Re: good obfupills" spams (bayes scores)

Posted by jdow <jd...@earthlink.net>.
From: "Bart Schaefer" <ba...@gmail.com>

On 4/29/06, Matt Kettler <mk...@comcast.net> wrote:
>  In SA 3.1.0 they did force-fix the scores of the bayes rules,
> particularly the high-end. The perceptron assigned BAYES_99 a score of
> 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.
>
> That does make me wonder if:
>     1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
> due to the ham corpus being polluted with spam.

My recollection is that there was speculation that the BAYES_9x rules
were scored "too low" not because they FP'd in conjunction with other
rules, but because against the corpus they TRUE P'd in conjunction
with lots of other rules, and that it therefore wasn't necessary for
the perceptron to assign a high score to BAYES_9x in order to push the
total over the 5.0 threshold.

The trouble with that is that users expect training on their personal
spam flow to have a more significant effect on the scoring.  I want to
train bayes to compensate for the LACK of other rules matching, not
just to give a final nudge when a bunch of others already hit.

I filed a bugzilla some while ago suggesting that the bayes percentage
ought to be used to select a rule set, not to adjust the score as a
component of a rule set.

<< jdow >> There is one other gotcha. I bet vastly different scores
are warranted for Bayes when run with per-user training and rules
as compared to global training and rules.

{^_^}

Re: Those "Re: good obfupills" spams (bayes scores)

Posted by Bart Schaefer <ba...@gmail.com>.
On 4/29/06, Matt Kettler <mk...@comcast.net> wrote:
>  In SA 3.1.0 they did force-fix the scores of the bayes rules,
> particularly the high-end. The perceptron assigned BAYES_99 a score of
> 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.
>
> That does make me wonder if:
>     1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
> due to the ham corpus being polluted with spam.

My recollection is that there was speculation that the BAYES_9x rules
were scored "too low" not because they FP'd in conjunction with other
rules, but because against the corpus they TRUE P'd in conjunction
with lots of other rules, and that it therefore wasn't necessary for
the perceptron to assign a high score to BAYES_9x in order to push the
total over the 5.0 threshold.
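That correlated-hits effect can be seen with a toy perceptron. This is an illustration only, not the SA score-generation code; the data, rule count, and training loop are invented:

```python
# Toy perceptron: "rule 0" (think BAYES_99) always co-fires with rules 1
# and 2 on spam, so the learned weight gets spread across all three and
# no single rule needs a large score to clear the threshold.

def train_perceptron(examples, labels, epochs=20, lr=1.0):
    n = len(examples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # -1, 0, or +1
            for i in range(n):
                w[i] += lr * err * x[i]
            b += lr * err
    return w, b

# spam messages hit all three rules together; ham messages hit none
X = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0)]
y = [1, 1, 0, 0]
w, b = train_perceptron(X, y)

# A message where ONLY rule 0 fires scores w[0] + b,
# far below what the full trio of co-firing rules scores.
solo_score = w[0] + b
trio_score = sum(w) + b
```

Against a corpus where BAYES_99 nearly always co-fires with other spam signs, this is exactly the "final nudge" scoring objected to above: the lone-hit case ends up underscored.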

The trouble with that is that users expect training on their personal
spam flow to have a more significant effect on the scoring.  I want to
train bayes to compensate for the LACK of other rules matching, not
just to give a final nudge when a bunch of others already hit.

I filed a bugzilla some while ago suggesting that the bayes percentage
ought to be used to select a rule set, not to adjust the score as a
component of a rule set.
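The bugzilla suggestion might be sketched like this; the set contents, rule names, and 0.9 cutoff are all invented for illustration, and this is not how SA's score sets actually work:

```python
# Hypothetical "select a rule set by Bayes probability" scheme: instead
# of adding a BAYES_* score into one set, the Bayes probability picks
# which score set applies, so high-confidence Bayes needs fewer co-hits.

NORMAL_SCORES = {"SUBJ_OBFU": 2.0, "DRUG_SPAM": 2.2}
HIGH_BAYES_SCORES = {"SUBJ_OBFU": 3.5, "DRUG_SPAM": 3.8}

def pick_scoreset(bayes_prob, cutoff=0.9):
    """Choose which rule-score set applies to this message."""
    return HIGH_BAYES_SCORES if bayes_prob >= cutoff else NORMAL_SCORES

def message_score(rule_hits, bayes_prob):
    scores = pick_scoreset(bayes_prob)
    return sum(scores[r] for r in rule_hits)
```

Under a scheme like this, well-trained Bayes changes how much the other rules count, rather than merely adding its own nudge to their total.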

Re: Those "Re: good obfupills" spams (bayes scores)

Posted by jdow <jd...@earthlink.net>.
From: "Matt Kettler" <mk...@comcast.net>

> Bart Schaefer wrote:
>> On 4/29/06, Matt Kettler <mk...@comcast.net> wrote:
>>> Besides.. If you want to make a mathematics based argument against me,
>>> start by explaining how the perceptron mathematically is flawed. It
>>> assigned the original score based on real-world data.
>>
>> Did it?  I thought the BAYES_* scores have been fixed values for a
>> while now, to force the perceptron to adapt the other scores to fit.
>>
> Actually, you're right... I'm shocked and floored, but you're right.
> 
> In SA 3.1.0 they did force-fix the scores of the bayes rules,
> particularly the high-end. The perceptron assigned BAYES_99 a score of
> 1.89 in the 3.1.0 mass-check run. The devs jacked it up to 3.50.
> 
> That does make me wonder if:
>    1) When BAYES_9x FPs, it FPs in conjunction with lots of other rules
> due to the ham corpus being polluted with spam. This forces the
> perceptron to attempt to compensate. (Pollution is always a problem,
> since nobody is perfect, but it occurs to differing degrees.)
>   -or-
>    2) The perceptron is out of whack. (I highly doubt this, because the
> perceptron generated the scores for 3.0.x and they were fine.)
>  -or-
>    3) The real-world FPs of BAYES_99 really do tend to be cascades
> with other rules in the 3.1.x ruleset, and the perceptron is correctly
> capping the score. This could differ from 3.0.x due to changes in rules,
> or changes in ham patterns over time.
>  -or-
>    4) One of the corpus submitters has a poorly trained bayes db.
> (Possible, but I doubt it.)
> 
> Looking at statistics-set3 for 3.0.x and 3.1.x there was a slight
> increase in ham-hits for BAYES_99 and a slight decrease in spam hits.
> 3.0.x:
> OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE   NAME
>  43.515   89.3888   0.0335   1.000    0.83    1.89   BAYES_99
> 3.1.x:
> OVERALL%    SPAM%     HAM%     S/O    RANK   SCORE   NAME
>  60.712   86.7351   0.0396   1.000    0.90    3.50   BAYES_99
> 
> Also worth considering: set3 of 3.0.x was much closer to a 50/50
> spam/nonspam mix (48.7/51.3) than 3.1.0's set3 was (nearly 70/30).

What happens comes from the basic reality that Bayes and the other
rules are not orthogonal sets. So many other rules hit alongside 95 and
99 that the perceptron artificially reduced the goodness rating for
these rules.

It needs some serious skewing to catch situations where 95 or 99 hits
and very few other rules hit. Those are the times the accuracy of Bayes
is needed the most. I've found, here, that 5.0 is a suitable score; I
suspect if I were more realistic, 4.9 would be closer. But I still
remember learning of the score bias and being floored by it when I
noticed 99 on some spams that leaked through with ONLY the 99 hit. I am
speaking of dozens of spams hit that way.
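That kind of local adjustment is a one-line override in local.cf (5.0 is the value reported above, not a stock score; the path is the usual site-config location and may differ per install):

```
# /etc/mail/spamassassin/local.cf -- site override of the shipped score
score BAYES_99 5.0
```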

So far over several years I've found a few special cases that warrant
negative rules. That seems to be pulling the 99 rule's false alarm
rate down to "I can't see it." (I have, however, been tempted to generate
a BAYES_99p5 rule and a BAYES_99p9 rule to fine tune the scores up around
4.9 and 5.0.)
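The split-rule idea could look something like the following in cf syntax, patterned on the stock BAYES_* definitions (which use the check_bayes eval test). The rule names, thresholds, and scores here are this post's hypothetical, not shipped rules:

```
# Hypothetical finer-grained bayes rules, modeled on the stock
#   body BAYES_99 eval:check_bayes('0.99', '1.00')
body  BAYES_99p5  eval:check_bayes('0.995', '0.999')
body  BAYES_99p9  eval:check_bayes('0.999', '1.00')
score BAYES_99p5  4.9
score BAYES_99p9  5.0
describe BAYES_99p5  Bayes spam probability is 99.5 to 99.9%
describe BAYES_99p9  Bayes spam probability is 99.9 to 100%
```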

{^_^}