
Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/07/26 13:35:23 UTC

Re: BAYES_99 and ham

martin f krafft writes:
> Hi list,
> 
> I just had a flood of spam coming through, which SA classified as
> ham. On closer inspection, it turns out that the only tests
> triggered for all those mails were HTML_MESSAGE and BAYES_99.
> 
> HTML messages are commonplace today (unfortunately), so they don't
> add anything to the score.
> 
> BAYES_99 yields 3.5 points.
> 
> What's curious is that in this scenario, even though SA thinks that
> the message is 99%-100% likely to be spam, it will always classify
> it as ham, and further learning does not have any noticeable effect.
> 
> I know how SA scores are computed. I do wonder how that algorithm
> applies to the BAYES_* tests though. Don't you think BAYES_99 should
> yield > 5 points to trigger the threshold on default installs?
> Shouldn't thus BAYES_* be renormalised?

The Bayes rules are too dependent on user training to be entirely
trustworthy by default: most users will not train them enough, or will
occasionally make mistakes.  However, if you've put in the effort to
train them well, feel free to increase their score...

--j.

Re: BAYES_99 and ham

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

> On Thu, 26 Jul 2007, Matus UHLAR - fantomas wrote:
> > The downside is that even users who train a lot will still get
> > _a lot_ of spam hitting only 3.5 points with BAYES_99.

On 26.07.07 09:30, John D. Hardin wrote:
> There's some reluctance to put Poison Pill rules into the default 
> distribution... :)
> 
> If you trust your Bayes training, then increase the score of BAYES_99. 
> Mine is at 4.5 (which is better but still doesn't make it a poison 
> pill).

... and since even 4.5 doesn't seem to be a poison pill to me, I wonder why
there's reluctance to bump the default to at least 4.0 :)

Could Bayes also take the number of spams/hams in its database into account
to narrow its decisions?

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]

Re: BAYES_99 and ham

Posted by "John D. Hardin" <jh...@impsec.org>.

On Thu, 26 Jul 2007, Matus UHLAR - fantomas wrote:

> The downside is that even users who train a lot will still get
> _a lot_ of spam hitting only 3.5 points with BAYES_99.

There's some reluctance to put Poison Pill rules into the default 
distribution... :)

If you trust your Bayes training, then increase the score of BAYES_99. 
Mine is at 4.5 (which is better but still doesn't make it a poison 
pill).
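
To make the arithmetic concrete, here is a minimal sketch in Python (not
SpamAssassin code). It assumes a message is tagged once the summed rule
scores reach the required score of 5.0, and it takes HTML_MESSAGE as 0.0
because this thread says it adds nothing. The function names are made up
for illustration. In a real install the change itself would be a single
line such as "score BAYES_99 4.5" in local.cf (or in a user's user_prefs,
where that is permitted).

```python
# Illustrative only: sum the scores of the rules that hit and compare the
# total against the required score (assumed here to be the default 5.0).
DEFAULT_REQUIRED_SCORE = 5.0


def total_score(hits, scores):
    """Sum the scores of all rules that hit a message."""
    return sum(scores[rule] for rule in hits)


def is_spam(hits, scores, required=DEFAULT_REQUIRED_SCORE):
    """Assumption: a message counts as spam once total >= required."""
    return total_score(hits, scores) >= required


# The only two rules that hit in the scenario from this thread.
hits = ["HTML_MESSAGE", "BAYES_99"]

# Scores as quoted in this thread; HTML_MESSAGE = 0.0 is an assumption.
stock = {"HTML_MESSAGE": 0.0, "BAYES_99": 3.5}   # default discussed here
raised = {"HTML_MESSAGE": 0.0, "BAYES_99": 4.5}  # the locally raised value
poison = {"HTML_MESSAGE": 0.0, "BAYES_99": 5.0}  # would decide on its own

for name, scores in [("stock", stock), ("raised", raised), ("poison", poison)]:
    print(name, total_score(hits, scores), is_spam(hits, scores))
```

With either 3.5 or 4.5, BAYES_99 still needs at least one other rule to
push a message over the threshold, which is why 4.5 is not a poison pill.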

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Where We Want You To Go Today 07/05/07: Microsoft patents in-OS
  adware architecture incorporating spyware, profiling, competitor
  suppression and delivery confirmation (U.S. Patent #20070157227)
-----------------------------------------------------------------------
 9 days until The 272nd anniversary of John Peter Zenger's acquittal

Re: BAYES_99 and ham

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

> >martin f krafft writes:
> >>I just had a flood of spam coming through, which SA classified as
> >>ham. On closer inspection, it turns out that the only tests
> >>triggered for all those mails were HTML_MESSAGE and BAYES_99.
[...]
> >>I know how SA scores are computed. I do wonder how that algorithm
> >>applies to the BAYES_* tests though. Don't you think BAYES_99 should
> >>yield > 5 points to trigger the threshold on default installs?
> >>Shouldn't thus BAYES_* be renormalised?

I was also thinking about this problem...

> Justin Mason wrote:
> >The Bayes rules are too dependent on user training to be entirely
> >trustworthy by default: most users will not train them enough, or will
> >occasionally make mistakes.  However, if you've put in the effort to
> >train them well, feel free to increase their score...

I have a perfectly configured Bayes filter, but I'm a bit afraid of tuning
the scores.

Couldn't the Bayes filter narrow its decisions based on the number of
spams/hams in the database?

For example, with fewer than 500 spams and 500 hams in the DB, Bayes would
return nothing higher than BAYES_80 and nothing lower than BAYES_20. Below
1000/1000 it would return nothing higher than BAYES_95 and nothing lower
than BAYES_05. Or maybe the spam and ham counts could be handled
separately...
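
A rough sketch of that idea in Python, purely to illustrate the proposal;
nothing here is an existing SpamAssassin feature. The function name is
hypothetical, the rule names are read as plain probability bounds (BAYES_80
as 0.80, BAYES_20 as 0.20, and so on), and the smaller of the two counts is
taken as the amount of training. All three are simplifying assumptions.

```python
def clamp_bayes_probability(p, nspam, nham):
    """Limit a Bayes spam probability by how much mail has been learned.

    p     -- raw spam probability from the classifier, 0.0 to 1.0
    nspam -- number of spam messages learned
    nham  -- number of ham messages learned

    Assumption: the smaller of the two counts decides which limits apply.
    """
    trained = min(nspam, nham)
    if trained < 500:
        low, high = 0.20, 0.80   # stay between "BAYES_20" and "BAYES_80"
    elif trained < 1000:
        low, high = 0.05, 0.95   # stay between "BAYES_05" and "BAYES_95"
    else:
        low, high = 0.0, 1.0     # enough training: no clamping
    return max(low, min(high, p))


# A very confident verdict is toned down until the database is large enough.
for nspam, nham in [(300, 300), (700, 5000), (2000, 2000)]:
    print(nspam, nham, clamp_bayes_probability(0.999, nspam, nham))
```

Handling the spam and ham counts separately, as suggested above, would mean
choosing the upper limit from one count and the lower limit from the other
instead of using a single minimum.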

On 26.07.07 13:40, Matthias Haegele wrote:
> Yes, most users won't train, but they constantly complain about the
> bad performance of spam scoring ;-).

The downside is that even users who train a lot will still get _a lot_ of
spam that hits only BAYES_99 and scores just 3.5 points. They conclude that
training is pointless because it doesn't help. Not every user who can train
(via scripts provided by the admin) can also modify scores. I'm also afraid
that users who can modify scores would set some of them too high or too low
and then complain even more.

> Never seen False Scoring for BAYES_99 (well trained, manual).
> Spam rarely gets > BAYES_50.

This has to be "ham", right?

> So the higher score works fine (for me).

As I said above, I'm wary of modifying scores for users, or even for myself.
I hope the default scores in SA will be spread out a bit more, so that users
gain more influence over the result by training SA carefully, not only by
modifying the defaults...

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
BSE = Mad Cow Desease ... BSA = Mad Software Producents Desease

Re: BAYES_99 and ham

Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.

Justin Mason wrote:
> martin f krafft writes:
>> Hi list,
>>
>> I just had a flood of spam coming through, which SA classified as
>> ham. On closer inspection, it turns out that the only tests
>> triggered for all those mails were HTML_MESSAGE and BAYES_99.
>>
>> HTML messages are commonplace today (unfortunately), so they don't
>> add anything to the score.
>>
>> BAYES_99 yields 3.5 points.
>>
>> What's curious is that in this scenario, even though SA thinks that
>> the message is 99%-100% likely to be spam, it will always classify
>> it as ham, and further learning does not have any noticeable effect.
>>
>> I know how SA scores are computed. I do wonder how that algorithm
>> applies to the BAYES_* tests though. Don't you think BAYES_99 should
>> yield > 5 points to trigger the threshold on default installs?
>> Shouldn't thus BAYES_* be renormalised?
> 
> The Bayes rules are too dependent on user training to be entirely
> trustworthy by default: most users will not train them enough, or will
> occasionally make mistakes.  However, if you've put in the effort to
> train them well, feel free to increase their score...

Yes, most users won't train, but they constantly complain about the bad
performance of spam scoring ;-).

I've never seen false scoring for BAYES_99 (well trained, manually).
Spam rarely gets > BAYES_50.
So the higher score works fine (for me).

Just my 2 cents.

> --j.


-- 
Grüsse/Greetings
MH

