Posted to users@spamassassin.apache.org by JP Kelly <li...@jpkvideo.net> on 2012/10/22 20:06:01 UTC
BAYES_99 score
Should I set the BAYES_99 score high enough to trigger as spam?
I get plenty of spam getting through which does not get caught because BAYES_99 is the only rule which fires and it is not set to score at or above the threshold.
Re: BAYES_99 score
Posted by John Hardin <jh...@impsec.org>.
On Mon, 22 Oct 2012, darxus@chaosreigns.com wrote:
> On 10/23, Jari Fredriksson wrote:
>> 22.10.2012 21:15, darxus@chaosreigns.com wrote:
>>> Huh, ruleqa doesn't track hits to BAYES_99?
>> If it did, against which database would it do that?
>
> It would show the hit rates in the corpora of the masscheck submitters,
> like everything else. So, the databases of the submitters (who are using
> bayes).
...and the central masscheck?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
So Microsoft's invented the ASCII equivalent to ugly ink spots that
appear on your letter when your pen is malfunctioning.
-- Greg Andrews, about Microsoft's way to encode apostrophes
-----------------------------------------------------------------------
144 days since the first successful private support mission to ISS (SpaceX)
Re: BAYES_99 score
Posted by da...@chaosreigns.com.
On 10/23, Jari Fredriksson wrote:
> 22.10.2012 21:15, darxus@chaosreigns.com wrote:
> > Huh, ruleqa doesn't track hits to BAYES_99?
> If it did, against which database would it do that?
It would show the hit rates in the corpora of the masscheck submitters,
like everything else. So, the databases of the submitters (who are using
bayes).
--
"I don't want people who want to dance, I want people who have to dance."
--George Balanchine
http://www.ChaosReigns.com
Re: BAYES_99 score
Posted by Jari Fredriksson <ja...@iki.fi>.
22.10.2012 21:15, darxus@chaosreigns.com wrote:
> Huh, ruleqa doesn't track hits to BAYES_99?
If it did, against which database would it do that?
Just askin...
--
"I'm out of options for now. It is something that has gone wrong "in the apt-get region" (can't find a good expression for that)"
Husse Jun 17 2007
Re: BAYES_99 score
Posted by John Hardin <jh...@impsec.org>.
On Wed, 24 Oct 2012, Cathryn Mataga wrote:
> On 10/24/2012 8:35 AM, Jari Fredriksson wrote:
>> 24.10.2012 18:19, Ned Slider wrote:
>> > I have had very good success running adjusted scores for BAYES rules,
>> > but I am very careful how I train my bayes database. I've disabled
>> > auto-learning and only manually train on hand-checked ham and spam
>> > examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
>> > be highly reliable indicators.
>>
>> I have never seen false BAYES_99, but false BAYES_00 is not that rare.
>
> I'm not sure what's going on, but I cleared Bayes, and set use_auto_learn 0
> and then relearned from ham/spam messages, and checking for yesterday, I
> got 12 spam, every single one had BAYES_00 set.
Add those FNs to your spam corpus, and verify by hand every single message
in your ham corpus. Then wipe and retrain again.
If you get hams that score higher than BAYES_00 add them to your ham
training corpus and train.
If you get spams that score less than BAYES_99 add them to your spam
corpus and train.
The training for both of those is considered "daily maintenance" that
should be scripted and run from cron, and doesn't involve a wipe of your
database.
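As a rough illustration of that scripted maintenance, here is a minimal Python sketch that cron could run. It assumes hand-verified messages have been saved one per file under two directories; the paths and function names are hypothetical, not anything from the posts above. It shells out to sa-learn with --spam and --ham and never wipes the database.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed layout (hypothetical): hand-verified messages are saved one per
# file under these directories before the script runs.
SPAM_DIR = Path("/var/spool/bayes-train/spam")
HAM_DIR = Path("/var/spool/bayes-train/ham")


def learn_command(kind: str, corpus: Path) -> List[str]:
    """Build the sa-learn command line for a spam or ham directory."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"unknown corpus type: {kind}")
    return ["sa-learn", f"--{kind}", str(corpus)]


def daily_maintenance(run=subprocess.run) -> None:
    """Train on collected messages; this does not wipe the Bayes database."""
    for kind, corpus in (("spam", SPAM_DIR), ("ham", HAM_DIR)):
        if corpus.is_dir():  # skip quietly if nothing has been collected yet
            run(learn_command(kind, corpus), check=True)


if __name__ == "__main__":
    daily_maintenance()
```

A full wipe and retrain, as in the first step above, would be a separate manual run (sa-learn --clear followed by the same two training commands) rather than part of the cron job.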
If the FN spams are *extremely* short, they may be misclassified by Bayes.
Were the FNs really short, like a message with just a URI in the body?
Re: BAYES_99 score
Posted by Cathryn Mataga <ca...@junglevision.com>.
On 10/24/2012 8:35 AM, Jari Fredriksson wrote:
> 24.10.2012 18:19, Ned Slider wrote:
>> I have had very good success running adjusted scores for BAYES rules,
>> but I am very careful how I train my bayes database. I've disabled
>> auto-learning and only manually train on hand-checked ham and spam
>> examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
>> be highly reliable indicators.
> I have never seen false BAYES_99, but false BAYES_00 is not that rare.
>
I'm not sure what's going on, but I cleared Bayes, set use_auto_learn 0,
and then relearned from ham/spam messages. Checking yesterday's mail, I
got 12 spam, and every single one had BAYES_00 set.
I do get a vast amount of spam coming in here, so those 12 are the misses
against several hundred spam that got marked correctly.
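One way to check which Bayes bucket each missed spam landed in is to read the X-Spam-Status header. The sketch below is a minimal Python example, assuming the messages carry the usual tests=... list in that header; the function names and the sample message are made up for illustration.

```python
import re
from collections import Counter
from email import message_from_string

# Matches rule names such as BAYES_00 or BAYES_99 in the tests= list.
BAYES_RULE = re.compile(r"\bBAYES_\d{2}\b")


def bayes_bucket(raw_message: str):
    """Return the BAYES_nn rule named in X-Spam-Status, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = BAYES_RULE.search(status)
    return match.group(0) if match else None


def tally(raw_messages):
    """Count how many messages fell into each Bayes bucket."""
    return Counter(bayes_bucket(m) for m in raw_messages)


# Invented sample message, for illustration only.
SAMPLE = (
    "X-Spam-Status: No, score=1.2 required=5.0 tests=BAYES_00,HTML_MESSAGE\n"
    "\tautolearn=no\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    print(tally([SAMPLE]))
```

Running tally over the saved false negatives shows at a glance whether they all sit in BAYES_00, which would point at the training corpus rather than at the scores.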
Re: BAYES_99 score
Posted by Jari Fredriksson <ja...@iki.fi>.
24.10.2012 18:19, Ned Slider wrote:
> I have had very good success running adjusted scores for BAYES rules,
> but I am very careful how I train my bayes database. I've disabled
> auto-learning and only manually train on hand-checked ham and spam
> examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
> be highly reliable indicators.
I have never seen false BAYES_99, but false BAYES_00 is not that rare.
--
You learn to write as if to someone else because NEXT YEAR YOU WILL BE
"SOMEONE ELSE."
Re: BAYES_99 score
Posted by Ned Slider <ne...@unixmail.co.uk>.
On 22/10/12 19:15, darxus@chaosreigns.com wrote:
> On 10/22, JP Kelly wrote:
>> Should I set the BAYES_99 score high enough to trigger as spam?
>> I get plenty of spam getting through which does not get caught because BAYES_99 is the only rule which fires and it is not set to score at or above the threshold.
>
> You could. Some people only use Bayesian filtering, which would be
> similar. The important question is, how many false positives (non-spams
> flagged as spams) would that cause? SpamAssassin's automated scoring
> attempts to achieve 1 false positive in 2,500 non-spams, with a score
> threshold of 5.0. So if you don't have an absolute minimum of 2,500
> representative non-spams to check for having hit BAYES_99, you risk
> increasing your false positives. But it's your risk to take.
>
I have had very good success running adjusted scores for BAYES rules,
but I am very careful how I train my bayes database. I've disabled
auto-learning and only manually train on hand-checked ham and spam
examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
be highly reliable indicators.
Re: BAYES_99 score
Posted by da...@chaosreigns.com.
On 10/22, JP Kelly wrote:
> Should I set the BAYES_99 score high enough to trigger as spam?
> I get plenty of spam getting through which does not get caught because BAYES_99 is the only rule which fires and it is not set to score at or above the threshold.
You could. Some people only use Bayesian filtering, which would be
similar. The important question is, how many false positives (non-spams
flagged as spams) would that cause? SpamAssassin's automated scoring
attempts to achieve 1 false positive in 2,500 non-spams, with a score
threshold of 5.0. So if you don't have an absolute minimum of 2,500
representative non-spams to check for having hit BAYES_99, you risk
increasing your false positives. But it's your risk to take.
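To put the 1-in-2,500 figure in concrete terms, here is a rough Python sketch. It is not part of SpamAssassin and the function names are hypothetical. Given a count of hand-verified non-spams and how many of them hit BAYES_99, it checks whether the sample is big enough to say anything and whether the observed rate stays within the target quoted above.

```python
# Hypothetical helpers, not part of SpamAssassin.

# Target quoted above: 1 false positive per 2,500 non-spams.
TARGET_FP_RATE = 1 / 2500
MIN_HAM_SAMPLE = 2500


def bayes99_fp_rate(ham_hits: int, ham_total: int) -> float:
    """Fraction of verified non-spam messages that hit BAYES_99."""
    if ham_total <= 0:
        raise ValueError("need at least one ham message")
    return ham_hits / ham_total


def sample_is_large_enough(ham_total: int) -> bool:
    """Fewer than 2,500 hams cannot resolve a 1-in-2,500 rate."""
    return ham_total >= MIN_HAM_SAMPLE


def within_target(ham_hits: int, ham_total: int) -> bool:
    """True only if the sample is big enough and the rate meets the target."""
    return (
        sample_is_large_enough(ham_total)
        and bayes99_fp_rate(ham_hits, ham_total) <= TARGET_FP_RATE
    )
```

If the rate looks acceptable, the change itself is a single line of the form "score BAYES_99 <value>" in the local configuration, with the value at or above the required score (5.0 by default).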
Huh, ruleqa doesn't track hits to BAYES_99?
--
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com