Posted to users@spamassassin.apache.org by JP Kelly <li...@jpkvideo.net> on 2012/10/22 20:06:01 UTC

BAYES_99 score

Should I set the BAYES_99 score high enough to trigger as spam?
Plenty of spam gets through uncaught because BAYES_99 is the only rule that fires, and its score is set below the threshold.

Re: BAYES_99 score

Posted by John Hardin <jh...@impsec.org>.
On Mon, 22 Oct 2012, darxus@chaosreigns.com wrote:

> On 10/23, Jari Fredriksson wrote:
>> 22.10.2012 21:15, darxus@chaosreigns.com wrote:
>>> Huh, ruleqa doesn't track hits to BAYES_99?
>> If it did, against which database would it do that?
>
> It would show the hit rates in the corpora of the masscheck submitters,
> like everything else.  So, the databases of the submitters (who are using
> bayes).

...and the central masscheck?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   So Microsoft's invented the ASCII equivalent to ugly ink spots that
   appear on your letter when your pen is malfunctioning.
          -- Greg Andrews, about Microsoft's way to encode apostrophes
-----------------------------------------------------------------------
  144 days since the first successful private support mission to ISS (SpaceX)

Re: BAYES_99 score

Posted by da...@chaosreigns.com.
On 10/23, Jari Fredriksson wrote:
> 22.10.2012 21:15, darxus@chaosreigns.com wrote:
> > Huh, ruleqa doesn't track hits to BAYES_99?
> If it did, against which database would it do that?

It would show the hit rates in the corpora of the masscheck submitters,
like everything else.  So, the databases of the submitters (who are using
bayes).

-- 
"I don't want people who want to dance, I want people who have to dance."
--George Balanchine
http://www.ChaosReigns.com

Re: BAYES_99 score

Posted by Jari Fredriksson <ja...@iki.fi>.
22.10.2012 21:15, darxus@chaosreigns.com wrote:
> Huh, ruleqa doesn't track hits to BAYES_99?
If it did, against which database would it do that?

Just askin...

-- 

"I'm out of options for now. It is something that has gone wrong "in the apt-get region" (can't find a good expression for that)"

Husse Jun 17 2007



Re: BAYES_99 score

Posted by John Hardin <jh...@impsec.org>.
On Wed, 24 Oct 2012, Cathryn Mataga wrote:

> On 10/24/2012 8:35 AM, Jari Fredriksson wrote:
>>  24.10.2012 18:19, Ned Slider wrote:
>> >  I have had very good success running adjusted scores for BAYES rules,
>> >  but I am very careful how I train my bayes database. I've disabled
>> >  auto-learning and only manually train on hand-checked ham and spam
>> >  examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
>> >  be highly reliable indicators.
>>
>>  I have never seen a false BAYES_99, but a false BAYES_00 is not that rare.
>
> I'm not sure what's going on, but I cleared Bayes, set use_auto_learn 0,
> and then relearned from ham/spam messages. Checking yesterday's mail, I
> got 12 spam, and every single one had BAYES_00 set.

Add those FNs to your spam corpus, and verify by hand every single message 
in your ham corpus. Then wipe and retrain again.

If you get hams that score higher than BAYES_00 add them to your ham 
training corpus and train.

If you get spams that score less than BAYES_99 add them to your spam 
corpus and train.

The training for both of those is considered "daily maintenance" that 
should be scripted and run from cron, and doesn't involve a wipe of your 
database.
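
A minimal sketch of that routine in Python, assuming `sa-learn` is on the PATH and that hand-checked messages are saved one per file in two directories. The directory paths and function names are hypothetical; the only `sa-learn` options used are `--ham`, `--spam`, and (for the one-off wipe) `--clear`.

```python
import subprocess

# Hypothetical corpus locations; adjust to wherever you keep verified mail.
HAM_DIR = "/home/user/corpus/ham"
SPAM_DIR = "/home/user/corpus/spam"


def learn_command(kind, directory):
    """Build the sa-learn command line for one corpus directory."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    return ["sa-learn", "--" + kind, directory]


def daily_training(run=subprocess.run):
    """Incremental training: no wipe, just learn from both corpora."""
    for kind, directory in (("ham", HAM_DIR), ("spam", SPAM_DIR)):
        run(learn_command(kind, directory), check=True)


def wipe_and_retrain(run=subprocess.run):
    """One-off reset: clear the Bayes database, then train from scratch."""
    run(["sa-learn", "--clear"], check=True)
    daily_training(run)
```

Calling `daily_training()` from a small script run nightly by cron would cover the maintenance case; `wipe_and_retrain()` is only for the occasional reset after the corpora have been re-verified by hand.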

If the FN spams are *extremely* short, they may be misclassified by Bayes. 
Were the FNs really short, like a message with just a URI in the body?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  146 days since the first successful private support mission to ISS (SpaceX)

Re: BAYES_99 score

Posted by Cathryn Mataga <ca...@junglevision.com>.
On 10/24/2012 8:35 AM, Jari Fredriksson wrote:
> 24.10.2012 18:19, Ned Slider wrote:
>> I have had very good success running adjusted scores for BAYES rules,
>> but I am very careful how I train my bayes database. I've disabled
>> auto-learning and only manually train on hand-checked ham and spam
>> examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
>> be highly reliable indicators.
> I have never seen a false BAYES_99, but a false BAYES_00 is not that rare.
>


I'm not sure what's going on, but I cleared Bayes, set use_auto_learn 0,
and then relearned from ham/spam messages. Checking yesterday's mail, I
got 12 spam, and every single one had BAYES_00 set.


I do get a vast amount of spam coming in here, so those 12 are the ones
that slipped through, next to several hundred spam that got marked correctly.

Re: BAYES_99 score

Posted by Jari Fredriksson <ja...@iki.fi>.
24.10.2012 18:19, Ned Slider wrote:
> I have had very good success running adjusted scores for BAYES rules,
> but I am very careful how I train my bayes database. I've disabled
> auto-learning and only manually train on hand-checked ham and spam
> examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to
> be highly reliable indicators.
I have never seen a false BAYES_99, but a false BAYES_00 is not that rare.

-- 

You learn to write as if to someone else because NEXT YEAR YOU WILL BE
"SOMEONE ELSE."



Re: BAYES_99 score

Posted by Ned Slider <ne...@unixmail.co.uk>.
On 22/10/12 19:15, darxus@chaosreigns.com wrote:
> On 10/22, JP Kelly wrote:
>> Should I set the BAYES_99 score high enough to trigger as spam?
>> Plenty of spam gets through uncaught because BAYES_99 is the only rule that fires, and its score is set below the threshold.
>
> You could.  Some people use only Bayesian filtering, which would be
> similar.  The important question is how many false positives (non-spams
> flagged as spam) that would cause.  SpamAssassin's automated scoring
> aims for 1 false positive in 2,500 non-spams at a score threshold of
> 5.0.  So unless you have at least 2,500 representative non-spams to
> check for BAYES_99 hits, you risk increasing your false positives.
> But it's your risk to take.
>

I have had very good success running adjusted scores for BAYES rules, 
but I am very careful how I train my bayes database. I've disabled 
auto-learning and only manually train on hand-checked ham and spam 
examples. Consequently, I find the extremes (BAYES_99 and BAYES_00) to 
be highly reliable indicators.


Re: BAYES_99 score

Posted by da...@chaosreigns.com.
On 10/22, JP Kelly wrote:
> Should I set the BAYES_99 score high enough to trigger as spam?
> Plenty of spam gets through uncaught because BAYES_99 is the only rule that fires, and its score is set below the threshold.

You could.  Some people use only Bayesian filtering, which would be
similar.  The important question is how many false positives (non-spams
flagged as spam) that would cause.  SpamAssassin's automated scoring
aims for 1 false positive in 2,500 non-spams at a score threshold of
5.0.  So unless you have at least 2,500 representative non-spams to
check for BAYES_99 hits, you risk increasing your false positives.
But it's your risk to take.
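
One way to run that check, sketched in Python. It assumes your hand-verified non-spam is stored one message per file and has already been scanned, so that each message carries an `X-Spam-Status` header listing the rules it hit. The directory layout, function names, and the `MIN_HAM` constant are hypothetical; the 2,500 figure is simply the one quoted above.

```python
import re
from email import message_from_bytes
from pathlib import Path

MIN_HAM = 2500  # minimum sample size suggested above


def hits_rule(raw, rule="BAYES_99"):
    """Return True if the message's X-Spam-Status header lists `rule`."""
    status = str(message_from_bytes(raw).get("X-Spam-Status", ""))
    # Match the rule name as a whole token, so BAYES_99 does not
    # match inside a longer rule name.
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(rule) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, status) is not None


def count_hits(ham_dir, rule="BAYES_99"):
    """Return (messages that hit `rule`, total messages) for a directory."""
    files = [p for p in Path(ham_dir).iterdir() if p.is_file()]
    hits = sum(1 for p in files if hits_rule(p.read_bytes(), rule))
    return hits, len(files)


def summarize(hits, total):
    """Describe the result, flagging samples too small to trust."""
    note = "" if total >= MIN_HAM else " (fewer than %d ham; too few to judge)" % MIN_HAM
    return "%d of %d ham hit the rule%s" % (hits, total, note)
```

Every ham counted in `hits` would become a false positive if BAYES_99 alone were scored at or above the threshold, so even a handful of hits in a few thousand messages is worth weighing before raising the score.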

Huh, ruleqa doesn't track hits to BAYES_99?

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com