Posted to users@spamassassin.apache.org by Hoover Chan <hc...@mail.ewind.com> on 2009/03/20 21:18:14 UTC

negative scores for spam

Can someone point me to what I can do to my SpamAssassin config for a situation like the following?

X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
	tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
	URIBL_BLACK=1.955, URIBL_GREY=0.25]

That is, the threshold is a positive score, yet a spam message comes out with a negative score.

Thanks in advance.

-----------------------------------------------------------------
Hoover Chan                hchan@mail.ewind.com  -or-  hchan@well.com
Eastwind Associates
P.O. Box 16646             voice: 415-731-6019  -or-  415-565-8936
San Francisco, CA 94116

Re: negative scores for spam

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Hoover Chan <ch...@sacredsf.org>
   Date: Fri, 20 Mar 2009 13:55:08 -0700 (PDT)
   
   The threshold was set to 6.6 (cf. required=6.6). The message this
   was attached to was very definitely junk. This kind of situation got
   me curious: the threshold is a positive score, yet I'm seeing junk
   mail come in with negative scores.
   
Train BAYES.  The message hit BAYES_00.  You want BAYES_99.  So either
you have incorrectly learned similar messages or you haven't trained
enough.
-jeff
   
   

Re: negative scores for spam

Posted by LuKreme <kr...@kreme.com>.
On 23-Mar-2009, at 10:14, Chris Barnes wrote:
> But the problem remains.  A simple glance at this list shows that  
> this happens often enough to be a fairly common problem.

Because people don't train bayes properly.

> The question is:  How does one fix the problem after it occurs?

Train bayes with a decent spam/ham corpus?


-- 
In England 100 miles is a long distance. In the US 100 years is a
	long time


Re: negative scores for spam

Posted by Benny Pedersen <me...@junc.org>.
On Thu, March 26, 2009 17:26, Chris Barnes wrote:
> I tried that.  Didn't seem to help.   I think I'll go ahead and just
> rm the files.

rm = Read Manuals ? :=)

-- 
http://localhost/ 100% uptime and 100% mirrored :)


Re: negative scores for spam

Posted by Chris Barnes <ch...@txbarnes.com>.
Matus UHLAR - fantomas wrote:
>> No, I don't still have the messages that were incorrectly trained. 
>> So... it appears that wiping out the bayes database is the way to go. 
>> One final question for this then:  is there a "sa-learn" option I should 
>> use for this, or is doing a simple "rm bayes*" in the .spamassassin 
>> directory preferred?
> 
> sa-learn --clear
> should do that.


I tried that.  Didn't seem to help.   I think I'll go ahead and just rm 
the files.

Re: negative scores for spam

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 25.03.09 11:01, Chris Barnes wrote:
> Thank you for such a good, reasonable answer  (it's good to see SOMEONE 
> is trying to answer questions with non-flippant responses).  :-)
> 
> No, I don't still have the messages that were incorrectly trained. 
> So... it appears that wiping out the bayes database is the way to go. 
> One final question for this then:  is there a "sa-learn" option I should 
> use for this, or is doing a simple "rm bayes*" in the .spamassassin 
> directory preferred?

sa-learn --clear
should do that.
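For anyone scripting the wipe-and-retrain route discussed here, a minimal Python sketch follows. It only assembles sa-learn command lines from options sa-learn documents (--clear, --spam, --ham, --dump magic) and executes them only when dry_run is False. The corpus directory names are hypothetical placeholders. The sketch also assumes it runs as the same user whose Bayes database the scanner actually consults, since by default sa-learn works on the database of the user running it.

```python
import subprocess
from typing import List, Sequence


def reset_and_retrain_commands(spam_dirs: Sequence[str],
                               ham_dirs: Sequence[str]) -> List[List[str]]:
    """Build the sa-learn invocations for wiping and retraining Bayes.

    Directory names are hypothetical; each directory is assumed to hold
    one hand-verified message per file.
    """
    commands = [["sa-learn", "--clear"]]               # wipe the existing Bayes DB
    commands += [["sa-learn", "--spam", d] for d in spam_dirs]
    commands += [["sa-learn", "--ham", d] for d in ham_dirs]
    commands.append(["sa-learn", "--dump", "magic"])   # show nspam/nham counts
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(reset_and_retrain_commands(["corpus/spam"], ["corpus/ham"]))
```

After retraining, the nspam and nham lines of the magic dump show how much has been learned. By default the Bayes rules stay silent until at least 200 spam and 200 ham messages have been learned (bayes_min_spam_num and bayes_min_ham_num).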
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.

Re: negative scores for spam

Posted by Chris Barnes <ch...@txbarnes.com>.
Jeff Mincy wrote:
>>    The question is:  How does one fix the problem after it occurs?
> 
> The way to fix the problem is to relearn any incorrectly learned
> messages.  So any spam message that was incorrectly learned as ham,
> either automatically or manually, needs to be correctly relearned as
> spam using sa-learn.  You should also learn as spam any spam messages
> that hit BAYES_00, or anything less than BAYES_50.  You should
> likewise learn as ham any ham messages hitting BAYES_50 - BAYES_99.
> 
> The more messages that you correctly train the more accurate and
> definitive bayes will be.
> 
> If you don't have the incorrectly learned messages to retrain then you
> can always start over by removing the bayes database files in your
> .spamassassin directory.


Thank you for such a good, reasonable answer  (it's good to see SOMEONE 
is trying to answer questions with non-flippant responses).  :-)

No, I don't still have the messages that were incorrectly trained. 
So... it appears that wiping out the bayes database is the way to go. 
One final question for this then:  is there a "sa-learn" option I should 
use for this, or is doing a simple "rm bayes*" in the .spamassassin 
directory preferred?

-- 

Chris Barnes                           AOL IM: CNBarnes
chris-barnes@tamu.edu                Yahoo IM: chrisnbarnes
Computer Systems Manager               MSN IM: chris@txbarnes.com
Department of Physics                      ph: 979-845-7801
Texas A&M University                      fax: 979-845-2590

Re: negative scores for spam

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Chris Barnes <ch...@txbarnes.com>
   Date: Mon, 23 Mar 2009 11:14:37 -0500
   
   Jeff Mincy wrote:
   
   > Yow.  The negative scoring bayes rules are extremely reliable when well
   > trained.  Ham messages are not trying to evade the filter.  Defeating
   > bayes with poison is mostly a myth.  The random garbage might work the
   > first time but not the second time as long as you are training these
   > messages as spam.  If you are getting lots of BAYES_00 hits on spam
   > then the problem is almost certainly incorrect training where spam
   > messages were incorrectly learned as ham.
   
   Fair enough.
   
   But the problem remains.  A simple glance at this list shows that this 
   happens often enough to be a fairly common problem.
   
   The question is:  How does one fix the problem after it occurs?

The way to fix the problem is to relearn any incorrectly learned
messages.  So any spam message that was incorrectly learned as ham,
either automatically or manually, needs to be correctly relearned as
spam using sa-learn.  You should also learn as spam any spam messages
that hit BAYES_00, or anything less than BAYES_50.  You should
likewise learn as ham any ham messages hitting BAYES_50 - BAYES_99.

The more messages that you correctly train the more accurate and
definitive bayes will be.

If you don't have the incorrectly learned messages to retrain then you
can always start over by removing the bayes database files in your
.spamassassin directory.
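The triage Jeff describes can be expressed as a small decision function. This is a minimal Python sketch, assuming you already know each message's true class and which BAYES_xx rule it hit; the function name is hypothetical and nothing here talks to SpamAssassin itself.

```python
import re
from typing import Optional


def retrain_action(bayes_rule: str, is_spam: bool) -> Optional[str]:
    """Decide how to relearn a message whose true class is known.

    Spam that hit a Bayes bucket below BAYES_50 is learned as spam, and
    ham that hit BAYES_50 or higher is learned as ham. Returns "spam",
    "ham", or None when no relearning is needed.
    """
    match = re.fullmatch(r"BAYES_(\d+)", bayes_rule)
    if match is None:
        raise ValueError(f"not a Bayes rule name: {bayes_rule!r}")
    bucket = int(match.group(1))  # e.g. "BAYES_05" -> 5
    if is_spam and bucket < 50:
        return "spam"
    if not is_spam and bucket >= 50:
        return "ham"
    return None


if __name__ == "__main__":
    # The message from the start of this thread: spam that hit BAYES_00.
    print(retrain_action("BAYES_00", is_spam=True))
```

A result of "spam" or "ham" corresponds to running sa-learn --spam or sa-learn --ham on that message.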

-jeff

Re: negative scores for spam

Posted by Chris Barnes <ch...@txbarnes.com>.
Jeff Mincy wrote:

> Yow.  The negative scoring bayes rules are extremely reliable when well
> trained.  Ham messages are not trying to evade the filter.  Defeating
> bayes with poison is mostly a myth.  The random garbage might work the
> first time but not the second time as long as you are training these
> messages as spam.  If you are getting lots of BAYES_00 hits on spam
> then the problem is almost certainly incorrect training where spam
> messages were incorrectly learned as ham.


Fair enough.


But the problem remains.  A simple glance at this list shows that this 
happens often enough to be a fairly common problem.

The question is:  How does one fix the problem after it occurs?
Is there a FAQ page with step-by-step instructions a person could use?


Re: negative scores for spam

Posted by Jeff Mincy <je...@delphioutpost.com>.
   From: Jesse Stroik <js...@ssec.wisc.edu>
   Date: Fri, 20 Mar 2009 16:14:39 -0500
   
   Hoover Chan wrote:
   > The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious: the threshold is a positive score, yet I'm seeing junk mail come in with negative scores.
   
   You are getting negative scores for auto white list and for bayes_00. 
   It's a matter of taste and what you believe makes sense, but I don't 
   consider bayes to be all that accurate (since there are methods for 
   defeating bayes, poisoning bayes, etc).  As such, I don't allow Bayes to 
   assign negative scores or positive scores within a couple of points of 
   the threshold.  You can do so by assigning scores like this:
   
   score BAYES_00  0
   score BAYES_05  0
   score BAYES_20  0
   score BAYES_40  0
   
Yow.  The negative scoring bayes rules are extremely reliable when well
trained.  Ham messages are not trying to evade the filter.  Defeating
bayes with poison is mostly a myth.  The random garbage might work the
first time but not the second time as long as you are training these
messages as spam.  If you are getting lots of BAYES_00 hits on spam
then the problem is almost certainly incorrect training where spam
messages were incorrectly learned as ham.

   I also disable AWL since a lot of spam, especially the stuff most likely 
   to be tested against spamassassin, will likely use known good email
   addresses from your domain as the "from" address.  This is fairly likely 
   to hit on the AWL.

Yow again.  AWL uses the email address and the IP address.  So spam
with a forged email address is not going to come from the same
EMAIL+IP pair as legitimate email using that address.
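To illustrate why a forged From address does not inherit a legitimate sender's history, here is a toy Python model of an AWL-style store keyed on the sender address together with the sending IP. It assumes the IP is truncated to its /16 network (SpamAssassin's usual IPv4 default, but treat the exact mask as an assumption); the class name ToyAWL and the key layout are illustrative, not SpamAssassin's actual storage format.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Optional, Tuple


def awl_key(sender: str, ip: str) -> Tuple[str, str]:
    """Key on the lower-cased sender plus the IPv4 /16 it came from (assumed mask)."""
    network = ipaddress.IPv4Network(f"{ip}/16", strict=False)
    return sender.lower(), str(network)


class ToyAWL:
    """Tracks the running mean score per (sender, network) pair."""

    def __init__(self) -> None:
        # key -> [sum of scores, message count]
        self._totals: Dict[Tuple[str, str], list] = defaultdict(lambda: [0.0, 0])

    def record(self, sender: str, ip: str, score: float) -> None:
        entry = self._totals[awl_key(sender, ip)]
        entry[0] += score
        entry[1] += 1

    def mean(self, sender: str, ip: str) -> Optional[float]:
        """Mean past score for this pair, or None if it has no history."""
        total, count = self._totals.get(awl_key(sender, ip), (0.0, 0))
        return total / count if count else None


if __name__ == "__main__":
    awl = ToyAWL()
    awl.record("boss@example.com", "192.0.2.10", -3.0)    # real mail, real relay
    print(awl.mean("boss@example.com", "192.0.2.99"))     # same /16: -3.0
    print(awl.mean("boss@example.com", "198.51.100.7"))   # forged, other network: None
```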
   
   Again, it's just a matter of taste and it all depends on how you've set 
   up your scoring.  I'm pretty cautious to ensure there aren't false 
   positives as that would decrease the value of spamassassin greatly for 
   us, but I otherwise avoid AWL and Bayes negative scores.
   
   If you sent us a copy of the spam, we could test it and show you what 
   should be hitting.

Use pastebin instead.

-jeff

Re: negative scores for spam

Posted by John Hardin <jh...@impsec.org>.
On Fri, 20 Mar 2009, Jesse Stroik wrote:

> Hoover Chan wrote:
>>  The threshold was set to 6.6 (cf. required=6.6). The message this was
>>  attached to was very definitely junk. This kind of situation got me
>>  curious: the threshold is a positive score, yet I'm seeing junk mail
>>  come in with negative scores.
>
> You are getting negative scores for auto white list and for bayes_00.

This means:

(1) mistrained bayes - review your training corpus (you did keep it, 
didn't you?) for correct classification and disable autolearn until you're 
confident reasonable scores are being assigned by SA.

(2) a history of low-scoring spam (AWL wants to reduce the score on this 
one even more, so it thinks this sender has a hammy history). Clear your 
AWL database after you retrain BAYES.
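Point (2) follows from how the AWL adjusts scores. SpamAssassin documents the adjustment (under auto_whitelist_factor, default 0.5) as finalscore = score + (mean - score) * factor, where mean is the sender's long-term average. The Python sketch below encodes that formula and inverts it to estimate the stored mean from this thread's header. It assumes the default factor and that AWL was applied on top of the other rules' total, so the result is an estimate, not a readout of the actual database.

```python
def awl_adjustment(score: float, mean: float, factor: float = 0.5) -> float:
    """Points the AWL adds: pull `score` part of the way toward `mean`."""
    return (mean - score) * factor


def implied_mean(score_before_awl: float, awl_points: float,
                 factor: float = 0.5) -> float:
    """Invert awl_adjustment: recover the mean that would yield `awl_points`."""
    return score_before_awl + awl_points / factor


if __name__ == "__main__":
    total, awl = -1.496, -1.103   # from the X-Spam-Status header
    before = total - awl          # score of the other rules, about -0.393
    print(round(implied_mean(before, awl), 3))
```

With these numbers the estimated mean comes out at roughly -2.6: under the stated assumptions, earlier mail from this sender and network scored well on the ham side, which is why AWL pulled this message down rather than up.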

> It's a matter of taste and what you believe makes sense, but I don't 
> consider bayes to be all that accurate (since there are methods for 
> defeating bayes, poisoning bayes, etc).

It's very reliable if you take care training it.

> As such, I don't allow Bayes to assign negative scores or positive 
> scores within a couple of points of the threshold.  You can do so by 
> assigning scores like this:
>
> score BAYES_00  0
> score BAYES_05  0
> score BAYES_20  0
> score BAYES_40  0

Then you're losing a lot of the benefit of Bayes. If you're having that 
serious a problem with it, I'd suggest you need to review your training.

> I also disable AWL since a lot of spam, especially the stuff most likely 
> to be tested against spamassassin, will likely use known good email
> addresses from your domain as the "from" address.  This is fairly likely 
> to hit on the AWL.

AWL includes the source IP address so it's unlikely a forged message will 
benefit from your outbound traffic.

Hoover, you need to review your bayes training corpora and make sure they
are clean and correct, and retrain from scratch.

Disable autolearn until you're confident SA is scoring well, then start 
autolearn with the thresholds pushed out (e.g. learn as ham when the score 
is below -5, as spam when it's above 15). Also consider whether or not you 
even want to autolearn - if your userbase is small then you may be better 
served by purely manual training fed by a few clueful users.
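The autolearn thresholds suggested above correspond to SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings (defaults 0.1 and 12.0), and autolearn can be switched off entirely with bayes_auto_learn 0. The Python function below is only a simplified model of the threshold check, showing how pushed-out thresholds leave indecisive mail like the -1.496 example untrained. The real plugin applies further conditions; for example, it leaves Bayes's own contribution out of the score it compares.

```python
from typing import Optional


def autolearn_decision(score: float,
                       ham_below: float = -5.0,
                       spam_above: float = 15.0) -> Optional[str]:
    """Simplified autolearn check using the thresholds suggested above.

    Returns "ham", "spam", or None when the score is not decisive
    enough to learn from.
    """
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return None


if __name__ == "__main__":
    for s in (-7.2, -1.496, 6.6, 18.0):
        print(s, autolearn_decision(s))
```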

Also, please do post a sample (all headers intact, to someplace like 
pastebin) of a low-scoring spam. The fact that so few rules hit indicates 
there may be other problems with your config, or that it's a very short 
spam (which is harder to get a good score for). This will also contribute 
to a poorly-performing autotrained bayes.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  1325 days until the Presidential Election

Re: negative scores for spam

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Hoover Chan wrote:
> >The threshold was set to 6.6 (cf. required=6.6). The message this was 
> >attached to was very definitely junk. This kind of situation got me 
> >curious: the threshold is a positive score, yet I'm seeing junk mail
> >come in with negative scores.

On 20.03.09 16:14, Jesse Stroik wrote:
> You are getting negative scores for auto white list and for bayes_00. 
> It's a matter of taste and what you believe makes sense, but I don't 
> consider bayes to be all that accurate (since there are methods for 
> defeating bayes, poisoning bayes, etc).

What methods? AFAIK, bayes poisoning turned out not to work
(and could even help us detect spam when using hapaxes).

And I don't know of any way to defeat BAYES if it's properly trained.
Leaving things on autolearn may not be a good idea, though, if you are
not updating scores (sa-update) and/or not using network checks.

> As such, I don't allow Bayes to 
> assign negative scores or positive scores within a couple of points of 
> the threshold.  You can do so by assigning scores like this:
> 
> score BAYES_00  0
> score BAYES_05  0
> score BAYES_20  0
> score BAYES_40  0

However, it's better not to do that and to solve your problem by properly
training the database. I found BAYES to be very effective for some wanted
mail sent by lame mailers...

-- 
Remember half the people you know are below average. 

Re: negative scores for spam

Posted by RW <rw...@googlemail.com>.
On Fri, 20 Mar 2009 16:14:39 -0500
Jesse Stroik <js...@ssec.wisc.edu> wrote:

> It's a matter of taste and what you believe makes sense, but I don't 
> consider bayes to be all that accurate (since there are methods for 
> defeating bayes, poisoning bayes, etc).  As such, I don't allow Bayes
> to assign negative scores or positive scores within a couple of
> points of the threshold.  

In my experience, if you train Bayesian filters properly, anti-Bayesian
techniques just create more neutral results rather than errors. Despite
what the spammers do, many people are still getting good results from
pure statistical filters like dspam and bogofilter. 

> I also disable AWL since a lot of spam, especially the stuff most
> likely to be tested against spamassassin, will like use known good
> email addresses from your domain as the "from" address.  This is
> fairly likely to hit on the AWL.

It's supposed to be based on a combination of the email address and the
IP address, so that shouldn't happen.

Re: negative scores for spam

Posted by Jesse Stroik <js...@ssec.wisc.edu>.
Hoover Chan wrote:
> The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious: the threshold is a positive score, yet I'm seeing junk mail come in with negative scores.


You are getting negative scores for auto white list and for bayes_00. 
It's a matter of taste and what you believe makes sense, but I don't 
consider bayes to be all that accurate (since there are methods for 
defeating bayes, poisoning bayes, etc).  As such, I don't allow Bayes to 
assign negative scores or positive scores within a couple of points of 
the threshold.  You can do so by assigning scores like this:

score BAYES_00  0
score BAYES_05  0
score BAYES_20  0
score BAYES_40  0
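To see what these overrides would have done to the message that started this thread, the Python sketch below rescores its rule hits with the zeroed Bayes scores. It is a simplification that holds the AWL contribution fixed at -1.103, although in practice the AWL would be recomputed from the new score.

```python
from typing import Dict

# Rule hits from the X-Spam-Status header at the top of the thread.
HITS: Dict[str, float] = {
    "AWL": -1.103, "BAYES_00": -2.599, "HTML_MESSAGE": 0.001,
    "URIBL_BLACK": 1.955, "URIBL_GREY": 0.25,
}
# The overrides above: the low Bayes buckets contribute nothing.
OVERRIDES: Dict[str, float] = {
    "BAYES_00": 0.0, "BAYES_05": 0.0, "BAYES_20": 0.0, "BAYES_40": 0.0,
}


def rescore(hits: Dict[str, float], overrides: Dict[str, float]) -> float:
    """Sum the hits, substituting an override score where one is set."""
    return sum(overrides.get(rule, score) for rule, score in hits.items())


if __name__ == "__main__":
    print(round(rescore(HITS, OVERRIDES), 3), "vs required 6.6")
```

Under that simplification the message scores about 1.1, so on this particular message the overrides alone would not have produced a spam verdict.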

I also disable AWL since a lot of spam, especially the stuff most likely 
to be tested against spamassassin, will likely use known good email
addresses from your domain as the "from" address.  This is fairly likely 
to hit on the AWL.

Again, it's just a matter of taste and it all depends on how you've set 
up your scoring.  I'm pretty cautious to ensure there aren't false 
positives as that would decrease the value of spamassassin greatly for 
us, but I otherwise avoid AWL and Bayes negative scores.

If you sent us a copy of the spam, we could test it and show you what 
should be hitting.

Best,
Jesse

Re: negative scores for spam

Posted by Hoover Chan <ch...@sacredsf.org>.
The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious: the threshold is a positive score, yet I'm seeing junk mail come in with negative scores.

Thanks.


-------------------------------------------------- 
Hoover Chan                     chan@sacredsf.org 
Technology Director 
Schools of the Sacred Heart 
2222 Broadway St. 
San Francisco, CA 94115


----- "Rick Macdougall" <ri...@ummm-beer.com> wrote:

> Hoover Chan wrote:
> > Can someone point me to what I can do to my SpamAssassin config for
> > a situation like the following?
> > 
> > X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> > 	tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> > 	URIBL_BLACK=1.955, URIBL_GREY=0.25]
> > 
> > That is, the threshold is a positive score, yet a spam message
> > comes out with a negative score.
> > 
> 
> Errr
> 
> -1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
> 
> Where do you see that it should be positive ?
> 
> Regards,
> 
> Rick

Re: negative scores for spam

Posted by Rick Macdougall <ri...@ummm-beer.com>.
Hoover Chan wrote:
> Can someone point me to what I can do to my SpamAssassin config for a situation like the following?
> 
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> 	tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> 	URIBL_BLACK=1.955, URIBL_GREY=0.25]
> 
> That is, the threshold is a positive score, yet a spam message comes out with a negative score.
> 

Errr

-1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
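The arithmetic can be checked mechanically. The Python sketch below parses the X-Spam-Status header quoted above, sums the per-rule scores and compares the total with required. It assumes the header layout shown in this thread, where each test carries its score inside tests=[...] (a layout that appears to come from amavisd-new rather than stock SpamAssassin); other header formats would need different parsing.

```python
import re
from typing import Dict, Tuple

HEADER = (
    "X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6\n"
    "\ttests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,\n"
    "\tURIBL_BLACK=1.955, URIBL_GREY=0.25]"
)


def parse_status(header: str) -> Tuple[float, float, Dict[str, float]]:
    """Return (reported score, required score, {rule: score})."""
    reported = float(re.search(r"\bscore=(-?[\d.]+)", header).group(1))
    required = float(re.search(r"\brequired=(-?[\d.]+)", header).group(1))
    tests = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    rules = {name: float(value)
             for name, value in re.findall(r"(\w+)=(-?[\d.]+)", tests)}
    return reported, required, rules


if __name__ == "__main__":
    reported, required, rules = parse_status(HEADER)
    total = sum(rules.values())
    print(f"sum={total:.3f} reported={reported} spam={total >= required}")
```

The per-rule scores add up to the reported -1.496, which sits far below required=6.6, so the header is internally consistent: the problem lies in which rules hit, not in the addition.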

Where do you see that it should be positive ?

Regards,

Rick


Re: negative scores for spam

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> 	tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> 	URIBL_BLACK=1.955, URIBL_GREY=0.25]
                ^^^^^              ^^^^

Other than what's already been mentioned about Bayes and AWL...

Either (a) there are actually two different URIs in that message, one
hitting each list, or (b) your DNS is abusing the service.

Do put a sample up on a pastebin or website for review.
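The reasoning above rests on how the URIBL rules decode DNS answers. Here is a minimal Python sketch, assuming SpamAssassin's stock rules test bit 2 of the last octet of a multi.uribl.com answer for URIBL_BLACK and bit 4 for URIBL_GREY (treat the exact bit values as an assumption). Since a listed domain is expected to sit on only one of these lists, a single answer carrying both bits would be unusual.

```python
import ipaddress
from typing import List

# Assumed bit assignments in the last octet of a multi.uribl.com answer.
URIBL_BITS = {"URIBL_BLACK": 2, "URIBL_GREY": 4}


def uribl_rules(answer: str) -> List[str]:
    """Return the URIBL rule names a single DNS A-record answer would trigger."""
    last_octet = int(ipaddress.IPv4Address(answer)) & 0xFF
    return [rule for rule, bit in URIBL_BITS.items() if last_octet & bit]


if __name__ == "__main__":
    for answer in ("127.0.0.2", "127.0.0.4", "127.0.0.6"):
        print(answer, uribl_rules(answer))
```

Seeing both rules in one header therefore means either two URIs with separate one-bit answers, case (a), or a single answer with both bits set, which is the situation worth checking your DNS setup for.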


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: negative scores for spam

Posted by LuKreme <kr...@kreme.com>.
On 20-Mar-2009, at 14:18, Hoover Chan wrote:
> Can someone point me to what I can do to my SpamAssassin config for
> a situation like the following?
>
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> 	tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> 	URIBL_BLACK=1.955, URIBL_GREY=0.25]

1) don't set 'tagged_above' to -10, and don't change required


-- 
And I just don't care what happens next / looks like freedom but it
	feels like death / it's something in between, I guess