Posted to users@spamassassin.apache.org by Hoover Chan <hc...@mail.ewind.com> on 2009/03/20 21:18:14 UTC
negative scores for spam
Can someone point me to what I can do to my Spam Assassin config for a situation like the following?
X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
URIBL_BLACK=1.955, URIBL_GREY=0.25]
That is, a positive score criterion with a spam message that comes out with a negative number.
Thanks in advance.
-----------------------------------------------------------------
Hoover Chan hchan@mail.ewind.com -or- hchan@well.com
Eastwind Associates
P.O. Box 16646 voice: 415-731-6019 -or- 415-565-8936
San Francisco, CA 94116
Re: negative scores for spam
Posted by Jeff Mincy <je...@delphioutpost.com>.
From: Hoover Chan <ch...@sacredsf.org>
Date: Fri, 20 Mar 2009 13:55:08 -0700 (PDT)
> The threshold was set to 6.6 (cf. required=6.6). The message this
> was attached to was very definitely junk. This kind of situation got
> me curious about the whole thing where any positive spam score is
> set as the threshold but seeing junk mail coming in with negative
> scores.
Train BAYES. The message hit BAYES_00. You want BAYES_99. So either
you have incorrectly learned similar messages or you haven't trained
enough.
-jeff
--------------------------------------------------
Hoover Chan chan@sacredsf.org
Technology Director
Schools of the Sacred Heart
2222 Broadway St.
San Francisco, CA 94115
----- "Rick Macdougall" <ri...@ummm-beer.com> wrote:
> Hoover Chan wrote:
> > Can someone point me to what I can do to my Spam Assassin config for
> a situation like the following?
> >
> > X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> > tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> > URIBL_BLACK=1.955, URIBL_GREY=0.25]
> >
> > That is, a positive score criterion with a spam message that comes
> out with a negative number.
> >
>
> Errr
>
> -1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
>
> Where do you see that it should be positive ?
>
> Regards,
>
> Rick
Re: negative scores for spam
Posted by LuKreme <kr...@kreme.com>.
On 23-Mar-2009, at 10:14, Chris Barnes wrote:
> But the problem remains. A simple glance at this list shows that
> this happens often enough to be a fairly common problem.
Because people don't train bayes properly.
> The question is: How does one fix the problem after it occurs?
Train bayes with a decent spam/ham corpus?
--
In England 100 miles is a long distance. In the US 100 years is a
long time
Re: negative scores for spam
Posted by Benny Pedersen <me...@junc.org>.
On Thu, March 26, 2009 17:26, Chris Barnes wrote:
> I tried that. Didn't seem to help. I think I'll go ahead and just
> rm the files.
rm = Read Manuals ? :=)
--
http://localhost/ 100% uptime and 100% mirrored :)
Re: negative scores for spam
Posted by Chris Barnes <ch...@txbarnes.com>.
Matus UHLAR - fantomas wrote:
>> No, I don't still have the messages that were incorrectly trained.
>> So... it appears that wiping out the bayes database is the way to go.
>> One final question for this then: is there a "sa-learn" option I should
>> use for this, or is doing a simple "rm bayes*" in the .spamassassin
>> directory preferred?
>
> sa-learn --clear
> should do that.
I tried that. Didn't seem to help. I think I'll go ahead and just rm
the files.
Re: negative scores for spam
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 25.03.09 11:01, Chris Barnes wrote:
> Thank you for such a good, reasonable answer (it's good to see SOMEONE
> is trying to answer questions with non-flippant responses). :-)
>
> No, I don't still have the messages that were incorrectly trained.
> So... it appears that wiping out the bayes database is the way to go.
> One final question for this then: is there a "sa-learn" option I should
> use for this, or is doing a simple "rm bayes*" in the .spamassassin
> directory preferred?
sa-learn --clear
should do that.
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.
Re: negative scores for spam
Posted by Chris Barnes <ch...@txbarnes.com>.
Jeff Mincy wrote:
>> The question is: How does one fix the problem after it occurs?
>
> The way to fix the problem is to relearn any incorrectly learned
> messages. So any spam message that was incorrectly learned as ham,
> either automatically or manually, needs to be correctly relearned as
> spam using sa-learn. You should also learn as spam any spam messages
> that hit BAYES_00, or anything less than BAYES_50. You should also
> do the same thing for HAM messages hitting BAYES_50 - BAYES_99.
>
> The more messages that you correctly train the more accurate and
> definitive bayes will be.
>
> If you don't have the incorrectly learned messages to retrain then you
> can always start over by removing the bayes database files in your
> .spamassassin directory.
Thank you for such a good, reasonable answer (it's good to see SOMEONE
is trying to answer questions with non-flippant responses). :-)
No, I don't still have the messages that were incorrectly trained.
So... it appears that wiping out the bayes database is the way to go.
One final question for this then: is there a "sa-learn" option I should
use for this, or is doing a simple "rm bayes*" in the .spamassassin
directory preferred?
--
Chris Barnes AOL IM: CNBarnes
chris-barnes@tamu.edu Yahoo IM: chrisnbarnes
Computer Systems Manager MSN IM: chris@txbarnes.com
Department of Physics ph: 979-845-7801
Texas A&M University fax: 979-845-2590
Re: negative scores for spam
Posted by Jeff Mincy <je...@delphioutpost.com>.
From: Chris Barnes <ch...@txbarnes.com>
Date: Mon, 23 Mar 2009 11:14:37 -0500
> Jeff Mincy wrote:
> > Yow. The negative scoring bayes rules are extremely reliable when well
> > trained. Ham messages are not trying to evade the filter. Defeating
> > bayes with poison is mostly a myth. The random garbage might work the
> > first time but not the second time as long as you are training these
> > messages as spam. If you are getting lots of BAYES_00 hits on spam
> > then the problem is almost certainly incorrect training where spam
> > messages were incorrectly learned as ham.
> Fair enough.
> But the problem remains. A simple glance at this list shows that this
> happens often enough to be a fairly common problem.
> The question is: How does one fix the problem after it occurs?
The way to fix the problem is to relearn any incorrectly learned
messages. So any spam message that was incorrectly learned as ham,
either automatically or manually, needs to be correctly relearned as
spam using sa-learn. You should also learn as spam any spam messages
that hit BAYES_00, or anything less than BAYES_50. You should also
do the same thing for HAM messages hitting BAYES_50 - BAYES_99.
The more messages that you correctly train the more accurate and
definitive bayes will be.
If you don't have the incorrectly learned messages to retrain then you
can always start over by removing the bayes database files in your
.spamassassin directory.
-jeff
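The retraining rule described above can be sketched in a few lines of Python. This is only an illustration, not part of SpamAssassin: the helper names bayes_bucket and needs_retraining are made up here, and the sketch assumes you already know by hand whether each message really is spam. It reads the BAYES_nn rule out of an X-Spam-Status header and flags the message if Bayes landed on the wrong side of BAYES_50.

```python
import re

def bayes_bucket(status_header):
    """Return the numeric part of the BAYES_nn rule that hit, or None."""
    match = re.search(r"\bBAYES_(\d+)\b", status_header)
    return int(match.group(1)) if match else None

def needs_retraining(status_header, actually_spam):
    """True if Bayes leaned the wrong way for a message of known class.

    Hypothetical helper: spam below BAYES_50 should be relearned as spam,
    ham at BAYES_50 or above should be relearned as ham.
    """
    bucket = bayes_bucket(status_header)
    if bucket is None:
        return False  # Bayes did not fire, nothing to correct
    if actually_spam:
        return bucket < 50
    return bucket >= 50

# The header from the start of this thread, known by hand to be spam.
status = ("No, score=-1.496 tagged_above=-10 required=6.6 "
          "tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001, "
          "URIBL_BLACK=1.955, URIBL_GREY=0.25]")
relearn_as_spam = needs_retraining(status, actually_spam=True)
```

Messages flagged this way are the ones to feed back through sa-learn with the correct class.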
Re: negative scores for spam
Posted by Chris Barnes <ch...@txbarnes.com>.
Jeff Mincy wrote:
> Yow. The negative scoring bayes rules are extremely reliable when well
> trained. Ham messages are not trying to evade the filter. Defeating
> bayes with poison is mostly a myth. The random garbage might work the
> first time but not the second time as long as you are training these
> messages as spam. If you are getting lots of BAYES_00 hits on spam
> then the problem is almost certainly incorrect training where spam
> messages were incorrectly learned as ham.
Fair enough.
But the problem remains. A simple glance at this list shows that this
happens often enough to be a fairly common problem.
The question is: How does one fix the problem after it occurs?
Is there a FAQ page with step-by-step instructions a person could use?
--
Chris Barnes AOL IM: CNBarnes
chris-barnes@tamu.edu Yahoo IM: chrisnbarnes
Computer Systems Manager MSN IM: chris@txbarnes.com
Department of Physics ph: 979-845-7801
Texas A&M University fax: 979-845-2590
Re: negative scores for spam
Posted by Jeff Mincy <je...@delphioutpost.com>.
From: Jesse Stroik <js...@ssec.wisc.edu>
Date: Fri, 20 Mar 2009 16:14:39 -0500
> Hoover Chan wrote:
> > The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores.
> You are getting negative scores for auto white list and for bayes_00.
> It's a matter of taste and what you believe makes sense, but I don't
> consider bayes to be all that accurate (since there are methods for
> defeating bayes, poisoning bayes, etc). As such, I don't allow Bayes to
> assign negative scores or positive scores within a couple of points of
> the threshold. You can do so by assigning scores like this:
> score BAYES_00 0
> score BAYES_05 0
> score BAYES_20 0
> score BAYES_40 0
Yow. The negative scoring bayes rules are extremely reliable when well
trained. Ham messages are not trying to evade the filter. Defeating
bayes with poison is mostly a myth. The random garbage might work the
first time but not the second time as long as you are training these
messages as spam. If you are getting lots of BAYES_00 hits on spam
then the problem is almost certainly incorrect training where spam
messages were incorrectly learned as ham.
> I also disable AWL since a lot of spam, especially the stuff most likely
> to be tested against spamassassin, will likely use known good email
> addresses from your domain as the "from" address. This is fairly likely
> to hit on the AWL.
Yow again. AWL uses the email address and the IP address. So a forged
email address used in spam is not going to have the same EMAIL+IP
pair as legitimate email using the same email address.
> Again, it's just a matter of taste and it all depends on how you've set
> up your scoring. I'm pretty cautious to ensure there aren't false
> positives as that would decrease the value of spamassassin greatly for
> us, but I otherwise avoid AWL and Bayes negative scores.
> If you sent us a copy of the spam, we could test it and show you what
> should be hitting.
Use pastebin instead.
-jeff
Re: negative scores for spam
Posted by John Hardin <jh...@impsec.org>.
On Fri, 20 Mar 2009, Jesse Stroik wrote:
> Hoover Chan wrote:
>> The threshold was set to 6.6 (cf. required=6.6). The message this was
>> attached to was very definitely junk. This kind of situation got me
>> curious about the whole thing where any positive spam score is set as
>> the threshold but seeing junk mail coming in with negative scores.
>
> You are getting negative scores for auto white list and for bayes_00.
This means:
(1) mistrained bayes - review your training corpus (you did keep it,
didn't you?) for correct classification and disable autolearn until you're
confident reasonable scores are being assigned by SA.
(2) a history of low-scoring spam (AWL wants to reduce the score on this
one even more, so it thinks this sender has a hammy history). Clear your
AWL database after you retrain BAYES.
> It's a matter of taste and what you believe makes sense, but I don't
> consider bayes to be all that accurate (since there are methods for
> defeating bayes, poisoning bayes, etc).
It's very reliable if you take care training it.
> As such, I don't allow Bayes to assign negative scores or positive
> scores within a couple of points of the threshold. You can do so by
> assigning scores like this:
>
> score BAYES_00 0
> score BAYES_05 0
> score BAYES_20 0
> score BAYES_40 0
Then you're losing a lot of the benefit of Bayes. If you're having that
serious a problem with it, I'd suggest you need to review your training.
> I also disable AWL since a lot of spam, especially the stuff most likely
> to be tested against spamassassin, will likely use known good email
> addresses from your domain as the "from" address. This is fairly likely
> to hit on the AWL.
AWL includes the source IP address so it's unlikely a forged message will
benefit from your outbound traffic.
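
To make the address-plus-IP point concrete, here is a toy model in Python. It is a sketch under stated assumptions, not SpamAssassin's AWL code: the class ToyAWL is hypothetical, keying on the first two octets of the IP and pulling the score halfway toward the sender's historical mean (factor 0.5) are assumptions about the general idea, and the addresses are example values.

```python
from collections import defaultdict

class ToyAWL:
    """Toy auto-whitelist: history is kept per (address, network) pair."""

    def __init__(self, factor=0.5):
        self.factor = factor
        # key -> [sum of past scores, number of past messages]
        self.history = defaultdict(lambda: [0.0, 0])

    @staticmethod
    def key(address, ip):
        # Assumption: only the first two octets of the IPv4 address are used.
        network = ".".join(ip.split(".")[:2])
        return (address.lower(), network)

    def adjust(self, address, ip, score):
        """Return the AWL adjustment for this message, then record its score."""
        entry = self.history[self.key(address, ip)]
        total, count = entry
        delta = 0.0
        if count:
            mean = total / count
            delta = (mean - score) * self.factor  # pull toward the sender's mean
        entry[0] += score
        entry[1] += 1
        return delta

awl = ToyAWL()
first = awl.adjust("alice@example.com", "192.0.2.10", -2.0)   # no history yet
genuine = awl.adjust("alice@example.com", "192.0.2.50", 4.0)  # same address, same network
forged = awl.adjust("alice@example.com", "203.0.113.7", 4.0)  # same address, other network
```

In this model the message from the known network is pulled down by 3 points, while the forged one arriving from elsewhere gets no adjustment because its key has no history.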
Hoover, you need to review your bayes training corpora and make sure they
are clean and correct, and retrain from scratch.
Disable autolearn until you're confident SA is scoring well, then start
autolearn with the thresholds pushed out (e.g. learn as ham when the score
is below -5, as spam when it's above 15). Also consider whether or not you
even want to autolearn - if your userbase is small then you may be better
served by purely manual training fed by a few clueful users.
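
As a rough illustration of what "pushed out" thresholds mean, the Python sketch below classifies a score with the example limits given above (-5 and 15). The function name autolearn_decision is hypothetical. In SpamAssassin itself these limits are, as far as I know, set with the bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam options, and the real autolearn logic applies extra conditions that are not modelled here.

```python
def autolearn_decision(score, ham_below=-5.0, spam_above=15.0):
    """Return 'ham', 'spam', or None (do not learn) for a message score."""
    if score < ham_below:
        return "ham"
    if score > spam_above:
        return "spam"
    return None  # ambiguous zone: leave it for manual training

# A few sample scores, including the -1.496 from the original post.
decisions = {s: autolearn_decision(s) for s in (-7.2, -1.496, 6.6, 18.0)}
```

With limits this wide, anything in the ambiguous middle, including the message that started this thread, is left alone rather than learned automatically.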
Also, please do post a sample (all headers intact, to someplace like
pastebin) of a low-scoring spam. The fact that so few rules hit indicates
there may be other problems with your config, or that it's a very short
spam (which is harder to get a good score for). This will also contribute
to a poorly-performing autotrained bayes.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
You do not examine legislation in the light of the benefits it
will convey if properly administered, but in the light of the
wrongs it would do and the harms it would cause if improperly
administered. -- Lyndon B. Johnson
-----------------------------------------------------------------------
1325 days until the Presidential Election
Re: negative scores for spam
Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Hoover Chan wrote:
> >The threshold was set to 6.6 (cf. required=6.6). The message this was
> >attached to was very definitely junk. This kind of situation got me
> >curious about the whole thing where any positive spam score is set as the
> >threshold but seeing junk mail coming in with negative scores.
On 20.03.09 16:14, Jesse Stroik wrote:
> You are getting negative scores for auto white list and for bayes_00.
> It's a matter of taste and what you believe makes sense, but I don't
> consider bayes to be all that accurate (since there are methods for
> defeating bayes, poisoning bayes, etc).
What methods? afaik the bayes poisoning turned out to be not working
(and could even help us to detect spam when using hapaxes).
And I don't know anything about defeating BAYES, if properly trained.
Maybe leaving things on autolearn is not a good idea, when not updating
scores (sa-update) and/or not using network checks.
> As such, I don't allow Bayes to
> assign negative scores or positive scores within a couple of points of
> the threshold. You can do so by assigning scores like this:
>
> score BAYES_00 0
> score BAYES_05 0
> score BAYES_20 0
> score BAYES_40 0
However, it's better not to do that and solve your problem by properly
training the database. I found BAYES to be very effective for some wanted
mail sent by lame mailers...
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Remember half the people you know are below average.
Re: negative scores for spam
Posted by RW <rw...@googlemail.com>.
On Fri, 20 Mar 2009 16:14:39 -0500
Jesse Stroik <js...@ssec.wisc.edu> wrote:
> It's a matter of taste and what you believe makes sense, but I don't
> consider bayes to be all that accurate (since there are methods for
> defeating bayes, poisoning bayes, etc). As such, I don't allow Bayes
> to assign negative scores or positive scores within a couple of
> points of the threshold.
In my experience, if you train Bayesian filters properly, anti-Bayesian
techniques just create more neutral results rather than errors. Despite
what the spammers do, many people are still getting good results from
pure statistical filters like dspam and bogofilter.
> I also disable AWL since a lot of spam, especially the stuff most
> likely to be tested against spamassassin, will likely use known good
> email addresses from your domain as the "from" address. This is
> fairly likely to hit on the AWL.
It's supposed to be based on a combination of address and IP address,
so that shouldn't happen
Re: negative scores for spam
Posted by Jesse Stroik <js...@ssec.wisc.edu>.
Hoover Chan wrote:
> The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores.
You are getting negative scores for auto white list and for bayes_00.
It's a matter of taste and what you believe makes sense, but I don't
consider bayes to be all that accurate (since there are methods for
defeating bayes, poisoning bayes, etc). As such, I don't allow Bayes to
assign negative scores or positive scores within a couple of points of
the threshold. You can do so by assigning scores like this:
score BAYES_00 0
score BAYES_05 0
score BAYES_20 0
score BAYES_40 0
I also disable AWL since a lot of spam, especially the stuff most likely
to be tested against spamassassin, will likely use known good email
addresses from your domain as the "from" address. This is fairly likely
to hit on the AWL.
Again, it's just a matter of taste and it all depends on how you've set
up your scoring. I'm pretty cautious to ensure there aren't false
positives as that would decrease the value of spamassassin greatly for
us, but I otherwise avoid AWL and Bayes negative scores.
If you sent us a copy of the spam, we could test it and show you what
should be hitting.
Best,
Jesse
Re: negative scores for spam
Posted by Hoover Chan <ch...@sacredsf.org>.
The threshold was set to 6.6 (cf. required=6.6). The message this was attached to was very definitely junk. This kind of situation got me curious about the whole thing where any positive spam score is set as the threshold but seeing junk mail coming in with negative scores.
Thanks.
--------------------------------------------------
Hoover Chan chan@sacredsf.org
Technology Director
Schools of the Sacred Heart
2222 Broadway St.
San Francisco, CA 94115
----- "Rick Macdougall" <ri...@ummm-beer.com> wrote:
> Hoover Chan wrote:
> > Can someone point me to what I can do to my Spam Assassin config for
> a situation like the following?
> >
> > X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> > tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> > URIBL_BLACK=1.955, URIBL_GREY=0.25]
> >
> > That is, a positive score criterion with a spam message that comes
> out with a negative number.
> >
>
> Errr
>
> -1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
>
> Where do you see that it should be positive ?
>
> Regards,
>
> Rick
Re: negative scores for spam
Posted by Rick Macdougall <ri...@ummm-beer.com>.
Hoover Chan wrote:
> Can someone point me to what I can do to my Spam Assassin config for a situation like the following?
>
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> URIBL_BLACK=1.955, URIBL_GREY=0.25]
>
> That is, a positive score criterion with a spam message that comes out with a negative number.
>
Errr
-1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600
Where do you see that it should be positive ?
Regards,
Rick
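The same arithmetic can be checked mechanically. The Python sketch below is only an illustration: the helper names parse_tests and required_score are made up, and the "score >= required" comparison is my understanding of how the spam flag is decided. It pulls the per-rule scores out of the header quoted above, adds them up with exact decimal arithmetic, and compares the total with the threshold.

```python
import re
from decimal import Decimal

HEADER = (
    "X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6 "
    "tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001, "
    "URIBL_BLACK=1.955, URIBL_GREY=0.25]"
)

def parse_tests(header):
    """Return {rule_name: score} taken from the tests=[...] part."""
    body = re.search(r"tests=\[(.*?)\]", header).group(1)
    scores = {}
    for item in body.split(","):
        name, value = item.strip().split("=")
        scores[name] = Decimal(value)  # Decimal avoids float rounding noise
    return scores

def required_score(header):
    """Return the required= threshold from the header."""
    return Decimal(re.search(r"required=(-?[\d.]+)", header).group(1))

scores = parse_tests(HEADER)
total = sum(scores.values(), Decimal("0"))
is_spam = total >= required_score(HEADER)  # assumed comparison
```

The two negative rules (AWL and BAYES_00) outweigh the URIBL hits, so the total stays far below 6.6 and the message is not tagged.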
Re: negative scores for spam
Posted by Karsten Bräckelmann <gu...@rudersport.de>.
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> URIBL_BLACK=1.955, URIBL_GREY=0.25]
^^^^^ ^^^^
Other than what's already been mentioned about Bayes and AWL...
Either (a) there are actually two different, both hitting URIs in that
message, or (b) your DNS is abusing the service.
Do put a sample up on a pastebin or website for review.
--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: negative scores for spam
Posted by LuKreme <kr...@kreme.com>.
On 20-Mar-2009, at 14:18, Hoover Chan wrote:
> Can someone point me to what I can do to my Spam Assassin config for
> a situation like the following?
>
> X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
> tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
> URIBL_BLACK=1.955, URIBL_GREY=0.25]
1) don't set 'tagged_above' to -10 and don't change 'required'
--
And I just don't care what happens next / looks like freedom but it
feels like death / it's something in between, I guess