Posted to users@spamassassin.apache.org by "Brian J. Murrell" <br...@interlinx.bc.ca> on 2008/12/03 05:31:37 UTC

skew the AWL on spam report

If I get a spam and I need to have SA learn that it's spam with
sa-learn, wouldn't it be useful to also skew the AWL for that sender so
that future uses of the AWL for that spammer will push the overall spam
score up?

Thots?

b.

Re: skew the AWL on spam report

Posted by Graham Murray <gr...@gmurray.org.uk>.

"Brian J. Murrell" <br...@interlinx.bc.ca> writes:

> If I get a spam and I need to have SA learn that it's spam with
> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
> that future uses of the AWL for that spammer will push the overall spam
> score up?

And also useful[1] for the opposite case of a false positive where a ham
mail is marked as spam and you manually learn it as ham. 

[1] In fact, I think probably more useful.

SAGrey plugin (was: Re: skew the AWL on spam report)

Posted by Karsten Bräckelmann <gu...@rudersport.de>.

On Wed, 2008-12-03 at 17:38 +0000, Nigel Frankcom wrote:
> Is Mail::SpamAssassin::Plugin::SAGrey part of the stat SA set? Neither
> yum nor CPAN seem to be able to find it here... though that could
> easily be down to user error.

Google finds it quite easily. ;)

  http://wiki.apache.org/spamassassin/CustomPlugins
  http://www.ehall.family-and-friends.us/software/spamassassin/sagrey/

> Hasn't appeared in sa-update either from what I've seen.

Not part of the official SA code, it's a custom plugin. Oh, and plugins
are unlikely to be distributed / updated using sa-update anyway, unless
someone offers a third-party channel dedicated to such things.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: skew the AWL on spam report

Posted by Nigel Frankcom <ni...@blue-canoe.com>.

On Wed, 3 Dec 2008 09:56:58 -0500, Jeff Mincy <je...@delphioutpost.com>
wrote:

>   From: Matt Kettler <mk...@verizon.net>
>   Date: Tue, 02 Dec 2008 23:48:57 -0500
>   
>   Brian J. Murrell wrote:
>   > If I get a spam and I need to have SA learn that it's spam with
>   > sa-learn, wouldn't it be useful to also skew the AWL for that sender so
>   > that future uses of the AWL for that spammer will push the overall spam
>   > score up?
>   > Thots?
>
>You can use spamassassin --add-to-blacklist.   There isn't much of a
>point though, since the email address isn't likely to ever be reused.
>Only 5% of my spam is in the AWL.
>   
>   If a spammer is using the same sending address over and over again,
>   blacklist them entirely.
>   
>Yep.
>
>   That said, I've never seen a spammer re-use the same address twice.
>
>The sagrey plugin addresses this.   Sagrey hits on the 95% of
>spam that is from a new email+IP.
>
>-jeff


Is Mail::SpamAssassin::Plugin::SAGrey part of the standard SA set? Neither
yum nor CPAN seems to be able to find it here... though that could
easily be down to user error. It hasn't appeared in sa-update either from
what I've seen.

Nigel

Re: skew the AWL on spam report

Posted by Jeff Mincy <je...@delphioutpost.com>.

   From: Matt Kettler <mk...@verizon.net>
   Date: Tue, 02 Dec 2008 23:48:57 -0500

   Brian J. Murrell wrote:
   > If I get a spam and I need to have SA learn that it's spam with
   > sa-learn, wouldn't it be useful to also skew the AWL for that sender so
   > that future uses of the AWL for that spammer will push the overall spam
   > score up?
   > Thots?

You can use spamassassin --add-to-blacklist.   There isn't much of a
point though, since the email address isn't likely to ever be reused.
Only 5% of my spam is in the AWL.

   If a spammer is using the same sending address over and over again,
   blacklist them entirely.

Yep.

   That said, I've never seen a spammer re-use the same address twice.

The sagrey plugin addresses this.   Sagrey hits on the 95% of
spam that is from a new email+IP.

-jeff

Re: skew the AWL on spam report

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.

On Thu, 2008-12-04 at 22:38 -0500, Matt Kettler wrote:
>    
> That said, why add code to sa-learn when spamassassin can already do
> something even more complete. Try feeding the message "spamassassin -r
> --add-to-blacklist".

Ahhh.  I was mistakenly thinking that sa-learn == [ update-bayes
database + what spamassassin -r does ].

> Provided you haven't disabled bayes_learn_during_report, the -r will
> cause bayes learning as spam. As a bonus it will also report the message
> to spamcop and razor, pyzor, etc if you have them installed.

Sweet.  Thanx!  Your solution is perfectly reasonable.

b.

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

Brian J. Murrell wrote:
> On Thu, 2008-12-04 at 22:38 -0500, Matt Kettler wrote:
>   
>
> To follow-up on this suggestion...
>
>   
>> That said, why add code to sa-learn when spamassassin can already do
>> something even more complete. Try feeding the message "spamassassin -r
>> --add-to-blacklist".
>>     
>
> It seems (looking at -D output) that spamassassin won't do both of those
> in the same invocation.  If I put both the "-r" and "--add-to-blacklist"
> options on the command-line, it only does the latter.  If I leave off
> the latter command line, it goes ahead and reports the spams to the
> various digest databases.
>   
hrm. I would have expected it to do both if both are explicitly specified.

Oh well, back to yon drawing board...

Re: skew the AWL on spam report

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.

On Thu, 2008-12-04 at 22:38 -0500, Matt Kettler wrote:
> 

To follow-up on this suggestion...

> That said, why add code to sa-learn when spamassassin can already do
> something even more complete. Try feeding the message "spamassassin -r
> --add-to-blacklist".

It seems (looking at -D output) that spamassassin won't do both of those
in the same invocation.  If I put both the "-r" and "--add-to-blacklist"
options on the command-line, it only does the latter.  If I leave off
the latter option, it goes ahead and reports the spam to the
various digest databases.

b.

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

Brian J. Murrell wrote:
> On Thu, 2008-12-04 at 18:35 -0500, Matt Kettler wrote:
>   
>> ie: you
>> can't tell sa-learn a message is spam and have it apply that information
>> in any way to the AWL.  I guess that's really what my point was, and I
>> expressed it poorly.
>>     
>
> I guess as the OP of this thread, my point was that why shouldn't
> sa-learn skew up the (existing) scores in the AWL when it is given a
> spam to learn?  IOW, if an entry in the AWL doesn't already exist, don't
> add one but if there is a matching entry, skew its scoring to ensure
> that the next time it's used for this sender, it adds to the spamminess
> score, not subtracts from it.
>
> I have come to understand via this thread that the
> "--add-addr-to-blacklist" (or is it more correctly
> "--add-to-blacklist"?) argument effectively does that, adding a "fake"
> entry to the AWL representing a spam scored at 100 points.
>
> My proposal would be to roll up this "--add-to-blacklist" spamassassin
> argument into sa-learn --ham with the exception of only modifying an
> existing entry, not creating new ones.
>   
Well, part of the point of having sa-learn is to keep it lightweight.
Adding the AWL code doesn't really follow along with that.

That said, why add code to sa-learn when spamassassin can already do
something even more complete? Try feeding the message to "spamassassin -r
--add-to-blacklist".

Provided you haven't disabled bayes_learn_during_report, the -r will
cause bayes learning as spam. As a bonus it will also report the message
to spamcop and razor, pyzor, etc if you have them installed.

Re: skew the AWL on spam report

Posted by "Brian J. Murrell" <br...@interlinx.bc.ca>.

On Thu, 2008-12-04 at 18:35 -0500, Matt Kettler wrote:
> 
> ie: you
> can't tell sa-learn a message is spam and have it apply that information
> in any way to the AWL.  I guess that's really what my point was, and I
> expressed it poorly.

I guess as the OP of this thread, my point was: why shouldn't
sa-learn skew up the (existing) scores in the AWL when it is given a
spam to learn?  IOW, if an entry in the AWL doesn't already exist, don't
add one, but if there is a matching entry, skew its scoring to ensure
that the next time it's used for this sender, it adds to the spamminess
score rather than subtracting from it.

I have come to understand via this thread that the
"--add-addr-to-blacklist" (or is it more correctly
"--add-to-blacklist"?) argument effectively does that, adding a "fake"
entry to the AWL representing a spam scored at 100 points.

My proposal would be to roll up this "--add-to-blacklist" spamassassin
argument into sa-learn --spam, with the exception of only modifying an
existing entry, not creating new ones.

b.

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

mouss wrote:
>
>>> - is it enough to pass few messages? (in short, does "manual" training
>>> have more "weight" than automatic awl learning?)
>>>   
>>>       
>> There's no such thing as manual training of the AWL. Actually, there's
>> no such thing as "training" for it either.
>>
>> The AWL averages scores. nothing more, nothing less. The message score
>> is added when the message is scanned. The AWL has no concept of spam or
>> not, just what the historical average is.
>>
>>     
>
> I understand, but this may be thought of as a form of learning. not
> bayesian, but definitely an automatic learning method that learns the
> (partial) "reputation" of (ip, sender) pairs.
>   

True, but it's also learning that isn't based on categories. Thus the
AWL is not trainable on a spam/nonspam basis as sa-learn does. ie: you
can't tell sa-learn a message is spam and have it apply that information
in any way to the AWL.  I guess that's really what my point was, and I
expressed it poorly.

Re: skew the AWL on spam report

Posted by mouss <mo...@netoyen.net>.

Matt Kettler a écrit :
>> I am thinking about this case: Joe the spammer bombs you with mail that
>> is not detected as spam. he gets a negative awl.
> That statement implies that there's a "score" for the user in the AWL.
> 
> The AWL score varies with what the current messages pre-awl score. The
> AWL can think a sender has a +50 average, ie: strong spam, and if a
> message comes in that scores +100, the AWL will set itself to -25.
> However, if the same message was 0 before the AWL ran, it would give it +25.
> 
> Or were you talking about having a negative average because all the
> messages sent as a bomb had negative scores?
> 

yes.

not really a big deal. I took care of the miscreant, but I was trying to
see if I could be less "aggressive" (and have an automated way to deal
with this, so awl seemed a good place).

>>  so the questions are:
>>
>> - if user passes all the message to sa-learn, will that nuke the
>> negative awl value?
>>   
> sa-learn doesn't touch the AWL. At all.
>> - is it enough to pass few messages? (in short, does "manual" training
>> have more "weight" than automatic awl learning?)
>>   
> There's no such thing as manual training of the AWL. Actually, there's
> no such thing as "training" for it either.
> 
> The AWL averages scores. nothing more, nothing less. The message score
> is added when the message is scanned. The AWL has no concept of spam or
> not, just what the historical average is.
> 

I understand, but this may be thought of as a form of learning. not
bayesian, but definitely an automatic learning method that learns the
(partial) "reputation" of (ip, sender) pairs.

> You can force fake messages with +100 scores in using spamassassin
> --add-addr-to-blacklist, but that's not really "training" it's just
> shoving the average around.
>

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

mouss wrote:
> Matt Kettler a écrit :
>   
>> mouss wrote:
>>     
>>> Matt Kettler a écrit :
>>>   
>>>       
>>>> Brian J. Murrell wrote:
>>>>     
>>>>         
>>>>> If I get a spam and I need to have SA learn that it's spam with
>>>>> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
>>>>> that future uses of the AWL for that spammer will push the overall spam
>>>>> score up?
>>>>>
>>>>> Thots?
>>>>>   
>>>>>       
>>>>>           
>>>> If a spammer is using the same sending address over and over again,
>>>> blacklist them entirely.
>>>>
>>>> That said, I've never seen a spammer re-use the same address twice.
>>>>     
>>>>         
>>> My understanding is "the other side". you get a spam and awl gives it a
>>> negative score. you run sa-learn and you want this to "nuke" the awl
>>> entry because if awl gives a too negative score, then sa-learn is
>>> useless (unless BAYES_99 is set to a very high value).
>>>
>>>   
>>>       
>> That sounds like you have a broken trust path. It seems unlikely you'd
>> have gotten nonspam from the same address *AND* IP address before.
>>
>>     
>
> I am thinking about this case: Joe the spammer bombs you with mail that
> is not detected as spam. he gets a negative awl.
That statement implies that there's a "score" for the user in the AWL.

The AWL adjustment varies with the current message's pre-AWL score. The
AWL can think a sender has a +50 average, ie: strong spam, and if a
message comes in that scores +100, the AWL will adjust it by -25.
However, if the same message scored 0 before the AWL ran, it would get +25.
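
To make that arithmetic concrete, here is a minimal sketch in Python. It assumes the adjustment is (historical average - current score) times a factor of 0.5, which is consistent with the +50/+100 and +50/0 numbers above; the function name is made up for illustration and this is not SpamAssassin's actual code.

```python
def awl_adjustment(mean_score, current_score, factor=0.5):
    """Return the score delta the AWL would add to the current message.

    Assumption: the AWL pulls the message score part of the way toward
    the sender's historical mean; factor=0.5 matches the example figures
    quoted in this thread.
    """
    return (mean_score - current_score) * factor


# Sender with a +50 historical average:
print(awl_adjustment(50, 100))  # message scored +100 before the AWL ran
print(awl_adjustment(50, 0))    # message scored 0 before the AWL ran
```

The sign flips depending on which side of the average the current message lands on, which is why there is no single "AWL score" for a sender.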

Or were you talking about having a negative average because all the
messages sent as a bomb had negative scores?

>  so the questions are:
>
> - if user passes all the message to sa-learn, will that nuke the
> negative awl value?
>   
sa-learn doesn't touch the AWL. At all.
> - is it enough to pass few messages? (in short, does "manual" training
> have more "weight" than automatic awl learning?)
>   
There's no such thing as manual training of the AWL. Actually, there's
no such thing as "training" for it either.

The AWL averages scores. nothing more, nothing less. The message score
is added when the message is scanned. The AWL has no concept of spam or
not, just what the historical average is.

You can force fake messages with +100 scores in using spamassassin
--add-addr-to-blacklist, but that's not really "training"; it's just
shoving the average around.
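
As a rough illustration of "shoving the average around", the sketch below models an AWL entry as a running total and a message count. That storage layout and the class name are assumptions made for the example; only the +100 fake score comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class AwlEntry:
    """Hypothetical per-sender record: sum of scores and message count."""
    total: float = 0.0
    count: int = 0

    def add_score(self, score):
        # Every scanned message just contributes its score to the average.
        self.total += score
        self.count += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


entry = AwlEntry()
for score in (-1.0, -2.0, -1.5):   # three messages that slipped through
    entry.add_score(score)
before = entry.mean

entry.add_score(100.0)             # the fake +100 blacklist entry
after = entry.mean

print(before, after)
```

One forced +100 entry is enough to move a mildly negative average well into positive territory, but the entry still carries no notion of spam versus ham.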

>
>

Re: skew the AWL on spam report

Posted by mouss <mo...@netoyen.net>.

Matt Kettler a écrit :
> mouss wrote:
>> Matt Kettler a écrit :
>>   
>>> Brian J. Murrell wrote:
>>>     
>>>> If I get a spam and I need to have SA learn that it's spam with
>>>> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
>>>> that future uses of the AWL for that spammer will push the overall spam
>>>> score up?
>>>>
>>>> Thots?
>>>>   
>>>>       
>>> If a spammer is using the same sending address over and over again,
>>> blacklist them entirely.
>>>
>>> That said, I've never seen a spammer re-use the same address twice.
>>>     
>> My understanding is "the other side". you get a spam and awl gives it a
>> negative score. you run sa-learn and you want this to "nuke" the awl
>> entry because if awl gives a too negative score, then sa-learn is
>> useless (unless BAYES_99 is set to a very high value).
>>
>>   
> That sounds like you have a broken trust path. It seems unlikely you'd
> have gotten nonspam from the same address *AND* IP address before.
> 

I am thinking about this case: Joe the spammer bombs you with mail that
is not detected as spam. He gets a negative awl. So the questions are:

- if the user passes all the messages to sa-learn, will that nuke the
negative awl value?

- is it enough to pass a few messages? (in short, does "manual" training
have more "weight" than automatic awl learning?)

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

mouss wrote:
> Matt Kettler a écrit :
>   
>> Brian J. Murrell wrote:
>>     
>>> If I get a spam and I need to have SA learn that it's spam with
>>> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
>>> that future uses of the AWL for that spammer will push the overall spam
>>> score up?
>>>
>>> Thots?
>>>   
>>>       
>> If a spammer is using the same sending address over and over again,
>> blacklist them entirely.
>>
>> That said, I've never seen a spammer re-use the same address twice.
>>     
>
> My understanding is "the other side". you get a spam and awl gives it a
> negative score. you run sa-learn and you want this to "nuke" the awl
> entry because if awl gives a too negative score, then sa-learn is
> useless (unless BAYES_99 is set to a very high value).
>
>   
That sounds like you have a broken trust path. It seems unlikely you'd
have gotten nonspam from the same address *AND* IP address before.

Re: skew the AWL on spam report

Posted by mouss <mo...@netoyen.net>.

Matt Kettler a écrit :
> Brian J. Murrell wrote:
>> If I get a spam and I need to have SA learn that it's spam with
>> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
>> that future uses of the AWL for that spammer will push the overall spam
>> score up?
>>
>> Thots?
>>   
> 
> If a spammer is using the same sending address over and over again,
> blacklist them entirely.
> 
> That said, I've never seen a spammer re-use the same address twice.

My understanding is "the other side": you get a spam and awl gives it a
negative score. You run sa-learn and you want this to "nuke" the awl
entry, because if awl gives too negative a score, then sa-learn is
useless (unless BAYES_99 is set to a very high value).

Re: skew the AWL on spam report

Posted by Benny Pedersen <me...@junc.org>.

On Wed, December 3, 2008 05:48, Matt Kettler wrote:

> That said, I've never seen a spammer re-use the same address twice.

i have :-)

also why spf / dkim whitelisting is the way to go: let spammers try to
get whitelisted

microsoft got it wrong with "Block Sender" :)


-- 
Benny Pedersen
Need more webspace ? http://www.servage.net/?coupon=cust37098

Re: skew the AWL on spam report

Posted by James Wilkinson <sa...@aprilcottage.co.uk>.

Matt Kettler wrote:
> If a spammer is using the same sending address over and over again,
> blacklist them entirely.
> 
> That said, I've never seen a spammer re-use the same address twice.

Doesn’t mean it doesn’t happen – only that you’re not on any
“narrowcast” lists (e.g. “Email 200,000 British business addresses!”)

A number of otherwise-legitimate companies will send to those lists.
SpamAssassin’s fixed rules will do very little against these spammers –
they’re using the same e-mail engines, the same language, and the same
sort of Internet connections as other companies with opt-in mailing
lists. For some addresses, they represent the vast majority of spam that
gets past SpamAssassin.

Fortunately, they often do use and re-use the same IP addresses and the
same domains, and can be blocked that way.

http://groups.google.com/group/news.admin.net-abuse.email/browse_thread/thread/78ac67e56ce35d90
lists a good (bad?) selection of them…

James.

-- 
E-mail:     james@ | Remember, half-measures can be very effective if all you
aprilcottage.co.uk | deal with are half-wits.

Re: skew the AWL on spam report

Posted by Matt Kettler <mk...@verizon.net>.

Brian J. Murrell wrote:
> If I get a spam and I need to have SA learn that it's spam with
> sa-learn, wouldn't it be useful to also skew the AWL for that sender so
> that future uses of the AWL for that spammer will push the overall spam
> score up?
>
> Thots?
>   

If a spammer is using the same sending address over and over again,
blacklist them entirely.

That said, I've never seen a spammer re-use the same address twice.