Posted to users@spamassassin.apache.org by Bill McCormick <wp...@sbcglobal.net> on 2007/03/30 23:01:35 UTC

Missing tests/scores?

I think my SA is missing some tests. 

I would hope the attached message would score something on obfuscation, on the part about anatomy, on "the man I'd like to be", and on the reference to my girlfriend/wife.

Could somebody run this to see what you get?

X-Spam-Report: 
	*  0.1 FORGED_RCVD_HELO Received: contains a forged HELO
	*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
	*  9.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
	*      [score: 1.0000]
	*  0.2 DNS_FROM_RFC_ABUSE RBL: Envelope sender in abuse.rfc-ignorant.org
	*  1.7 DNS_FROM_RFC_POST RBL: Envelope sender in
	*      postmaster.rfc-ignorant.org
	*  3.0 GEO_CITIES1 Geocities common
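
For readers who want to check the arithmetic, here is a minimal sketch that totals the rule scores in the report above. The regular expression and the function name `parse_report` are illustrative assumptions, not part of SpamAssassin; the report text is copied from the message with the leading tabs dropped.

```python
import re

# Report lines copied from the message above (leading tabs removed).
REPORT = """\
*  0.1 FORGED_RCVD_HELO Received: contains a forged HELO
*  0.0 UNPARSEABLE_RELAY Informational: message has unparseable relay lines
*  9.0 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*      [score: 1.0000]
*  0.2 DNS_FROM_RFC_ABUSE RBL: Envelope sender in abuse.rfc-ignorant.org
*  1.7 DNS_FROM_RFC_POST RBL: Envelope sender in
*      postmaster.rfc-ignorant.org
*  3.0 GEO_CITIES1 Geocities common
"""

# Assumed layout: "* <score> <RULE_NAME> <description>".
# Continuation lines carry no score, so they do not match.
RULE_LINE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def parse_report(text):
    """Return a list of (rule_name, score) pairs found in a report."""
    hits = []
    for line in text.splitlines():
        match = RULE_LINE.match(line.strip())
        if match:
            hits.append((match.group(2), float(match.group(1))))
    return hits


hits = parse_report(REPORT)
total = round(sum(score for _, score in hits), 1)
print(f"{len(hits)} rules hit, total score {total}")
```

The six rules sum to 14.0, which matches the "over 13 points" remark later in the thread.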


unsubscribe

Posted by Jörg Hanke <jo...@hanke.co.at>.

-----Original Message-----
From: Bill McCormick [mailto:wpmccormick@sbcglobal.net]
Sent: Saturday, 31 March 2007 05:54
To: users@spamassassin.apache.org
Subject: Re: Missing tests/scores?


Re: Missing tests/scores?

Posted by Bill McCormick <wp...@sbcglobal.net>.
Matt Kettler wrote:
> Bill McCormick wrote:
>> OK, so they're parts of normal conversations. And Geocities is a real
>> domain too, which scores fairly high since it's so prevalent in real
>> world spam. If this mail had scored just 1 more point, we would
>> not even be having this conversation because SA would have deleted it.
>> Having individual rules for this sort of thing that add up when taken
>> together with other scores seems to be exactly what SA is all about.
> 
> Fair enough. Personally I don't have the end goal of trying to get all
> my spam high scoring enough it gets deleted. I'm quite happy to
> autodelete about half of it, and have the other half tagged and shuffled
> into a junk box for my casual skimming. From this perspective, my
> primary reaction was "hell, the score of the thing's over 13 points,
> what more do you need?"
> 
I worked the same way for some time. Then I found SARE. Now it's like an 
obsession; must delete spam; must delete ALL spam :)

>> Maybe some of my custom scores could be further raised and/or my
>> sa_delete setting lowered, but they seem pretty reasonable and in line
>> with what others are doing.
>> How many rules do I need? As many as it takes. What's the rule count
>> up to anyway?
>>
>> Anyway, I think I'll try my hand at writing some rules for this.
> 
> 
> Just be aware that in general adding on more and more rules shifts up
> the average score of both spam and nonspam. Ideally, your spam rules
> should be as spam specific as possible, so this effect is more
> noticeable in the spam, and less noticeable in the nonspam. At the most
> extreme end, adding a lot of rules with very poor selectivity is on
> average the same as reducing your thresholds.
> 
> Writing rules is easy; writing good rules is sometimes less obvious than
> you think.
> 

How true. I'm familiar with this axiom in my own line of work.

> I strongly suggest a quick read of the rules writing guide over in the
> wiki, prefaced by the admission that I am biased here, as I wrote the
> original text. Many have helped me along the way, and others have built
> much upon it since.
> 
> http://wiki.apache.org/spamassassin/WritingRules
> 
> Even if you know the mechanics of regexes and SA syntax, the "writing
> better rules" at the end is quite handy.
> 
Thanks!!

Re: Missing tests/scores?

Posted by Matt Kettler <mk...@verizon.net>.
Bill McCormick wrote:
>
> OK, so they're parts of normal conversations. And Geocities is a real
> domain too, which scores fairly high since it's so prevalent in real
> world spam. If this mail had scored just 1 more point, we would
> not even be having this conversation because SA would have deleted it.
> Having individual rules for this sort of thing that add up when taken
> together with other scores seems to be exactly what SA is all about.

Fair enough. Personally I don't have the end goal of trying to get all
my spam high scoring enough it gets deleted. I'm quite happy to
autodelete about half of it, and have the other half tagged and shuffled
into a junk box for my casual skimming. From this perspective, my
primary reaction was "hell, the score of the thing's over 13 points,
what more do you need?"

> Maybe some of my custom scores could be further raised and/or my
> sa_delete setting lowered, but they seem pretty reasonable and in line
> with what others are doing.
> How many rules do I need? As many as it takes. What's the rule count
> up to anyway?
>
> Anyway, I think I'll try my hand at writing some rules for this.


Just be aware that in general adding on more and more rules shifts up
the average score of both spam and nonspam. Ideally, your spam rules
should be as spam specific as possible, so this effect is more
noticeable in the spam, and less noticeable in the nonspam. At the most
extreme end, adding a lot of rules with very poor selectivity is on
average the same as reducing your thresholds.
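
The point about selectivity can be put in numbers. The sketch below is only an illustration: every hit rate and score is made up, the helper `mean_shift` is hypothetical, and the threshold of 5.0 is an assumed value. It computes how far a batch of added rules moves the average spam score and the average nonspam score.

```python
# Illustrative only: all scores and hit rates below are invented.
THRESHOLD = 5.0  # assumed tagging threshold


def mean_shift(rules, rate_key):
    """Average score added per message: sum of score * hit rate."""
    return sum(rule["score"] * rule[rate_key] for rule in rules)


# Five selective rules: each hits 30% of spam but only 0.1% of nonspam.
selective = [{"score": 1.0, "spam_rate": 0.30, "ham_rate": 0.001}] * 5

# Five unselective rules: each hits spam and nonspam equally often.
unselective = [{"score": 1.0, "spam_rate": 0.30, "ham_rate": 0.30}] * 5

for label, rules in (("selective", selective), ("unselective", unselective)):
    spam_shift = mean_shift(rules, "spam_rate")
    ham_shift = mean_shift(rules, "ham_rate")
    print(
        f"{label}: spam +{spam_shift:.3f}, nonspam +{ham_shift:.3f}, "
        f"gain in separation {spam_shift - ham_shift:.3f}, "
        f"effective nonspam threshold {THRESHOLD - ham_shift:.3f}"
    )
```

With the unselective set, spam and nonspam rise by the same amount, so the gap between them does not grow at all. On average the only effect is that nonspam sits closer to the threshold, which is what lowering the threshold would have done.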

Writing rules is easy; writing good rules is sometimes less obvious than
you think.

I strongly suggest a quick read of the rules writing guide over in the
wiki, prefaced by the admission that I am biased here, as I wrote the
original text. Many have helped me along the way, and others have built
much upon it since.

http://wiki.apache.org/spamassassin/WritingRules

Even if you know the mechanics of regexes and SA syntax, the "writing
better rules" at the end is quite handy.


Re: Missing tests/scores?

Posted by Bill McCormick <wp...@sbcglobal.net>.
Matt Kettler wrote:
> Bill McCormick wrote:
>> I think my SA is missing some tests.
> It's not.. besides, your bayes tore this one up. How many rules do you need?
>> I would hope the attached message would score something on obfuscation;
>> the part about anatomy; the man I'd like to be and the reference to my
>> girlfriend/wife.
>> Could somebody run this to see what you get?
> I don't think there are rules for any of that.
> 
[convincing examples snipped]
> 
> All of these are parts of normal conversational phrases present in nonspam.
> 

OK, so they're parts of normal conversations. And Geocities is a real
domain too, which scores fairly high since it's so prevalent in real
world spam. If this mail had scored just 1 more point, we would
not even be having this conversation because SA would have deleted it.
Having individual rules for this sort of thing that add up when taken
together with other scores seems to be exactly what SA is all about.
Maybe some of my custom scores could be further raised and/or my 
sa_delete setting lowered, but they seem pretty reasonable and in line 
with what others are doing.

How many rules do I need? As many as it takes. What's the rule count up 
to anyway?

Anyway, I think I'll try my hand at writing some rules for this.

Thanks

Re: Missing tests/scores?

Posted by Matt Kettler <mk...@verizon.net>.
Bill McCormick wrote:
> I think my SA is missing some tests.
It's not.. besides, your bayes tore this one up. How many rules do you need?
> I would hope the attached message would score something on obfuscation;
> the part about anatomy; the man I'd like to be and the reference to my
> girlfriend/wife.
> Could somebody run this to see what you get?
I don't think there are rules for any of that.

The "manly parts" could be talking about the manly parts of a movie,
engine maintenance or sports.

"It was mostly a romantic film, but it had some manly parts"

"He began to carve the meat, and served the food (or at least the manly
parts of it. I think her mother might have been allowed to help with the
less manly food, like the beans)"

".... refuses to let me be trained in maintenance or in the manly parts
of go-kart operation."

The bit about being the man you'd like to be.. That could be an ad for a
Halloween costume, or even a humorous jab at a friend.

(example ad for a Reno 911 police uniform costume, slightly off color,
but not enhancement spam.)

http://ccinsider.comedycentral.com/cc_insider/reno_911/index.html

The girlfriend/wife thing, well that could be an ad for flowers. Or even
a car..

First hit on Google for "Your girlfriend will love it" comes back with:

http://www.gminsidenews.com/forums/archive/index.php?t-19794.html

Another comes back with a Harry Potter game review claiming:

"Guys, your girlfriend will love it. Dads, your kids will love it.
Wives, your husband will love it. "

All of these are parts of normal conversational phrases present in nonspam.
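
To make that concrete, here is a small sketch that runs two naive phrase patterns against nonspam sentences quoted above. The rule names `MANLY_PARTS` and `GIRLFRIEND_LOVE` and the helper `false_positives` are hypothetical, invented for this illustration; they are not existing SpamAssassin rules.

```python
import re

# Hypothetical phrase rules, written the naive way.
NAIVE_RULES = {
    "MANLY_PARTS": re.compile(r"\bmanly parts\b", re.IGNORECASE),
    "GIRLFRIEND_LOVE": re.compile(
        r"\byour girlfriend will love it\b", re.IGNORECASE
    ),
}

# Nonspam sentences taken from the examples in this message.
HAM_SAMPLES = [
    "It was mostly a romantic film, but it had some manly parts",
    "refuses to let me be trained in maintenance or in the manly parts "
    "of go-kart operation.",
    "Guys, your girlfriend will love it. Dads, your kids will love it.",
]


def false_positives(rules, samples):
    """Count how many nonspam samples each rule would hit."""
    return {
        name: sum(1 for text in samples if pattern.search(text))
        for name, pattern in rules.items()
    }


for name, count in false_positives(NAIVE_RULES, HAM_SAMPLES).items():
    print(f"{name}: hits {count} of {len(HAM_SAMPLES)} nonspam samples")
```

Every one of these samples is hit by at least one pattern, so rules this broad would add points to legitimate mail as readily as to spam.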