Posted to dev@spamassassin.apache.org by Marc Perkel <ma...@perkel.com> on 2004/04/06 01:37:50 UTC

Rule Classifications - I like it!

I like the new 20_drugs.cf file and I'm wondering if we should create 
other classifications of rules like this. One for porn - one for 
finance/mortgage/creditcards - etc.

Also - have the ability to declare a default score for everything in 
that file. So that before it's scored - you can say give it a 3 instead 
of the default 1.
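
To illustrate the per-file default, here is a minimal sketch in Python. It 
is not SpamAssassin code: the function resolve_scores, the file_default 
parameter, and the rule names are all hypothetical, made up for this 
example. It only shows the intended behaviour, where a rule with no 
explicit score falls back to the default declared for its file.

```python
def resolve_scores(rule_names, explicit_scores, file_default=1.0):
    """Return a score for every rule declared in one rule file.

    rule_names      -- names of the rules declared in the file
    explicit_scores -- dict of rule name -> explicitly assigned score
    file_default    -- hypothetical per-file default (1.0 mirrors the
                       default of 1 mentioned above)
    """
    return {name: explicit_scores.get(name, file_default)
            for name in rule_names}


# Hypothetical rule names; only EXAMPLE_DRUG_B has an explicit score.
scores = resolve_scores(
    ["EXAMPLE_DRUG_A", "EXAMPLE_DRUG_B"],
    {"EXAMPLE_DRUG_B": 2.5},
    file_default=3.0,
)
print(scores)
```

An explicit score always wins here, so the per-file default only changes 
rules that nobody has scored yet.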

I've also thought that rule classifications could be scored in a way 
that they had independent totals - the drug score - the sex score - the 
credit card scam score - etc - with the idea of maybe being able to 
apply a scaling factor to each classification. A church might want to 
scale up the porn score. A realty company might want to scale down the 
financial scores.
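
A rough sketch of the independent totals and scaling factors, again in 
Python and again purely illustrative: the function scaled_total, the 
category names, the "general" fallback category, and the rule names are 
assumptions for this example, not anything that exists in SpamAssassin.

```python
from collections import defaultdict


def scaled_total(hits, category_of, scale):
    """Compute per-category totals and one scaled overall score.

    hits        -- dict of rule name -> score, for rules that matched
    category_of -- dict of rule name -> category; rules not listed are
                   counted under the assumed "general" category
    scale       -- dict of category -> scaling factor (1.0 if absent)
    """
    totals = defaultdict(float)
    for rule, score in hits.items():
        totals[category_of.get(rule, "general")] += score

    # Scale each category total, then sum them into the final score.
    overall = sum(total * scale.get(category, 1.0)
                  for category, total in totals.items())
    return dict(totals), overall


# Hypothetical site that scales porn up and finance down.
totals, overall = scaled_total(
    hits={"EXAMPLE_PORN": 2.0, "EXAMPLE_MORTGAGE": 1.5, "EXAMPLE_OTHER": 1.0},
    category_of={"EXAMPLE_PORN": "porn", "EXAMPLE_MORTGAGE": "finance"},
    scale={"porn": 2.0, "finance": 0.5},
)
print(totals, overall)
```

With all factors left at 1.0 this reduces to the usual single sum, so a 
site that does not care about classifications would see no change.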

Anyhow - in the interest of fine-grained controls and manageability - I 
make this suggestion.


Re: Rule Classifications - I like it!

Posted by Pete McNeil <ma...@microneil.com>.
At 07:37 PM 4/5/2004, Marc wrote:
>I like the new 20_drugs.cf file and I'm wondering if we should create 
>other classifications of rules like this. One for porn - one for 
>finance/mortgage/creditcards - etc.
>
>Also - have the ability to declare a default score for everything in that 
>file. So that before it's scored - you can say give it a 3 instead of the 
>default 1.
>
>I've also thought that rule classifications could be scored in a way that 
>they had independent totals - the drug score - the sex score - the credit 
>card scam score - etc - with the idea of maybe being able to apply a 
>scaling factor to each classification. A church might want to scale up the 
>porn score. A realty company might want to scale down the financial scores.

We do precisely this in our Message Sniffer product, with mixed results. 
It turns out that rules that score highly for drugs (snakeoil) frequently 
match credit card (debt) and even porn (adult) classifications. In practice 
there is little distinction except perhaps for porn/adult. Spammers tend to 
reuse domains and other header & obfuscation patterns across these three 
categories in particular.

It turns out that most of the time if a customer ranks one of the groups 
higher it is not because they have a particular filtering classification in 
mind, but rather because a particular classification tends to have higher 
accuracy in general... Due to the way we source our rules the porn/adult 
group tends to be slightly more accurate than some general rules - but 
about the same as drugs. Debt can sometimes be less accurate but not often. 
Frequently the slight distinction is amplified in the mind of the end user 
more than the statistics really support...

I suspect that similar classifications implemented directly in SA would 
have similar statistics.

$0.02
_M

Ref Classifications:
http://www.sortmonster.com/MessageSniffer/Help/ResultCodesHelp.html

Ref SA Plugin:
http://www.sortmonster.com/MessageSniffer/Installation/SpamAssassin.html