Posted to users@spamassassin.apache.org by Bob Menschel <Ro...@Menschel.net> on 2004/07/08 03:49:16 UTC

Re[2]: 70_sare_ratware.cf hits legitimate email clients

Hello Dimitrios,

Tuesday, July 6, 2004, 6:28:22 PM, you wrote:

D> No, i haven't seen a single false positive due to these two rules.
D> Though i still think they are too broad or generic rules.

It's a tough, philosophical question, and one concerning which I find
myself on both sides of the fence.

If something in an email is a reliable spam sign, i.e. its occurrence is
strongly suggestive of spam, then it's a candidate for an SA rule. The
questions become:
a) how much spam does it need to hit in order to be worth the computer
resources?
b) What ratio of spam/ham does it need to hit in order to be worth the
computer resources?

The SpamAssassin development team has fairly formal guidelines by
which they judge most rules: to qualify for official distribution
within SpamAssassin, a rule needs to match against at least 0.1% of
all spam (1% or higher is better), and it needs to have an S/O ratio
of 0.900 (9 spam to 1 ham). Once it meets these criteria (subject to
change, so don't take these numbers as law), then the score is
determined through the developmental mass-check procedure to maximize
the amount of spam flagged and minimize the false positives.
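
As a rough illustration of those two criteria, here is a minimal Python
sketch. It assumes S/O is computed from raw hit counts as
spam / (spam + ham), which matches the "9 spam to 1 ham" reading above;
the actual mass-check tooling may compute it differently. The function
names and the example counts are hypothetical.

```python
def so_ratio(spam_hits, ham_hits):
    """S/O = spam hits / (spam hits + ham hits); None if the rule hit nothing."""
    total = spam_hits + ham_hits
    return spam_hits / total if total else None


def meets_guidelines(spam_hits, ham_hits, total_spam,
                     min_hit_rate=0.001, min_so=0.9):
    """Check the two thresholds quoted above: 0.1% of all spam, S/O >= 0.900."""
    so = so_ratio(spam_hits, ham_hits)
    if so is None or total_spam == 0:
        return False
    return spam_hits / total_spam >= min_hit_rate and so >= min_so


# Hypothetical counts: 90 spam hits and 10 ham hits, corpus of 50,000 spam.
print(so_ratio(90, 10))                  # 0.9
print(meets_guidelines(90, 10, 50000))   # True
```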

SARE's guidelines are less formal, and also less stringent. Our goal
is to increase our abilities to catch spam, through casting a broader
net (more rules).

You can see some of our considerations by comparing the multiple files
within the rule sets I maintain.  70_sare_html0.cf contains rules
which hit no ham and which, in our combined corpus tests, hit at
least 10 spam. That's nowhere near 0.01%, but it
will allow us to develop rules that flag spam which distribution rules
will not.

70_sare_html1.cf contains rules which do hit ham, but which maintain
an S/O of 0.9 or higher. It also includes rules which hit fewer than
10 spam but still no ham. I consider this to be a good rule set for
most systems to use, but one to be avoided by those with resource
constraints.

70_sare_html2.cf contains rules which in our recent mass-check tests
do not hit any emails. They include obfuscation tests and such which
should be strong indicators of spam if they ever do hit. This rule set
is therefore available for use by those systems that want to be
aggressive, proactive, and have the resources to spare.

70_sare_html3.cf contains rules which many people will consider "too
broad or generic." They hit ham. They can hit lots of ham. But they are
prevalent enough in spam to be worth including (at least some of us
think so), and they are scored low enough that they should not be a
danger.
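
The four-file split described above can be read as a simple decision on a
rule's mass-check hit counts. The sketch below is only an approximation:
the function name is hypothetical, S/O is assumed to be
spam / (spam + ham), and in practice whether a broad rule belongs in
70_sare_html3.cf at all is a human judgement, not a formula.

```python
def sare_html_file(spam_hits, ham_hits, min_spam=10, min_so=0.9):
    """Return N (0-3) for the 70_sare_htmlN.cf file a rule would land in."""
    if spam_hits == 0 and ham_hits == 0:
        return 2  # hits nothing in recent mass-checks
    if ham_hits == 0:
        # No ham at all: html0 if it hits enough spam, otherwise html1.
        return 0 if spam_hits >= min_spam else 1
    so = spam_hits / (spam_hits + ham_hits)
    # Assumption: anything below the S/O cut-off is treated as an html3
    # candidate; a real maintainer might simply drop such a rule instead.
    return 1 if so >= min_so else 3


for counts in [(250, 0), (4, 0), (95, 5), (0, 0), (600, 400)]:
    print(counts, "-> 70_sare_html%d.cf" % sare_html_file(*counts))
```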

The goal with 70_sare_html3.cf is that if a spam scores 4.93 or
thereabouts without these rules, they push it over the threshold.
Our expectation is that ham won't get close enough to the threshold
for them to cause a false positive. We're conservative enough with
those scores that we're almost always right.
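
To make the arithmetic concrete, here is a small sketch. It assumes the
default SpamAssassin threshold (required_score) of 5.0; the rule score of
0.1 and the ham score of 2.0 are made-up values for illustration, not
actual SARE scores.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score; sites may change it


def is_flagged(base_score, extra_rule_scores, required=REQUIRED_SCORE):
    """A message is flagged as spam when its total score reaches the threshold."""
    return base_score + sum(extra_rule_scores) >= required


low_score = 0.1  # hypothetical score for one broad, html3-style rule

print(is_flagged(4.93, []))            # False: borderline spam slips through
print(is_flagged(4.93, [low_score]))   # True: the low-scored rule tips it over
print(is_flagged(2.0, [low_score]))    # False: typical ham stays well below
```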

The Ratware rules you question are older, and haven't been explored as
rigorously, but they are scored conservatively because of this
philosophy. They've proven useful from time to time, and we haven't
chosen to remove them yet. (We may or may not in the future --
these things are always open for (re)evaluation.)

If you don't like the rules, they're simple enough to turn off --
just add a line to your local.cf:
> score RATWR10_MESSID 0
That will supersede the score in the Ratware rule set and turn the
rule off.

If you can put together a good argument in favor of dropping the rule
(something more substantial than "I think they're too broad"), we'll
gladly consider your argument. SARE is always open for improvement.

Bob Menschel



Re[2]: 70_sare_ratware.cf hits legitimate email clients

Posted by Bob Menschel <Ro...@Menschel.net>.
Hello Dimitrios,

Wednesday, July 7, 2004, 7:32:44 PM, you wrote:

D> On Wed, 7 Jul 2004 18:49:16 -0700 "Bob Menschel" <Ro...@menschel.net> wrote:

>> If you can put together a good argument in favor of dropping the rule
>> (something more substantial than "I think they're too broad"), we'll
>> gladly consider your argument. SARE is always open for improvement.

D> first of all, thanks for taking the time to answer in detail.

D> i don't have a specific argument against those two rules, other than
D> what i've already said, i just see them hit too many legitimate
D> emails. thus, i've disabled them so no harm done.

D> on a side note, can you please take a look at my custom rules
D> which i've published in the SARE forums? if you have some spare
D> time, i'd like you to apply the same test on them, of how many
D> spam/ham they hit.

Will do, but the critical resource is that "spare time." I'm 19 hours
into a likely 30-hour work day right now. I won't be doing any rule
development or mass-checks for another week or two...

Bob Menschel



Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Jesse Houwing <j....@rulesemporium.com>.
Dimitrios wrote:

>On Thu, 08 Jul 2004 11:42:13 +0200 "Jesse Houwing" <j....@rulesemporium.com> wrote:
>
>  
>
>>     39        0       39    0.000   0.00   1.00  GR_DOMAIN_ARGOSOFT
>>      1        0        1    0.000   0.00   1.50  GR_DOMAIN_M5MAILER4
>>    
>>
>
>
>interesting results.
>
>obviously you don't have any Greek-specific spam, so these rules pretty much
>didn't hit anything specific.
>
>it's also interesting that you've got ham that hit the ArgoSoft and Mach5 rules,
>in my experience ArgoSoft is a well known util used on exploitable systems
>for remote spamming (open relay smtp), Mach5 Mailer is also a spammer tool
>which you can buy from several websites.
>  
>
The one M5 mailer hit could be a spam message that slipped through; I'll
check that. I don't think I've got 39 misclassified mails in the other
category.

>I can email you a few of my Greek spam if you are interested.
>  
>
At present I don't have time to check them; I'm quite busy with a few
other rules and non-SA-related work. I'll keep this in mind for the future.

>thank you for taking the time to run the tests, much appreciated.
>
You're welcome. If you need further mass-checks you can always email me;
I'll run them in between my own tests.

Jesse




Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Dimitrios <se...@altered.com>.
On Thu, 08 Jul 2004 11:42:13 +0200 "Jesse Houwing" <j....@rulesemporium.com> wrote:

>      39        0       39    0.000   0.00   1.00  GR_DOMAIN_ARGOSOFT
>       1        0        1    0.000   0.00   1.50  GR_DOMAIN_M5MAILER4


interesting results.

obviously you don't have any Greek-specific spam, so these rules pretty much
didn't hit anything specific.

it's also interesting that you've got ham that hit the ArgoSoft and Mach5 rules,
in my experience ArgoSoft is a well known util used on exploitable systems
for remote spamming (open relay smtp), Mach5 Mailer is also a spammer tool
which you can buy from several websites.

I can email you a few of my Greek spam if you are interested.

thank you for taking the time to run the tests, much appreciated.
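
For readers unfamiliar with the two result lines quoted at the top of this
message, here is a minimal sketch of how they might be parsed. The column
meanings (overall hits, spam hits, ham hits, S/O, rank, score, rule name)
are an assumption based on the usual mass-check summary layout; the
HitLine type and parse_hit_line function are hypothetical names.

```python
from typing import NamedTuple


class HitLine(NamedTuple):
    overall: int   # total messages hit (assumed column meaning)
    spam: int      # spam messages hit
    ham: int       # ham (non-spam) messages hit
    so: float      # S/O ratio
    rank: float
    score: float
    name: str


def parse_hit_line(line):
    """Parse one whitespace-separated result line, ignoring '>' quote markers."""
    fields = line.lstrip("> ").split()
    overall, spam, ham, so, rank, score, name = fields
    return HitLine(int(overall), int(spam), int(ham),
                   float(so), float(rank), float(score), name)


rows = [parse_hit_line(l) for l in (
    ">      39        0       39    0.000   0.00   1.00  GR_DOMAIN_ARGOSOFT",
    ">       1        0        1    0.000   0.00   1.50  GR_DOMAIN_M5MAILER4",
)]
for r in rows:
    print(r.name, "spam:", r.spam, "ham:", r.ham, "S/O:", r.so)
```

Read this way, both rules hit only ham in that corpus (39 and 1 messages),
which is why their S/O is 0.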

Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Jesse Houwing <j....@rulesemporium.com>.
Dimitrios wrote:

>On Thu, 08 Jul 2004 11:09:51 +0200 "Jesse Houwing" <j....@rulesemporium.com> wrote:
>
>  
>
>>I'm running your rules as we speak.
>>    
>>
>
>thank you, much appreciated.
>
>  
>
Results attached.

Jesse

Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Dimitrios <se...@altered.com>.
On Thu, 08 Jul 2004 11:09:51 +0200 "Jesse Houwing" <j....@rulesemporium.com> wrote:

> I'm running your rules as we speak.

thank you, much appreciated.

Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Jesse Houwing <j....@rulesemporium.com>.
Dimitrios wrote:

>On Wed, 7 Jul 2004 18:49:16 -0700 "Bob Menschel" <Ro...@menschel.net> wrote:
>
>  
>
>>If you can put together a good argument in favor of dropping the rule
>>(something more substantial than "I think they're too broad"), we'll
>>gladly consider your argument. SARE is always open for improvement.
>>    
>>
>
>first of all, thanks for taking the time to answer in detail.
>
>i don't have a specific argument against those two rules, other than
>what i've already said, i just see them hit too many legitimate
>emails. thus, i've disabled them so no harm done.
>
>on a side note, can you please take a look at my custom rules
>which i've published in the SARE forums? if you have some spare
>time, i'd like you to apply the same test on them, of how many
>spam/ham they hit.
>
>  
>
I'm running your rules as we speak.

Jesse




Re: 70_sare_ratware.cf hits legitimate email clients

Posted by Dimitrios <se...@altered.com>.
On Wed, 7 Jul 2004 18:49:16 -0700 "Bob Menschel" <Ro...@menschel.net> wrote:

> If you can put together a good argument in favor of dropping the rule
> (something more substantial than "I think they're too broad"), we'll
> gladly consider your argument. SARE is always open for improvement.

first of all, thanks for taking the time to answer in detail.

i don't have a specific argument against those two rules, other than
what i've already said, i just see them hit too many legitimate
emails. thus, i've disabled them so no harm done.

on a side note, can you please take a look at my custom rules
which i've published in the SARE forums? if you have some spare
time, i'd like you to apply the same test on them, of how many
spam/ham they hit.