Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/28 05:38:00 UTC

Re[2]: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam Assassin Rules Project")

Hello Justin,

Wednesday, July 27, 2005, 1:25:16 PM, Chris wrote:

>> Bear in mind, the SARE ruleset is not the only filter in the world
>> that is attempting to catch that spam.   AOL, Yahoo!, Hotmail,
>> GMail, Brightmail, etc. etc. are also attempting to catch it, and
>> the spammer is also mutating his spam to evade *them*.   

CS> Don't get me started on where *those* people got some of *their*
CS> rules from! Some of *those* people never even bothered to rename
CS> the rules!

That doesn't bother me so much -- the reason we publish them is so
they can be used, and I stopped expecting corporate America to give
credit to anyone other than their CEOs long ago.

CS> Ahhh...now I understand why you sent this. I got confused. I
CS> didn't read this email first. I would consider this a bad rule to
CS> go by. Why?  

CS> This IMHO is more a ratware flag. Spammers, more likely sock
CS> puppets, don't understand or bother with this as much as the
CS> easier 'body content' stuff.

Justin, could you repeat a mass-check and that analysis on this rule,
which I'm willing to sacrifice for the sake of science? Not
necessarily now, but a month or two from now?

header    SARE_SUBJ_MED_USE        Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
describe  SARE_SUBJ_MED_USE        Spam topic found in subject
score     SARE_SUBJ_MED_USE        1.666
#hist     SARE_SUBJ_MED_USE        Bob Menschel, May 14 2005
#counts   SARE_SUBJ_MED_USE        208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
#max      SARE_SUBJ_MED_USE        253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
#counts   SARE_SUBJ_MED_USE        2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
#counts   SARE_SUBJ_MED_USE        0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
#counts   SARE_SUBJ_MED_USE        108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
#counts   SARE_SUBJ_MED_USE        1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05

This rule was developed just over two months ago to flag emails whose
subjects tie drugs to symptoms, to the relief they offer, or to each
other. It's not a big hitter, but it's reliable and effective in its
own way.
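
As a rough illustration of what the SARE_SUBJ_MED_USE pattern above
matches, here is a minimal Python sketch. It assumes Python's re module
treats this particular pattern the same way Perl does, and the sample
subjects and the subject_hits helper are invented for the example; they
are not taken from any corpus or from SpamAssassin itself.

```python
import re

# Same pattern as the SARE_SUBJ_MED_USE rule, compiled case-insensitively
# to mirror the trailing /i on the SpamAssassin rule.
SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)


def subject_hits(subject: str) -> bool:
    """Return True if the subject matches the rule's pattern (hypothetical helper)."""
    return SUBJ_MED_USE.search(subject) is not None


# Hypothetical subjects, made up for illustration only.
samples = [
    "Drugname used to help relieve pain",            # drug tied to a relief
    "Drugname used in combination with other meds",  # drug tied to other drugs
    "Used car for sale",                             # no three word chars before "used"
    "They used the new server yesterday",            # no medical keyword after "used"
]

for s in samples:
    print(subject_hits(s), s)
```

The first two samples match and the last two do not: the pattern needs
three word characters and whitespace before "used", and one of the
listed keywords somewhere after it.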

I don't expect there has been much of a drop in spam matching this
rule in the two months since it appeared in our SARE rule set. It was
a fairly quiet addition.

But spammers who read this list now know that if they tie a drug name
to its symptom, or to its function, or to its companion drugs, we will
catch it (and yes, we do add new patterns to this rule as we find
them). I expect most of them will find alternate ways to word their
subjects to avoid this category of pattern, and that by the end of
August the hit rates on this rule will have decreased significantly.

I would be interested in seeing if that expectation matches reality.
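
For comparing hit rates between mass-check runs, the spam hit rate
implied by a #counts line can be computed along these lines. This is
only a sketch: the field layout is inferred from the comment lines
quoted above rather than from any documented format, and the COUNTS
pattern and spam_hit_rate function are hypothetical names.

```python
import re

# Assumed layout of a "#counts" / "#max" comment line, inferred from the
# lines in the rule above:
#   <spam hits>s/<ham hits>h of <total> corpus (<spam>s/<ham>h <tag>) <date>
COUNTS = re.compile(
    r"#(?:counts|max)\s+(?P<rule>\S+)\s+"
    r"(?P<spam_hits>\d+)s/(?P<ham_hits>\d+)h of (?P<total>\d+) corpus "
    r"\((?P<spam>\d+)s/(?P<ham>\d+)h (?P<tag>[^)]+)\) (?P<date>\S+)"
)


def spam_hit_rate(line: str) -> float:
    """Return the percentage of the corpus's spam that the rule hit."""
    m = COUNTS.match(line)
    if m is None:
        raise ValueError(f"unrecognized counts line: {line!r}")
    return 100.0 * int(m["spam_hits"]) / int(m["spam"])


line = ("#counts   SARE_SUBJ_MED_USE        "
        "208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05")
print(f"{spam_hit_rate(line):.3f}% of spam hit")  # prints "0.153% of spam hit"
```

Running the same calculation on a later mass-check of the same corpus
would give a figure to set against this one.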

Bob Menschel