Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2005/07/28 05:38:00 UTC
Re[2]: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam Assassin Rules Project")
Hello Justin,
Wednesday, July 27, 2005, 1:25:16 PM, Chris wrote:
>> Bear in mind, the SARE ruleset is not the only filter in the world
>> that is attempting to catch that spam. AOL, Yahoo!, Hotmail,
>> GMail, Brightmail, etc. etc. are also attempting to catch it, and
>> the spammer is also mutating his spam to evade *them*.
CS> Don't get me started on where *those* people got some of *their*
CS> rules from! Some of *those* people never even bothered to rename
CS> the rules!
That doesn't bother me so much -- the reason we publish them is so
they can be used, and I stopped expecting corporate America to give
credit to anyone other than their CEOs long ago.
CS> Ahhh...now I understand why you sent this. I got confused. I
CS> didn't read this email first. I would consider this a bad rule to
CS> go by. Why?
CS> This IMHO is more a ratware flag. Spammers, more likely sock
CS> puppets, don't understand or bother with this as much as the
CS> easier 'body content' stuff.
Justin, could you repeat a mass-check and that analysis on this rule,
which I'm willing to sacrifice for the sake of science? Not
necessarily now, but a month or two from now?
header SARE_SUBJ_MED_USE Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
describe SARE_SUBJ_MED_USE Spam topic found in subject
score SARE_SUBJ_MED_USE 1.666
#hist SARE_SUBJ_MED_USE Bob Menschel, May 14 2005
#counts SARE_SUBJ_MED_USE 208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
#max SARE_SUBJ_MED_USE 253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
#counts SARE_SUBJ_MED_USE 2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
#counts SARE_SUBJ_MED_USE 0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
#counts SARE_SUBJ_MED_USE 108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
#counts SARE_SUBJ_MED_USE 1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
This rule was developed just over two months ago to flag emails whose
subjects tie drugs to symptoms, to the relief they offer, or to each
other. It's not a big hitter, but it's reliable and effective in its
own way.
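For anyone who wants to poke at the pattern outside SpamAssassin, here
is a minimal sketch in Python. It assumes Python's re module treats this
particular expression the same way Perl does (it uses only constructs
common to both). The subject_hits helper and the sample subjects are
made up for illustration and are not taken from any corpus.

```python
import re

# Pattern copied from the SARE_SUBJ_MED_USE rule above;
# the trailing /i of the rule becomes re.IGNORECASE here.
SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)


def subject_hits(subject: str) -> bool:
    """Return True if the pattern is found anywhere in the subject."""
    return SUBJ_MED_USE.search(subject) is not None


# Hypothetical subjects, invented for this example.
samples = [
    "Aspirin used to relieve headaches",
    "Ibuprofen USED often along with rest",
    "Used car for sale",
    "Meeting notes from Tuesday",
]

for s in samples:
    print(f"{subject_hits(s)!s:5}  {s}")
```

Note that the ".+ " in the middle means the pattern wants at least one
word between "used" and the keyword, so a subject where the keyword
follows "used" directly would not match.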
I don't expect there has been much of a drop in spam matching this rule
in the two months since it appeared in our SARE rule set. It was a
fairly quiet addition.
But now spammers who read this list know that if they tie a drug name
to its symptom, or to its function, or to its companion drugs, we will
catch it (and yes, we do add new patterns to this rule as we find
them). I expect most of them will find alternate ways to word their
subjects to avoid this category of pattern, and that by the end of
August the hit rates on this rule will have decreased significantly.
I would be interested in seeing if that expectation matches reality.
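To make that comparison concrete, the #counts and #max lines above can
be turned into hit rates. This is a rough sketch in Python, assuming the
fields mean what they appear to mean: spam hits/ham hits, total corpus
size, then the corpus's own spam/ham split, a corpus tag, and a date.
The names parse_counts and spam_hit_rate are just for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of a line:
#   #counts RULE <hits>s/<hits>h of <total> corpus (<spam>s/<ham>h TAG) DATE
LINE_RE = re.compile(
    r"#(counts|max)\s+(\S+)\s+(\d+)s/(\d+)h of (\d+) corpus "
    r"\((\d+)s/(\d+)h\s+([^)]+)\)\s+(\S+)"
)


class Counts(NamedTuple):
    kind: str         # "counts" or "max"
    rule: str
    spam_hits: int
    ham_hits: int
    total: int
    corpus_spam: int
    corpus_ham: int
    tag: str          # corpus tag, e.g. "RM"
    date: str


def parse_counts(line: str) -> Optional[Counts]:
    """Parse one #counts/#max comment line; return None if it does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    kind, rule, sh, hh, total, cs, ch, tag, date = m.groups()
    return Counts(kind, rule, int(sh), int(hh), int(total),
                  int(cs), int(ch), tag, date)


def spam_hit_rate(c: Counts) -> float:
    """Fraction of the corpus's spam that the rule hit."""
    return c.spam_hits / c.corpus_spam if c.corpus_spam else 0.0


# Two of the lines from the rule file quoted above.
lines = [
    "#counts SARE_SUBJ_MED_USE 208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05",
    "#max SARE_SUBJ_MED_USE 253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05",
]

for line in lines:
    c = parse_counts(line)
    if c is not None:
        print(f"{c.date} {c.tag}: {100 * spam_hit_rate(c):.3f}% of spam, "
              f"{c.ham_hits} ham hits")
```

Running the same function over the counts from a later mass-check would
give a figure to set against these.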
Bob Menschel