Posted to dev@spamassassin.apache.org by Chris Santerre <cs...@MerchantsOverseas.com> on 2005/07/27 22:25:16 UTC
RE: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam
Assassin Rules Project")
> -----Original Message-----
> From: jm@jmason.org [mailto:jm@jmason.org]
> Sent: Wednesday, July 27, 2005 2:25 PM
> To: Chris Santerre
> Cc: 'Duncan Findlay'; dev@spamassassin.apache.org
> Subject: rule secrecy, spammer evasion (was Re: PROPOSAL: create
> "SpamAssassin Rules Project")
>
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Chris Santerre writes:
> > > I'd like to see the data that supports this claim. I'm really
> > > skeptical.
> >
> > When's the last time you got a hit on the Mr_Wiggly ruleset?
>
> Bear in mind, the SARE ruleset is not the only filter in the world
> that is attempting to catch that spam. AOL, Yahoo!, Hotmail, GMail,
> Brightmail, etc. etc. are also attempting to catch it, and the
> spammer is also mutating his spam to evade *them*.
Don't get me started on where *those* people got some of *their* rules from!
Some of *those* people never even bothered to rename the rules!
>
> From all the research I've read and people I've talked to about this,
> the spammers are a *LOT* more concerned with evading *those* filters
> than they are about piddly little SpamAssassin. Especially the AOL
> case -- some spammers are dedicated 7 days a week to getting past that
> single ISP's filters.
Which is why SA retains such a great hit rate weeks after a release????????
>
> > We never saved data on this. But if you ask ANY SARE member, they
> > will back up this claim. Or better yet, go ahead and start a new rule
> > discussion in the SATALK list. Pick a spam flag and go for it. See how
> > long it takes for that flag to go bye bye ;)
>
> OK, let's pick one ;) From the top hitters on my corpus in the
> last mass-check:
>
> 12.063 17.4637 0.0000 1.000 0.98 4.14 MIME_BOUND_DD_DIGITS
>
> grep MIME_BOUND_DD_DIGITS spam.log | perl -pe \
> 's/^.*\btime=//; s/,.*$//;' > times
>
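For readers unfamiliar with the pipeline quoted above: it keeps only the log
lines that mention the rule, then strips each line down to the value of its
time= field. Below is a minimal sketch of the same two substitutions in Python
rather than Perl. The sample log lines and the name extract_times are
assumptions made for illustration; a real mass-check spam.log may lay out its
fields differently.

```python
import re

# Hypothetical log lines, invented for this sketch. The real spam.log
# layout may differ; only the "time=<epoch>," field matters here.
SAMPLE_LOG = """\
Y 12 /corpus/spam/0001 MIME_BOUND_DD_DIGITS,HTML_MESSAGE time=1120000000,scantime=0
Y  9 /corpus/spam/0002 HTML_MESSAGE time=1120500000,scantime=0
Y 15 /corpus/spam/0003 BAYES_99,MIME_BOUND_DD_DIGITS time=1121000000,scantime=1
"""


def extract_times(log_text, rule):
    """Return the time= value of every log line that mentions `rule`."""
    times = []
    for line in log_text.splitlines():
        if rule not in line:  # the "grep" step
            continue
        # Same two substitutions as the Perl one-liner:
        #   s/^.*\btime=//  drops everything up to and including "time="
        #   s/,.*$//        drops everything from the first comma onward
        value = re.sub(r"^.*\btime=", "", line)
        value = re.sub(r",.*$", "", value)
        times.append(value)
    return times


if __name__ == "__main__":
    for t in extract_times(SAMPLE_LOG, "MIME_BOUND_DD_DIGITS"):
        print(t)
```

The \b in the pattern keeps the substitution from stopping at "scantime=",
since there is no word boundary between the "n" and the "t".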
Ahhh...now I understand why you sent this. I got confused. I didn't read
this email first. I would consider this a bad rule to go by. Why?
This IMHO is more a ratware flag. Spammers, more likely sock puppets, don't
understand or bother with this as much as the easier 'body content' stuff.
So for instance, if you write a rule looking for the phrase "buy m0rtgag3s
h3r3", Mr Sockpuppet can easily understand that aspect and change his body
payload to avoid it.
But I doubt many will understand the ratware setup of a MIME boundary.
--Chris
Re[2]: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam Assassin Rules Project")
Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Justin,
Wednesday, July 27, 2005, 1:25:16 PM, Chris wrote:
>> Bear in mind, the SARE ruleset is not the only filter in the world
>> that is attempting to catch that spam. AOL, Yahoo!, Hotmail,
>> GMail, Brightmail, etc. etc. are also attempting to catch it, and
>> the spammer is also mutating his spam to evade *them*.
CS> Don't get me started on where *those* people got some of *their*
CS> rules from! Some of *those* people never even bothered to rename
CS> the rules!
That doesn't bother me so much -- the reason we publish them is so
they can be used, and I stopped expecting corporate America to give
credit to anyone other than their CEOs long ago.
CS> Ahhh...now I understand why you sent this. I got confused. I
CS> didn't read this email first. I would consider this a bad rule to
CS> go by. Why?
CS> This IMHO is more a ratware flag. Spammers, more likely sock
CS> puppets, don't understand or bother with this as much as the
CS> easier 'body content' stuff.
Justin, could you repeat a mass-check and that analysis on this rule,
which I'm willing to sacrifice for the sake of science? Not
necessarily now, but a month or two from now?
header SARE_SUBJ_MED_USE Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
describe SARE_SUBJ_MED_USE Spam topic found in subject
score SARE_SUBJ_MED_USE 1.666
#hist SARE_SUBJ_MED_USE Bob Menschel, May 14 2005
#counts SARE_SUBJ_MED_USE 208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
#max SARE_SUBJ_MED_USE 253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
#counts SARE_SUBJ_MED_USE 2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
#counts SARE_SUBJ_MED_USE 0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
#counts SARE_SUBJ_MED_USE 108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
#counts SARE_SUBJ_MED_USE 1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
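
To make the pattern above concrete, here is a minimal sketch that applies the
same regular expression to a few subjects in Python. The sample subjects and
the name subject_hits are hypothetical, and Python's re module stands in for
the Perl regex engine that SpamAssassin actually uses; for this particular
pattern the two should behave the same way.

```python
import re

# The pattern from SARE_SUBJ_MED_USE, copied as-is; /i becomes re.IGNORECASE.
SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)


def subject_hits(subject):
    """Return True if the subject would match the rule's pattern."""
    return SUBJ_MED_USE.search(subject) is not None


# Hypothetical subjects, invented for this sketch.
SUBJECTS = [
    "Tablets used to treat headaches",
    "Capsules USED to relieve back pain",
    "Meeting room used yesterday",
]

if __name__ == "__main__":
    for s in SUBJECTS:
        print("hit " if subject_hits(s) else "miss", s)
```

Note that the pattern needs at least one word between "used" and the keyword,
and at least three word characters directly before " used".
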
This rule was developed just over two months ago to flag emails whose
subjects tie drugs to symptoms, to the relief they offer, or to each
other. It's not a big hitter, but it's reliable and effective in its
own way.
I don't expect there has been much of a drop in spam matching this
rule in the two months since it appeared in our SARE rule set. It was
a fairly quiet addition.
But now aware spammers reading this list know that if they tie a drug
name to its symptom, its function, or its companion drugs, we will
catch it (and yes, we do add new patterns to this rule as we find
them). I expect most of them will find alternate ways to word their
subjects to avoid this category of pattern, and by the end of August
the hit rates on this rule should decrease significantly.
I would be interested in seeing if that expectation matches reality.
Bob Menschel
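
One way the before-and-after comparison proposed above could be run is to
bucket the timestamps of rule hits and of all spam by month, then compare the
per-month hit rates. The sketch below assumes Unix epoch timestamps as input;
the function names and the sample timestamps are invented for illustration and
are not measurements from any corpus.

```python
from collections import Counter
from datetime import datetime, timezone


def month_key(ts):
    """Map a Unix timestamp to a 'YYYY-MM' bucket (UTC)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")


def monthly_hit_rate(hit_times, all_spam_times):
    """Fraction of each month's spam that hit the rule."""
    hits = Counter(month_key(t) for t in hit_times)
    totals = Counter(month_key(t) for t in all_spam_times)
    # Counter returns 0 for months with spam but no hits.
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def ts(year, month, day):
    """Helper for building sample timestamps at midnight UTC."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Hypothetical sample data: four spams per month, with made-up hits.
SPAM_TIMES = [ts(2005, 7, d) for d in (1, 8, 15, 22)] + [
    ts(2005, 9, d) for d in (1, 8, 15, 22)
]
HIT_TIMES = [ts(2005, 7, 8), ts(2005, 7, 22), ts(2005, 9, 15)]

if __name__ == "__main__":
    for month, rate in monthly_hit_rate(HIT_TIMES, SPAM_TIMES).items():
        print(month, f"{rate:.2%}")
```

Comparing rates rather than raw hit counts matters here, because the volume of
spam in a corpus can change from month to month independently of the rule.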