
Posted to dev@spamassassin.apache.org by Chris Santerre <cs...@MerchantsOverseas.com> on 2005/07/27 22:25:16 UTC

RE: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam Assassin Rules Project")

> -----Original Message-----
> From: jm@jmason.org [mailto:jm@jmason.org]
> Sent: Wednesday, July 27, 2005 2:25 PM
> To: Chris Santerre
> Cc: 'Duncan Findlay'; dev@spamassassin.apache.org
> Subject: rule secrecy, spammer evasion (was Re: PROPOSAL: create
> "SpamAssassin Rules Project")
> 
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> Chris Santerre writes:
> > > I'd like to see the data that supports this claim. I'm really
> > > skeptical.
> > 
> > When's the last time you got a hit on the Mr_Wiggly ruleset?
> 
> Bear in mind, the SARE ruleset is not the only filter in the world
> that is attempting to catch that spam.   AOL, Yahoo!, Hotmail, GMail,
> Brightmail, etc. etc. are also attempting to catch it, and the
> spammer is also mutating his spam to evade *them*.

Don't get me started on where *those* people got some of *their* rules from!
Some of *those* people never even bothered to rename the rules!

> 
> From all the research I've read and people I've talked to about this,
> the spammers are a *LOT* more concerned with evading *those* filters
> than they are about piddly little SpamAssassin.  Especially the AOL
> case -- some spammers are dedicated 7 days a week to getting past that
> single ISP's filters.

Which is why SA retains such a great hit rate weeks after a release????????

> 
> > We never saved data on this. But if you ask ANY SARE member, they
> > will back up this claim. Or better yet, go ahead and start a new rule
> > discussion in the SATALK list. Pick a spam flag and go for it. See how
> > long it takes for that flag to go bye bye ;)
> 
> OK, let's pick one ;)  From the top hitters on my corpus in the
> last mass-check:
> 
>  12.063  17.4637   0.0000    1.000   0.98    4.14  MIME_BOUND_DD_DIGITS
> 
> grep MIME_BOUND_DD_DIGITS spam.log | perl -pe \
>         's/^.*\btime=//; s/,.*$//;' > times
> 

Ahhh...now I understand why you sent this. I got confused. I didn't read
this email first. I would consider this a bad rule to go by. Why?

This IMHO is more a ratware flag. Spammers, more likely sock puppets, don't
understand or bother with this as much as the easier 'body content' stuff. 

So for instance, if you write a rule looking for the phrase "buy m0rtgag3s
h3r3", Mr Sockpuppet can easily understand that aspect and change his body
payload to avoid it.

But I doubt many will understand the ratware setup of a MIME boundary.

--Chris
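
The grep/perl pipeline quoted above pulls the time= field out of every
spam.log line that hit MIME_BOUND_DD_DIGITS. A rough sketch of the same
step in Python follows, with weekly bucketing added so that any drop-off
in hits would be visible. It assumes each log line carries a
comma-separated key=value field that includes time=<epoch seconds>; the
function names are illustrative and are not part of any SpamAssassin
tooling.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Mirrors the quoted perl: drop everything up to the last "time=",
# then keep only what precedes the next comma.
TIME_RE = re.compile(r".*\btime=([^,\n]*)")


def extract_times(lines, rule):
    """Return epoch timestamps from log lines that mention `rule`.

    Like grep, this is a plain substring test on the whole line.
    """
    times = []
    for line in lines:
        if rule not in line:
            continue
        m = TIME_RE.match(line)
        if m and m.group(1).strip().isdigit():
            times.append(int(m.group(1)))
    return times


def weekly_counts(times):
    """Count timestamps per (ISO year, ISO week), interpreted as UTC."""
    counts = Counter()
    for ts in times:
        iso = datetime.fromtimestamp(ts, tz=timezone.utc).isocalendar()
        counts[(iso[0], iso[1])] += 1
    return counts
```

Running extract_times over spam.log and then weekly_counts over the
result gives one count per week, which is the shape of data needed to
argue about whether a rule's hits decay after it is discussed publicly.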

Re[2]: rule secrecy, spammer evasion (was Re: PROPOSAL: create "Spam Assassin Rules Project")

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Justin,

Wednesday, July 27, 2005, 1:25:16 PM, Chris wrote:

>> Bear in mind, the SARE ruleset is not the only filter in the world
>> that is attempting to catch that spam.   AOL, Yahoo!, Hotmail,
>> GMail, Brightmail, etc. etc. are also attempting to catch it, and
>> the spammer is also mutating his spam to evade *them*.   

CS> Don't get me started on where *those* people got some of *their*
CS> rules from! Some of *those* people never even bothered to rename
CS> the rules!

That doesn't bother me so much -- the reason we publish them is so
they can be used, and I stopped expecting corporate America to give
credit to anyone other than their CEOs long ago.

CS> Ahhh...now I understand why you sent this. I got confused. I
CS> didn't read this email first. I would consider this a bad rule to
CS> go by. Why?  

CS> This IMHO is more a ratware flag. Spammers, more likely sock
CS> puppets, don't understand or bother with this as much as the
CS> easier 'body content' stuff.

Justin, could you repeat a mass-check and that analysis on this rule,
which I'm willing to sacrifice for the sake of science? Not
necessarily now, but a month or two from now?

header    SARE_SUBJ_MED_USE        Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
describe  SARE_SUBJ_MED_USE        Spam topic found in subject
score     SARE_SUBJ_MED_USE        1.666
#hist     SARE_SUBJ_MED_USE        Bob Menschel, May 14 2005
#counts   SARE_SUBJ_MED_USE        208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
#max      SARE_SUBJ_MED_USE        253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
#counts   SARE_SUBJ_MED_USE        2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
#counts   SARE_SUBJ_MED_USE        0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
#counts   SARE_SUBJ_MED_USE        108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
#counts   SARE_SUBJ_MED_USE        1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
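
To get a feel for what the Subject pattern above does and does not
match, here is a small Python sketch. The regex is copied from the rule
(this particular syntax behaves the same in Perl and Python); the helper
name and the sample subjects are made up for illustration and do not
come from any corpus.

```python
import re

# Pattern copied from the SARE_SUBJ_MED_USE rule; /i becomes re.IGNORECASE.
SARE_SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)


def hits(subject):
    """Return True if the subject line would trigger the pattern."""
    return SARE_SUBJ_MED_USE.search(subject) is not None


if __name__ == "__main__":
    # Hypothetical subjects, for illustration only.
    for subject in (
        "Xanax used to treat anxiety",
        "Xanax for anxiety relief",
        "Used car for sale",
    ):
        print(hits(subject), subject)
```

The second sample shows how little rewording it takes to slip past a
subject rule like this once its pattern is known.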

This rule was developed just over two months ago, flagging the emails
whose subjects tied drugs to symptoms, or drugs to reliefs, or drugs
to each other. It's not a big hitter, but it's reliable and effective
in its own way.

I don't expect there has been much of a drop in spam matching this rule
in the two months since it appeared in our SARE rule set. It was a
fairly quiet addition.

But spammers reading this list now know that if they tie a drug name to
its symptom, or to its function, or to its companion drugs, we will
catch it (and yes, we do add new patterns to this rule as we find
them). I expect most of them will find alternate ways to word their
subjects to avoid this category of pattern, and by the end of August
the hit rates on this rule should decrease significantly.

I would be interested in seeing if that expectation matches reality.

Bob Menschel
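
One way to check the expectation above is to compare the rule's spam hit
rate across its #counts annotations over time. A minimal Python sketch
follows, assuming the annotation layout shown in the rule (hits as
Ns/Nh, then the corpus spam/ham totals and a corpus tag in parentheses,
then a date). The parser and its names are illustrative, not part of any
SARE tooling.

```python
import re

# Matches lines such as:
# #counts   RULE   208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
COUNTS_RE = re.compile(
    r"#(?:counts|max)\s+(?P<rule>\S+)\s+"
    r"(?P<spam_hits>\d+)s/(?P<ham_hits>\d+)h\s+of\s+\d+\s+corpus\s+"
    r"\((?P<spam_total>\d+)s/(?P<ham_total>\d+)h\s+(?P<corpus>[^)]+)\)\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})"
)


def spam_hit_rate(line):
    """Return (corpus tag, date, spam hits / spam total), or None.

    None is returned when the line does not match the assumed layout
    or when the corpus contains no spam.
    """
    m = COUNTS_RE.search(line)
    if m is None:
        return None
    spam_total = int(m.group("spam_total"))
    if spam_total == 0:
        return None
    rate = int(m.group("spam_hits")) / spam_total
    return m.group("corpus"), m.group("date"), rate
```

Only rates from the same corpus are comparable, so a before/after check
would pair, say, the RM figure for 06/12/05 with an RM figure from a
later mass-check.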