Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/07/27 20:25:18 UTC

rule secrecy, spammer evasion (was Re: PROPOSAL: create "SpamAssassin Rules Project")

Chris Santerre writes:
> > I'd like to see the data that supports this claim. I'm really
> > skeptical.
> 
> Whens the last time you got a hit on Mr_Wiggly ruleset? 

Bear in mind, the SARE ruleset is not the only filter in the world
that is attempting to catch that spam.   AOL, Yahoo!, Hotmail, GMail,
Brightmail, etc. etc. are also attempting to catch it, and the
spammer is also mutating his spam to evade *them*.

From all the research I've read and people I've talked to about this, the
spammers are a *LOT* more concerned with evading *those* filters than they
are about piddly little SpamAssassin.  Especially the AOL case -- some
spammers are dedicated 7 days a week to getting past that single ISP's
filters.

> We never saved data on this. But if you ask ANY SARE member, they will
> backup this claim. Or better yet, go ahead and start a new rule discussion
> in the SATALK list. Pick a spam flag and go for it. See how long it takes
> for that flag to go bye bye ;) 

OK, let's pick one ;)  From the top hitters on my corpus in the
last mass-check:

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 12.063  17.4637   0.0000    1.000   0.98    4.14  MIME_BOUND_DD_DIGITS

grep MIME_BOUND_DD_DIGITS spam.log | perl -pe \
        's/^.*\btime=//; s/,.*$//;' > times

gnuplot
gnuplot> set terminal png
gnuplot> set output "dd_digits.png"
gnuplot> plot "times" using 1:0

result: http://taint.org/xfer/2005/dd_digits.png

The horizontal axis is the message time, with the earliest values at the
left (Jan 2004) and the latest at the right (May 2005); the vertical axis
increments by one for every datum in the file.

In other words, it's a plot of hits over time, based on the time of the
message it was seen in.
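
For anyone who would rather not use grep, perl and gnuplot, here is a rough
Python sketch of what that pipeline computes. It assumes each log line
carries a "time=<epoch seconds>" field followed by a comma, which is what
the perl one-liner relies on; the function name and the sample lines are
made up for illustration and are not real mass-check output.

```python
import re

# Assumption: the log line contains "time=<digits>" as a separate word,
# which is what the perl substitution in the post relies on.
TIME_RE = re.compile(r"\btime=(\d+)")

def cumulative_hits(log_lines, rule):
    """Return (time, record_number) pairs for lines that mention `rule`.

    record_number starts at 0 and follows file order, mirroring
    gnuplot's `using 1:0` (column 0 is the record number).
    """
    points = []
    for line in log_lines:
        if rule not in line:          # same substring match as grep
            continue
        match = TIME_RE.search(line)
        if match:
            points.append((int(match.group(1)), len(points)))
    return points

# Hypothetical sample lines, for illustration only.
sample = [
    "Y 12 /path/a RULE_A,MIME_BOUND_DD_DIGITS time=1073000000,other=x",
    "Y 9 /path/b RULE_B time=1074000000,other=x",
    "Y 15 /path/c MIME_BOUND_DD_DIGITS time=1084000000,other=x",
]
points = cumulative_hits(sample, "MIME_BOUND_DD_DIGITS")
for when, index in points:
    print(when, index)
```

Each printed pair is one point on the graph: x is the message time, y is
how many hits came before it in the file.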

If the frequency of the rule never changed, you'd expect to see a
straight line from bottom-left to top-right.   As it stands, we
see a shallow take-off starting in Jan 2004, then it suddenly ramps
up (presumably this is when the spamware was released outside of beta
or similar).
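
The "straight line" comparison can be made concrete with a small sketch.
This is only an illustration under assumptions: the helper names are
hypothetical, the timestamps are made up, and the baseline is simply the
line joining the first and last hit.

```python
def constant_rate_baseline(times):
    """Index each hit would have if hits arrived at a constant rate.

    The baseline is the straight line from the first hit (index 0) to the
    last hit (index n-1), evaluated at each hit's time.
    """
    ordered = sorted(times)
    t0, t1 = ordered[0], ordered[-1]
    span = t1 - t0
    if span == 0:
        raise ValueError("need at least two distinct timestamps")
    last_index = len(ordered) - 1
    return [(t - t0) / span * last_index for t in ordered]

def lead_over_baseline(times):
    """Observed cumulative index minus the constant-rate index, per hit."""
    baseline = constant_rate_baseline(times)
    return [i - b for i, b in enumerate(baseline)]

# Made-up timestamps: a burst early on, a gap, then two late hits.
sample_times = [0, 10, 20, 90, 100]
leads = lead_over_baseline(sample_times)
```

A positive value means the rule had hit more often by that time than a
constant rate would predict; a negative value means fewer. A curve that
starts shallow and ramps up later, like the one described above, would sit
below the straight line for most of its length.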

That graph is with the benefit of hindsight.  In fact, the rule wasn't
created in Jan 2004, for obvious reasons -- so let's find the real history
of the rule and figure out when it was invented.  (this is the hard
work part!)

"svn blame" on rules/20_head_tests.cf shows that MIME_BOUND_DD_DIGITS was
last changed in svn in r20201, 14 months ago, in May 2004; that change is
at
http://svn.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/rules/20_head_tests.cf?rev=20201&r1=20178&r2=20201
and was a rename of the rule from CT_BOUND_DDNUM.

Looking at
http://svn.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/rules/20_head_tests.cf?rev=20201&view=log
the previous change on that line was r20178, which was the promotion of the
rule.

According to the diff at
http://svn.apache.org/viewcvs.cgi/incubator/spamassassin/trunk/rules/70_testing.cf?rev=20178&view=diff&r1=20178&r2=20177&p1=incubator/spamassassin/trunk/rules/70_testing.cf&p2=/incubator/spamassassin/trunk/rules/70_testing.cf
it was called T_CT_BOUND_DDNUM before that, and had come directly from
bug 3396, where Bob Menschel had reported it.   NB: in a public forum, in
bugzilla, and with mails flying to this dev list as a result. ;)

May 15 2004 is marked on the graph.  As you can see, it's actually
*before* the explosion in hits on that rule -- if anything, publishing
the rule to bz and getting it into SVN caused the spam using that
feature to explode in popularity ;)
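
To put a number on "before the explosion" rather than eyeballing the graph,
one could count the hits on each side of the publication date. A minimal
sketch, assuming the "times" values are epoch seconds and treating May 15
2004 as midnight UTC (the timezone is my assumption); the timestamps below
are made up and the function name is hypothetical.

```python
from datetime import datetime, timezone

def split_at(times, cutoff):
    """Return (hits strictly before cutoff, hits at or after cutoff)."""
    before = sum(1 for t in times if t < cutoff)
    return before, len(times) - before

# Assumption: the rule became public on May 15 2004, taken as 00:00 UTC.
cutoff = int(datetime(2004, 5, 15, tzinfo=timezone.utc).timestamp())

# Made-up timestamps for illustration: one day before, one and two days after.
day = 86400
sample_times = [cutoff - day, cutoff + day, cutoff + 2 * day]
before, after = split_at(sample_times, cutoff)
print(before, after)
```

Run against the real "times" file, a large "after" count relative to
"before" would match what the graph shows.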

It has trailed off in the year since then, but I suspect that's not
in response to the rule.

Perhaps the quick evasion SARE sees is a side effect of aiming at text
patterns (which the spammers can change) instead of structural patterns
(which the spamware coders *supplying* the spammers have to change).   But
as this example demonstrates, super-efficient spammer evasion certainly is
not happening with *all* rules.

--j.