Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/10/05 18:19:26 UTC
rule secrecy/spammer evasion tests, revisited

Robert Menschel writes:
> Justin, could you repeat a mass-check and that analysis on this rule,
> which I'm willing to sacrifice for the sake of science? Not
> necessarily now, but a month or two from now?
> 
> header    SARE_SUBJ_MED_USE        Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
> describe  SARE_SUBJ_MED_USE        Spam topic found in subject
> score     SARE_SUBJ_MED_USE        1.666
> #hist     SARE_SUBJ_MED_USE        Bob Menschel, May 14 2005
> #counts   SARE_SUBJ_MED_USE        208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
> #max      SARE_SUBJ_MED_USE        253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
> #counts   SARE_SUBJ_MED_USE        2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
> #counts   SARE_SUBJ_MED_USE        0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
> #counts   SARE_SUBJ_MED_USE        108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
> #counts   SARE_SUBJ_MED_USE        1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
> 
> This rule was developed just over two months ago, flagging the emails
> whose subjects tied drugs to symptoms, or drugs to reliefs, or drugs
> to each other. It's not a big hitter, but it's reliable and effective
> in its own way.
> 
> I don't expect too much of a drop in spam matching this rule in the
> two months since it appeared in our SARE rule set. It was a fairly
> quiet addition.
> 
> But now that aware spammers reading this list know that if they tie a
> drug name to its symptom, or to its function, or to its companion
> drugs, we will catch it (and yes, we do add new patterns to this rule
> as we find them), I expect most of those will find alternate ways to
> word their subjects to avoid this category of pattern, and by the end
> of August the hit rates on this rule should decrease significantly.
> 
> I would be interested in seeing if that expectation matches reality.
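For anyone unfamiliar with SpamAssassin rule syntax: the "header" line in the quoted rule applies a Perl-style regular expression to the Subject header. The Python sketch below approximates that pattern so it can be tried on sample subjects. Python's re module and Perl's regex engine can differ in edge cases, and SpamAssassin decodes headers before matching, so treat this as an approximation rather than SpamAssassin's own behaviour. It also parses a "#counts" comment line to express the rule's hits as a fraction of that corpus's spam. The field layout is inferred from the quoted lines, not from any documented format, and the helper names (subject_hits, spam_hit_rate) and example subjects are hypothetical.

```python
import re

# Python approximation of the SARE_SUBJ_MED_USE Subject regex. The trailing
# /i of the SpamAssassin rule becomes re.IGNORECASE; Python's re module and
# Perl's regex engine may differ in edge cases not exercised here.
SARE_SUBJ_MED_USE = re.compile(
    r"\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)",
    re.IGNORECASE,
)

# ASSUMED layout of a "#counts"/"#max" comment, inferred from the examples:
#   <n>s/<n>h of <total> corpus (<spam>s/<ham>h <corpus-name>) <date>
COUNTS_LINE = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus \((\d+)s/(\d+)h ([^)]+)\)")


def subject_hits(subject: str) -> bool:
    """True if the regex matches anywhere in the subject (unanchored, like =~)."""
    return SARE_SUBJ_MED_USE.search(subject) is not None


def spam_hit_rate(counts_line: str) -> float:
    """Rule's spam hits divided by the number of spams in that corpus."""
    m = COUNTS_LINE.search(counts_line)
    if m is None:
        raise ValueError(f"unrecognised counts line: {counts_line!r}")
    spam_hits, spam_total = int(m.group(1)), int(m.group(4))
    return spam_hits / spam_total if spam_total else 0.0


if __name__ == "__main__":
    # Invented example subjects, for illustration only.
    print(subject_hits("Herbal remedy used to relieve back pain"))  # True
    print(subject_hits("Meeting notes for Monday"))                 # False
    line = "208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05"
    print(f"{spam_hit_rate(line):.3%}")                             # 0.153%
```

On the first #counts line, 208 hits out of 135824 spams is about 0.15% of that spam corpus. That fits the description of the rule as "not a big hitter".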

Just to follow up on this (as requested): it looks like the answer
is "no, but not in the way you were thinking".

Unfortunately, the test didn't produce useful data. On one hand, the
rule's hit rate was too low in the first place; on the other, the rule
had already stopped hitting spam 3 months before that message was
posted. The spammers had already moved on.

My corpus had a *maximum* of 8 messages hitting this rule in any one
week (the week of Apr 4). The last hit I saw on the rule was in a mail
received on Apr 23 22:44:56 2003. Not a single hit after that date, so
I'd call the test inconclusive.

Could it be they noticed the rule appearing in the ruleset, regardless
of its discussion or lack thereof?

I've attached the data anyway, if anyone wants a look. The first column
is the time value (a UNIX timestamp), the second is the number of hits
on the rule, and the third is the total number of spam mails received
in that period.
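The attachment itself isn't reproduced in this archive. As a sketch of how data in the three-column layout described above could be analysed, the Python below reads "timestamp hits total" rows, computes each period's hit rate, and picks out the busiest period and the last period with any hits (the kind of figures quoted above: the weekly maximum and the last-hit date). It assumes the columns are whitespace-separated and that timestamps are seconds since the epoch, interpreted here as UTC. The names parse_rows, hit_rate and summarize are hypothetical, and the sample rows in the test are invented, not the attached data.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Period(NamedTuple):
    start: datetime   # period start, from the UNIX timestamp column (UTC)
    hits: int         # messages hitting the rule in that period
    spam_total: int   # all spam received in that period


def parse_rows(lines: Iterable[str]) -> List[Period]:
    """Parse whitespace-separated 'timestamp hits total' rows (ASSUMED format).

    Blank lines and lines starting with '#' are skipped.
    """
    periods = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ts, hits, total = line.split()[:3]
        start = datetime.fromtimestamp(int(ts), tz=timezone.utc)
        periods.append(Period(start, int(hits), int(total)))
    return periods


def hit_rate(p: Period) -> float:
    """Fraction of that period's spam hit by the rule (0.0 if no spam arrived)."""
    return p.hits / p.spam_total if p.spam_total else 0.0


def summarize(periods: List[Period]) -> Tuple[Optional[Period], Optional[Period]]:
    """Return (busiest period, last period with any hits), or (None, None)."""
    with_hits = [p for p in periods if p.hits > 0]
    if not with_hits:
        return None, None
    busiest = max(with_hits, key=lambda p: p.hits)
    last = max(with_hits, key=lambda p: p.start)
    return busiest, last
```

Keeping the spam total alongside the hit count lets a drop in hits be told apart from a drop in overall spam volume, which is why the hit rate is computed per period rather than just counting hits.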

--j.