Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2005/10/05 18:19:26 UTC
rule secrecy/spammer evasion tests, revisited
Robert Menschel writes:
> Justin, could you repeat a mass-check and that analysis on this rule,
> which I'm willing to sacrifice for the sake of science? Not
> necessarily now, but a month or two from now?
>
> header SARE_SUBJ_MED_USE Subject =~ /\w{3}\sused .+ (?:along with|combin|manage|prevent|relieve|symptom|treat)/i
> describe SARE_SUBJ_MED_USE Spam topic found in subject
> score SARE_SUBJ_MED_USE 1.666
> #hist SARE_SUBJ_MED_USE Bob Menschel, May 14 2005
> #counts SARE_SUBJ_MED_USE 208s/0h of 297244 corpus (135824s/161420h RM) 06/12/05
> #max SARE_SUBJ_MED_USE 253s/0h of 275081 corpus (134226s/140855h RM) 05/30/05
> #counts SARE_SUBJ_MED_USE 2s/0h of 5648 corpus (1019s/4629h ft) 06/04/05
> #counts SARE_SUBJ_MED_USE 0s/0h of 55803 corpus (18630s/37173h JH-3.01) 06/10/05
> #counts SARE_SUBJ_MED_USE 108s/0h of 49034 corpus (44877s/4157h MY) 06/11/05
> #counts SARE_SUBJ_MED_USE 1s/0h of 11269 corpus (6578s/4691h CT) 06/11/05
>
> This rule was developed just over two months ago, flagging the emails
> whose subjects tied drugs to symptoms, or drugs to reliefs, or drugs
> to each other. It's not a big hitter, but it's reliable and effective
> in its own way.
>
> I don't expect too much of a drop in spam matching this rule in the
> two months since it appeared in our SARE rule set. It was a fairly
> quiet addition.
>
> But now that aware spammers reading this list know that if they tie a
> drug name to its symptom, or to its function, or to its companion
> drugs, we will catch it (and yes, we do add new patterns to this rule
> as we find them), I expect most of those will find alternate ways to
> word their subjects to avoid this category of pattern, and by the end
> of August the hit rates on this rule should decrease significantly.
>
> I would be interested in seeing if that expectation matches reality.
Just to follow up on this (as requested).... it looks like the answer
is "no, but not in the way you were thinking".

Unfortunately the test didn't produce useful data -- on one hand, the hit
rate of the rule was too low in the first place, but on the other, the
rule had already stopped hitting spam 3 months before that message was
posted. The spammers had already moved on.

My corpus got a *maximum* of 8 messages hitting this rule in one week (the
week of Apr 4). The last hit I saw on the rule was in a mail received on
Apr 23 22:44:56 2003. Not a single hit after that date... I think that
makes it an inconclusive test.

Could it be that they noticed the rule appearing in the ruleset, regardless
of whether it was discussed?

I've attached the data anyway, if anyone wants a look -- first column is
the time value (UNIX format), second is hits on the rule, third is spam
mails received in total in that period.
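
For anyone who wants to poke at the attachment, here is a rough Python sketch of
how the three columns could be read and turned into a per-period hit rate. It
assumes the file is whitespace-separated with one row per period, which is a
guess at the layout; the function names (parse_hit_data, hit_rates,
last_period_with_hits) are made up for illustration and are not part of
SpamAssassin or the mass-check tools.

```python
from datetime import datetime, timezone


def parse_hit_data(text):
    """Parse rows of 'unix_time rule_hits total_spam' (assumed layout).

    Lines that do not hold exactly three integers are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            ts, hits, total = (int(f) for f in fields)
        except ValueError:
            continue
        rows.append((ts, hits, total))
    return rows


def hit_rates(rows):
    """Return (UTC date, hits, total, hits/total) for each period."""
    out = []
    for ts, hits, total in rows:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
        rate = hits / total if total else 0.0
        out.append((day, hits, total, rate))
    return out


def last_period_with_hits(rows):
    """Return the timestamp of the latest period with a hit, or None."""
    hit_times = [ts for ts, hits, _ in rows if hits > 0]
    return max(hit_times) if hit_times else None
```

Reading the attachment into a string and calling
hit_rates(parse_hit_data(text)) gives one tuple per period, and
last_period_with_hits(rows) gives the timestamp of the final period in which
the rule fired at all.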
--j.