Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2008/02/11 22:29:52 UTC

Re: test my auto-generated ruleset

Larry Nedry writes:
> I've been using Justin Mason's auto-generated rule set since mid October
> and am fairly happy with it.  Up until Jan 11, false positives averaged
> about 10% of the hits and I can live with that.
> 
> I noticed a surprising change on Jan 11, 2008.  Before that day many of the
> hits were on low scoring (< 20) spam which was very helpful.  And I would
> see many of these every day.  Since Jan 10 I've only seen 4 messages that
> hit on low scoring spam and the rest on very high scoring spam.  I don't
> get any more FPs but as the spam scores for these messages are already
> through the roof, at the moment, the usefulness of the current rule sets
> has diminished.  Though I assume the methods for creating the rules are
> still under development and am looking forward to more improvements.
> 
> Was there a big change in the way rules were created around that time period?

Well, there was a change in how my mail was collected back in November,
but nothing since then.  Also, the "scoremap" figures seem to indicate
that on the dev testing corpora, it's still predominantly hitting
low-scoring spam:

http://ruleqa.spamassassin.org/20080211-r620438-n/JM_SOUGHT_3/detail#tSCOREMAP_new
http://ruleqa.spamassassin.org/20080211-r620438-n/JM_SOUGHT_2/detail#tSCOREMAP_new
http://ruleqa.spamassassin.org/20080211-r620438-n/JM_SOUGHT_1/detail#tSCOREMAP_new

So I'm stumped...

--j.