Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2004/09/03 22:16:00 UTC

Re[2]: shifting the midpoint between the average spam and average ham

Hello Joe,

Friday, September 3, 2004, 7:01:12 AM, you wrote:

 >> why do you need to alter the average scores of ham/spam?

JF> What a horrible horrible mess if we can't!

Sorry, I don't understand.

JF> One example:
JF> All of my users have set their "optimal" spam thresholds to some number
JF> between 0.0 and 10.0.

Good.  Mine is set at 9.0 for all three domains I manage.

JF> If the SA developers correctly shift around test scores, add new and/or
JF> improved algorithms, etc., and I need to take advantage of the latest,
JF> greatest technology and upgrade to the latest version of SA, then 
JF> without such a mechanism, all of my users' spam threshold settings (that
JF> they had previously spent a lot of hopeful time setting) will be totally
JF> off  the mark and are all of a sudden likely to miss all kinds of 
JF> legitimate email messages! i.e., kill me!

Why?  It hasn't happened here.

My requirements are that almost all non-spam messages score below the
threshold, and almost all spam messages score above the threshold.

That is what happens: 99.98% of all spam scores above 9, and all but a
handful of ham messages a year score below 9.

It's been that way, reliable and stable, through the conversion from 2.5x
to 2.6x, where all sorts of rule scores got changed, and I expect it to
continue working through the 2.6x to 3.0.x change, where the rule scores
change again. (I'm already mass-checking against 3.0.0, and have no
problems.)

I don't care whether my non-spam mean is 8, 5, 1, -1, or -20, as long as
I continue getting fewer than 0.0001% false positives.

I don't care whether my spam mean is 10, 15, 25, 50, or 100, as long as I
maintain that 99.98% accuracy rate (and hopefully can improve on it).

I don't care whether the mean of means is 5, 9, or 15, as long as the
system works.

I don't see why the mean of means would have any impact on this -- what I
care about is the shape and distribution -- there should be (almost) no
overlap at the 9.0 threshold. In my system there isn't. So I'm happy.
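
To make that concrete, here is a rough Python sketch. The score lists are
made up purely for illustration (they are not my mass-check results), and
the function names are hypothetical. It compares two imaginary score sets
whose midpoint between the ham mean and the spam mean is quite different,
yet which behave identically at a 9.0 threshold, assuming a message is
flagged when its score is at or above the threshold.

```python
from statistics import mean


def rates_at_threshold(ham_scores, spam_scores, threshold=9.0):
    """Return (false_positive_rate, spam_catch_rate) at the threshold.

    Assumption: a message is flagged as spam when score >= threshold.
    """
    false_positives = sum(1 for s in ham_scores if s >= threshold)
    caught = sum(1 for s in spam_scores if s >= threshold)
    return false_positives / len(ham_scores), caught / len(spam_scores)


def midpoint_of_means(ham_scores, spam_scores):
    """Midpoint between the average ham score and the average spam score."""
    return (mean(ham_scores) + mean(spam_scores)) / 2


# Illustrative, invented scores for two hypothetical rule-score sets.
ham_old, spam_old = [-2.0, 0.5, 1.0, 3.0], [12.0, 15.0, 20.0, 30.0]
ham_new, spam_new = [-10.0, -4.0, 2.0, 6.0], [10.0, 25.0, 40.0, 60.0]

for label, ham, spam in (("old", ham_old, spam_old),
                         ("new", ham_new, spam_new)):
    fp_rate, catch_rate = rates_at_threshold(ham, spam)
    print(label, midpoint_of_means(ham, spam), fp_rate, catch_rate)
```

The midpoint moves a long way between the two sets, but nothing crosses
9.0 in either one, so a user's threshold setting keeps working unchanged.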

Bob Menschel