Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/06/11 19:48:22 UTC

Re: TCR lambda of 5 is too low

Bear in mind, the TCR figure reported to the user in the
"fp-fn-statistics" output is mostly useful for comparing against
published algorithms, since it's the de-facto standard measure of
effectiveness in the academic literature on spam-filtering.
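
(For reference, since not everyone has the paper to hand, this is
roughly how that figure is computed; the Python spelling and the
function name are mine, not anything that exists in our tree:)

    # Total Cost Ratio as defined in the Androutsopoulos et al. papers:
    #
    #     TCR = n_spam / (lambda * n_fp + n_fn)
    #
    # n_spam = number of spam messages in the corpus,
    # n_fp   = false positives (ham classified as spam),
    # n_fn   = false negatives (spam that got through),
    # lambda = assumed cost of one FP relative to one FN.
    # TCR > 1 means the filter beats not filtering at all.
    def tcr(n_spam, n_fp, n_fn, lam=5):
        return n_spam / (lam * n_fp + n_fn)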

But we shouldn't use it ourselves internally as an effectiveness metric,
because I don't think it's trustworthy (see below).

To remind us what the lambda values represent in Ion's papers:

    lambda=1: filing into a "spam" folder
    lambda=9: bouncing back to sender saying "your mail was spam"
    lambda=100: silent disposal

We should really be using a lambda of 1, given that; but since
SpamAssassin is also used in other systems (e.g. with a system-wide
quarantine, unavailable to the end user), and because it was getting
crazily-good efficiency figures (like TCR > 100) at lambda=1, I picked a
compromise of lambda=5.

But if you're keen to change it, I'd say a TCR lambda of 9 would be OK.
For end-user display only though.

Regarding what's used as a balancing factor in the perceptron, use
whatever value works well, but don't consider it a TCR lambda in terms
of the figures output to the end-user.  Just keep it inside the
perceptron code.  For *that* use, 100 or even higher would be good IMO,
because we really want to avoid FPs -- in other words, *our* perception
of the FP cost is higher than what Ion's papers assume.
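
To make that distinction concrete, here's a toy sketch of what I mean by
an internal balancing factor.  This is *not* our actual perceptron code;
the function, the 5.0 threshold and the update rule are all made up for
the example:

    # Perceptron update where misclassifying ham as spam (an FP) is
    # penalized more heavily than missing spam (an FN).  fp_cost plays
    # the role of the internal balancing factor; it is NOT the lambda
    # reported to the end user.
    def train(examples, n_features, fp_cost=100.0, epochs=10, rate=0.01):
        weights = [0.0] * n_features
        for _ in range(epochs):
            for features, is_spam in examples:   # features: 0/1 rule hits
                score = sum(w * f for w, f in zip(weights, features))
                predicted_spam = score >= 5.0    # nominal spam threshold
                if predicted_spam == is_spam:
                    continue                     # correct, no update
                # an FP costs fp_cost times as much as an FN
                cost = fp_cost if (predicted_spam and not is_spam) else 1.0
                direction = 1.0 if is_spam else -1.0
                for i, f in enumerate(features):
                    weights[i] += rate * cost * direction * f
        return weights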

To reiterate my previous mail: I *don't* think TCR is a good
single-figure representation of spam-filter efficiency.  I used to, but
since then I've occasionally run into results where the FP/FN figures
are lousy but the TCR is good; generally that happens when the corpora
are out of balance and the FP figures are high, but the FNs are "good
enough" to outweigh the crappy FP rate.
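
A made-up illustration of that failure mode -- the corpus sizes and
error counts are invented purely to show the effect:

    # Invented numbers: a spam-heavy corpus of 100,000 spam and 2,000 ham.
    LAM = 5
    # Filter A: lousy 2% FP rate (40 FPs) but only 500 FNs.
    print(100000 / (LAM * 40 + 500))     # TCR ~= 142.9
    # Filter B: 0.1% FP rate (2 FPs) but 2,000 FNs.
    print(100000 / (LAM * 2 + 2000))     # TCR ~= 49.8
    # At lambda=5, TCR ranks A well above B even though A's FP rate is
    # twenty times worse, because the FN term swamps the FP term.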

IMO a better metric would be to pick a desired FP rate, and then use
FN as a single-figure metric given that FP rate.   Or vice versa.
Basically lock down a desired FP or FN rate and allow the perceptron
to find its "best" rate for the other figure.
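
Something along these lines, sketched over mass-check-style
(score, is-spam) results; the 0.1% target and the helper name are just
placeholders:

    # Find the lowest threshold whose FP rate stays within the target,
    # then report the FN rate at that threshold as the single figure.
    # results: list of (score, is_spam) pairs from a scored corpus.
    def fn_at_target_fp(results, target_fp_rate=0.001):
        n_ham  = sum(1 for _, is_spam in results if not is_spam)
        n_spam = len(results) - n_ham
        for threshold in sorted({score for score, _ in results}):
            fp = sum(1 for s, spam in results if not spam and s >= threshold)
            fn = sum(1 for s, spam in results if spam and s < threshold)
            if fp / n_ham <= target_fp_rate:
                # the first (lowest) qualifying threshold gives the
                # lowest FN rate that still meets the FP target
                return fn / n_spam
        return None    # no threshold meets the FP target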

--j.

Daniel Quinlan writes:
> I think a TCR lambda of 5 is too low for us.  This means that we
> consider 5 FNs to have about the same "cost" as 1 FP, right (reference:
> http://www.ics.forth.gr/~potamias/mlnia/paper_2.pdf)?  I think we have
> managed okay so far using such a small value because the score
> optimizer hasn't really changed in terms of balancing FPs vs. FNs until
> now.
> 
> I think the value should be somewhere between 10 and 500.  I'm using 50
> for the moment.
> 
> The balance is all wrong in the perceptron (too many FPs per FN), but I
> believe I found a reasonably good way to fix it (having the perceptron
> optimize around a lower threshold than 5.0).  I can't really prove it
> using a lambda of 5.0, but when I eyeballed the FP/FN numbers they
> seemed much better to me, and they *are* better with a TCR lambda of 50
> (which I think is closer, anyway).
> 
> Another data point: Craig Hughes used to talk about having an FP-to-FN
> ratio of 100 as a goal.  I think a lambda of 100 is closer to what we
> want than 5.  I realize the Androutsopoulos papers seem to imply a lower
> number is correct (although I could make a case that they actually don't
> because foldering is actually worse than sending TMDA-style bounces
> **once your accuracy reaches the level we're now at**), but I think we
> need to go with our gut here until someone whips out some real economics
> research.  :-)
> 
> Daniel


Re: TCR lambda of 5 is too low

Posted by Daniel Quinlan <qu...@pathname.com>.
jm@jmason.org (Justin Mason) writes:

> Bear in mind, the TCR figure reported to the user in the
> "fp-fn-statistics" output is mostly useful for comparing against
> published algorithms, since it's the de-facto standard measure of
> effectiveness in the academic literature on spam-filtering.

Erm, but everyone uses different lambdas and different corpora, so I'm
not sure how often that kind of comparison is actually possible.
 
> But we shouldn't use it ourselves internally as an effectiveness metric,
> because I don't think it's trustworthy (see below).
> 
> To remind us what they represent in Ion's papers:
> 
>     lambda=1: filing into a "spam" folder
>     lambda=9: bouncing back to sender saying "your mail was spam"
>     lambda=100: silent disposal
> 
> We should really be using a lambda of 1, given that; but since
> SpamAssassin is also used in other systems (e.g. with a system-wide
> quarantine, unavailable to the end user), and because it was getting
> crazily-good efficiency figures (like TCR > 100) at lambda=1, I picked
> a compromise of lambda=5.

I think the example mapping of policy to lambda number is wrong.
Clearly, 1 FP is not the same amount of pain as 1 FN when filing
probable spam into a "spam" folder.  lambda may be especially low if
only 75% of spam is being caught with a high FP rate and you have to
check your spam folder every day anyway, but it's much higher once you
get to SA-level accuracy.

Maybe it shouldn't be considered at all when we're doing score optimizer
work.  Maybe a better metric is needed.
 
> IMO a better metric would be to pick a desired FP rate, and then use
> FN as a single-figure metric given that FP rate.   Or vice versa.
> Basically lock down a desired FP or FN rate and allow the perceptron
> to find its "best" rate for the other figure.

I agree with that.  The perceptron is not quite there, though.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/