Posted to dev@spamassassin.apache.org by Daniel Quinlan <qu...@pathname.com> on 2004/06/11 08:49:34 UTC

TCR lambda of 5 is too low

I think a TCR lambda of 5 is too low for us.  This means that we
consider 5 FNs to have about the same "cost" as 1 FP, right (reference:
http://www.ics.forth.gr/~potamias/mlnia/paper_2.pdf)?  I think we have
managed okay so far with such a small value only because the score
optimizer hasn't really changed how it balances FPs against FNs until
now.
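
For reference, here is a minimal sketch of how lambda enters the number,
assuming the TCR definition from the Androutsopoulos papers, i.e.
TCR = nspam / (lambda * FP + FN).  The function name and all of the
counts are made up for illustration; they are not mass-check results.

```python
def tcr(n_spam, fp, fn, lam):
    """Total cost ratio, assuming TCR = n_spam / (lam * fp + fn).

    The numerator is the cost of running no filter at all (every spam
    gets through); the denominator is the weighted cost of the filter's
    mistakes, with each FP counted as `lam` FNs.
    """
    cost = lam * fp + fn
    if cost == 0:
        return float("inf")  # a perfect filter has no error cost
    return n_spam / cost


# Hypothetical counts for two score sets over the same 10000 spam.
N_SPAM = 10000
cautious = {"fp": 2, "fn": 300}     # few FPs, more FNs
aggressive = {"fp": 10, "fn": 100}  # more FPs, fewer FNs

for lam in (5, 50):
    print("lambda=%d  cautious=%.2f  aggressive=%.2f" % (
        lam,
        tcr(N_SPAM, lam=lam, **cautious),
        tcr(N_SPAM, lam=lam, **aggressive),
    ))
```

With these invented counts, the aggressive set gets the higher TCR at
lambda 5 and the cautious set gets the higher TCR at lambda 50, so the
choice of lambda alone can flip which score set looks "better".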

I think the value should be somewhere between 10 and 500.  I'm using 50
for the moment.

The balance is all wrong in the perceptron (too many FPs per FN), but I
believe I found a reasonably good way to fix it (having the perceptron
optimize around a lower threshold than 5.0).  I can't really prove it
using a lambda of 5.0, but when I eyeballed the FP/FN numbers, they
seemed much better to me, and they *are* better with a lambda of 50
(which I think is closer to the right value anyway).

Another data point: Craig Hughes used to talk about having an FP-to-FN
ratio of 100 as a goal.  I think a lambda of 100 is closer to what we
want than 5.  I realize the Androutsopoulos papers seem to imply a lower
number is correct (although I could make a case that they actually don't
because foldering is actually worse than sending TMDA-style bounces
**once your accuracy reaches the level we're now at**), but I think we
need to go with our gut here until someone whips out some real economics
research.  :-)

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/