Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/06/09 20:44:14 UTC
[Bug 5257] get better autolearning thresholds
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5257
sidney@sidney.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |
            Summary|[review] push out           |get better autolearning
                   |autolearning thresholds for |thresholds
                   |3.2.0                       |
  Status Whiteboard|go                          |
   Target Milestone|3.2.1                       |3.2.2
------- Additional Comments From sidney@sidney.com 2007-06-09 11:44 -------
I'm reopening this: whatever reason we had for opening it in the first place
still exists, now that we have reverted the change that was supposed to fix it.
I think we should consider an adaptive autolearning threshold based on sampling
a configurable percentage of the most confidently classified (also
configurable) percentage of the ham and spam. To clarify, for ham: identify the
threshold score that selects the lowest-scoring X% of the ham, then autolearn
Y% of those hams (spam works the same way, using the highest-scoring X%). X is
set to a value that is unlikely to result in spam being learned as ham. Y is
configurable in case the mail volume is too high to learn everything below the
threshold; sampling also lets us learn a representative sample of ham, not
just the very lowest-scoring messages. That protects against an effect such as
all mail of a certain type triggering a 1.0-point rule, so that it never scores
low enough to be learned as ham, and Bayes then incorrectly learning that mail
of that type is always spam.
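The adaptive X%/Y% scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SpamAssassin's implementation: it assumes a batch of recently scanned messages already classified as ham (or as spam) is available as (message_id, score) pairs, and the names percentile_threshold, select_for_autolearn, x_pct and y_pct (standing in for X and Y) are all hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: (message_id, score) for each scanned message.
Message = Tuple[str, float]


def percentile_threshold(scores: Sequence[float], pct: float, lowest: bool = True) -> float:
    """Return the cutoff score bounding the lowest (or highest) pct% of scores."""
    if not scores:
        raise ValueError("need at least one score")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    # Put the most confidently classified messages first:
    # lowest scores for ham, highest scores for spam.
    ordered = sorted(scores, reverse=not lowest)
    # Size of the selected tail; always keep at least one message.
    k = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[k - 1]


def select_for_autolearn(
    messages: Sequence[Message],
    x_pct: float,
    y_pct: float,
    is_ham: bool = True,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick message ids to autolearn with a hypothetical X%/Y% sampling scheme."""
    if not 0 <= y_pct <= 100:
        raise ValueError("y_pct must be in [0, 100]")
    if rng is None:
        rng = random.Random()
    # Step 1: the adaptive threshold that bounds the best X% of this batch.
    threshold = percentile_threshold([s for _, s in messages], x_pct, lowest=is_ham)
    # Step 2: every message on the safe side of the threshold is a candidate.
    # Ties at the threshold are all included, so this can slightly exceed X%.
    if is_ham:
        candidates = [msg_id for msg_id, s in messages if s <= threshold]
    else:
        candidates = [msg_id for msg_id, s in messages if s >= threshold]
    # Step 3: learn a uniform random Y% of the candidates, not only the
    # extreme scores, so the learned sample represents the whole pool.
    n = int(len(candidates) * y_pct / 100)
    return rng.sample(candidates, n)
```

Because the sample is drawn uniformly from everything inside the lowest X%, a class of ham that always trips a 1.0-point rule can still be learned as ham whenever 1.0 falls within that tail, instead of being crowded out by messages scoring near zero.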