Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/06/09 20:44:14 UTC

[Bug 5257] get better autolearning thresholds

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5257


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |
            Summary|[review] push out           |get better autolearning
                   |autolearning thresholds for |thresholds
                   |3.2.0                       |
  Status Whiteboard|go                          |
   Target Milestone|3.2.1                       |3.2.2




------- Additional Comments From sidney@sidney.com  2007-06-09 11:44 -------
I'm reopening this because if there was a reason to open this in the first
place, then that reason still exists now that we reverted what was supposed to
fix it.

I think that we should consider an adaptive autolearning threshold: take the
best-scoring configurable percentage of the ham and of the spam, and autolearn
a configurable percentage sample of each. To clarify, for ham: identify the
threshold score that gives us the lowest scoring X% of the ham, then autolearn
Y% of those hams. X is set at a value which is unlikely to result in spam being
learned as ham. Y is configurable in case the volume of mail is too high to
learn everything that is below the threshold, while still letting us learn a
representative sample of ham, not just the very lowest scoring. That protects
against an effect such as all mail of a certain type triggering a 1.0 score
rule and then Bayes incorrectly learning that mail of that type is always spam.
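
A rough sketch of the ham side of that idea, in Python for illustration only.
Nothing here is existing SpamAssassin code or configuration: the function
names, x_percent and y_percent are hypothetical, and the nearest-rank
percentile is just one assumed way to pick the threshold. The spam side would
presumably mirror it using the highest scoring X% of the spam.

```python
import math
import random


def ham_threshold(ham_scores, x_percent):
    """Return the score bounding the lowest-scoring x_percent of known ham.

    Uses a nearest-rank percentile over recent ham scores (an assumption;
    any percentile estimate would do).
    """
    if not ham_scores:
        raise ValueError("need at least one ham score")
    if not 0 < x_percent <= 100:
        raise ValueError("x_percent must be in (0, 100]")
    ordered = sorted(ham_scores)
    # Number of messages that make up the lowest X% (at least one).
    count = max(1, math.ceil(len(ordered) * x_percent / 100))
    return ordered[count - 1]


def should_autolearn_ham(score, threshold, y_percent, rng=random):
    """Decide whether to autolearn one message as ham.

    Messages above the threshold are never learned. Messages at or below
    it are learned with probability y_percent / 100, so the learned set is
    a random sample of everything under the threshold rather than only the
    very lowest scores.
    """
    if score > threshold:
        return False
    return rng.random() < y_percent / 100


if __name__ == "__main__":
    recent_ham = [0.5, -1.0, 2.0, 0.0, 1.0, -2.0, 3.0, 0.2, 1.5, -0.5]
    threshold = ham_threshold(recent_ham, x_percent=30)
    learned = [s for s in recent_ham
               if should_autolearn_ham(s, threshold, y_percent=50)]
    print("threshold:", threshold)
    print("sampled for learning:", learned)
```

Because the threshold is recomputed from the observed ham scores, it moves
with the score distribution instead of sitting at a fixed value, and Y only
limits how much of the eligible mail is actually learned.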
