Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/08/18 06:18:12 UTC

[Bug 5613] New: Bayes expiry first pass should calculate ntokens

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5613

           Summary: Bayes expiry first pass should calculate ntokens
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Learner
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: felicity@apache.org


An issue came up on IRC where an expiry run wasn't working properly. Looking
at the debug output, the only possible reason was that ntokens wasn't
accurate, and sure enough "sa-learn --dump data | wc -l" confirmed this.

It occurred to me that there is an easy workaround: if we're going to do a
first pass anyway, which requires going through all the tokens, we should just
count the tokens while doing so and ignore the stored ntokens value. It won't
hurt when things are working correctly, and it helps in the odd invalid
ntokens case.
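
A rough sketch of the idea, in Python rather than the actual Perl, and not the
real Bayes store code: the function name, its arguments, and the delta
schedule (start * 2**i for exponents 0 through 9) are assumptions made for
illustration, loosely based on the "expiry max exponent: 9" and 43200 values
in the debug output below. The point is only that the loop which works out
the reduction per atime delta already touches every token, so the count comes
for free.

```python
from typing import Dict, Iterable, Tuple


def first_pass(
    atimes: Iterable[int],
    newest_atime: int,
    start: int = 43200,       # assumed smallest atime delta (12 hours)
    max_exponent: int = 9,    # assumed, mirrors "expiry max exponent: 9"
) -> Tuple[int, Dict[int, int]]:
    """Scan every token once.

    Returns (actual token count, {atime delta: tokens that would expire}).
    """
    deltas = [start * 2 ** i for i in range(max_exponent + 1)]
    reduction = {delta: 0 for delta in deltas}
    ntokens = 0
    for atime in atimes:
        # Count the token here instead of trusting the stored ntokens value.
        ntokens += 1
        age = newest_atime - atime
        for delta in deltas:
            if age > delta:
                reduction[delta] += 1
    return ntokens, reduction


# Hypothetical data: four tokens with made-up access times.
ntokens, reduction = first_pass([100, 50_000, 90_000, 200_000],
                                newest_atime=200_000)
keep_size = 2  # stand-in for "0.75 * max"
goal_reduction = ntokens - keep_size
```

With the count taken from the scan itself, the goal reduction is computed from
what is really in the database, whatever the stored ntokens says.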


[13615] dbg: bayes: expiry check keep size, 0.75 * max: 375000
[13615] dbg: bayes: token count: 14729453, final goal reduction size: 14354453
[13615] dbg: bayes: first pass? current: 1187407685, Last: 1166061803, atime: 691200, count: 5927, newdelta: 285, ratio: 2421.87497891007, period: 43200
[13615] dbg: bayes: can't use estimation method for expiry, unexpected result, calculating optimal atime delta (first pass)
[13615] dbg: bayes: expiry max exponent: 9
[13615] dbg: bayes: atime     token reduction
[13615] dbg: bayes: ========  ===============
[13615] dbg: bayes: 43200     3759141
[...]
[13615] dbg: bayes: first pass decided on 43200 for atime delta
[13615] dbg: bayes: token expiration would expire too many tokens, aborting

However:
$ sa-learn --dump data | wc -l
3778029
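
The same comparison can be sketched as a small script, for anyone who wants to
check their own database. This assumes that "sa-learn --dump magic" prints a
line ending in "non-token data: ntokens" with the value in the third column,
and that "sa-learn --dump data" prints one token per line; both functions are
hypothetical helpers that take the dump text as a string, and the sample text
is made up for illustration, not real output.

```python
def reported_ntokens(magic_dump: str) -> int:
    """Pull the stored ntokens value out of `sa-learn --dump magic` text."""
    for line in magic_dump.splitlines():
        if line.rstrip().endswith("non-token data: ntokens"):
            # Assumed layout: the value sits in the third column.
            return int(line.split()[2])
    raise ValueError("no ntokens line found")


def counted_tokens(data_dump: str) -> int:
    """Count non-empty lines of `sa-learn --dump data` text (like `wc -l`)."""
    return sum(1 for line in data_dump.splitlines() if line.strip())


# Made-up sample text in the assumed format.
sample_magic = (
    "0.000          0          3          0  non-token data: bayes db version\n"
    "0.000          0   14729453          0  non-token data: ntokens\n"
)
sample_data = (
    "0.987         12          0 1187400000  aaaaaaaaaa\n"
    "0.012          0          7 1187300000  bbbbbbbbbb\n"
)

stored = reported_ntokens(sample_magic)
actual = counted_tokens(sample_data)
mismatch = stored != actual
```

If the two numbers disagree, as they did here, the stored ntokens is stale and
the expiry goal is being calculated from the wrong total.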

My commentary on how to fix it:
First set "use_bayes_rules 0" and "bayes_auto_learn 0", then run
"sa-learn --backup > bayes.backup". Move the bayes db files aside, then run
"sa-learn --restore bayes.backup". Assuming that all goes well and
"sa-learn --dump magic" looks right, remove the config options above and let
bayes go back to work.


