Posted to users@spamassassin.apache.org by Bram Mertens <br...@sofico.be> on 2007/02/25 20:11:30 UTC

Bayes DB maintenance


Hi

As I wrote in my previous post, SA's effectiveness has dropped
dramatically over the past couple of days.

I read something about "overtraining" Bayes databases a while ago and was
wondering whether that could be the issue here.

How can I check the status of my bayes DB?  The output of sa-learn --dump
magic doesn't mean much to me.

Are there routines to clean up a Bayes DB and, if so, how often should
they be run?
I ran sa-learn --force-expire today, but it appears to have made little
difference to the output of sa-learn --dump magic:
before:
0.000          0          3          0  non-token data: bayes db version
0.000          0      12234          0  non-token data: nspam
0.000          0     115904          0  non-token data: nham
0.000          0     179531          0  non-token data: ntokens
0.000          0 1169633453          0  non-token data: oldest atime
0.000          0 1172424260          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1172399818          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0       1438          0  non-token data: last expire reduction count

after:
0.000          0          3          0  non-token data: bayes db version
0.000          0      12234          0  non-token data: nspam
0.000          0     115905          0  non-token data: nham
0.000          0     178179          0  non-token data: ntokens
0.000          0 1169659662          0  non-token data: oldest atime
0.000          0 1172426770          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1172427375          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0       1386          0  non-token data: last expire reduction count
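
To make dumps like the two above easier to read, here is a minimal Python
sketch, not anything shipped with SA. It assumes each entry sits on one line,
that the value is the third whitespace-separated column (as in the output
above), and that the atime fields are Unix timestamps. The helper names
parse_magic, describe and changed are made up for this illustration.

```python
from datetime import datetime, timezone

MARKER = "non-token data:"
# Fields assumed to hold Unix timestamps in the dump shown above.
EPOCH_FIELDS = {
    "oldest atime",
    "newest atime",
    "last journal sync atime",
    "last expiry atime",
}


def parse_magic(text):
    """Map each 'non-token data' field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        if MARKER not in line:
            continue  # skip blanks and anything that is not a magic line
        columns, name = line.split(MARKER, 1)
        # In the dump above the value is the third whitespace-separated column.
        fields[name.strip()] = int(columns.split()[2])
    return fields


def describe(fields):
    """Return readable lines: timestamps as UTC dates, the delta in days."""
    out = []
    for name, value in fields.items():
        if name in EPOCH_FIELDS and value > 0:
            when = datetime.fromtimestamp(value, tz=timezone.utc)
            out.append(f"{name}: {when:%Y-%m-%d %H:%M:%S} UTC")
        elif name == "last expire atime delta":
            out.append(f"{name}: {value / 86400:.1f} days")
        else:
            out.append(f"{name}: {value}")
    return out


def changed(before, after):
    """Return {field: after - before} for fields whose value differs."""
    return {
        k: after[k] - before[k]
        for k in before
        if k in after and after[k] != before[k]
    }


if __name__ == "__main__":
    sample = """\
0.000          0      12234          0  non-token data: nspam
0.000          0     115904          0  non-token data: nham
0.000          0 1169633453          0  non-token data: oldest atime
0.000          0    2764800          0  non-token data: last expire atime delta
"""
    for line in describe(parse_magic(sample)):
        print(line)
```

Saving the output of sa-learn --dump magic to a file before and after an
expiry run, parsing both with parse_magic and passing the results to changed
shows only the fields that actually moved, which is easier to scan than two
full dumps side by side.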


Would it make sense to clear out the Bayes DB (using sa-learn --clear) and
retrain with recent ham/spam?

Regards

Bram