Posted to users@spamassassin.apache.org by Roger Taranto <ro...@rogflies.com> on 2006/11/15 04:54:06 UTC
Bayes expiration question
After an sa-learn --force-expire finishes, there are a couple of
interesting (I think) statistics printed:
token frequency: 1-occurence tokens: 62.85%
token frequency: less than 8 occurrences: 26.36%
I checked the documentation but couldn't find anything on this output.
What do these two lines mean, and what useful tweaking can be done based
upon the output?
-Roger
Re: Bayes expiration question
Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Nov 14, 2006 at 07:54:06PM -0800, Roger Taranto wrote:
> token frequency: 1-occurence tokens: 62.85%
> token frequency: less than 8 occurrences: 26.36%
> What do these two lines mean ...
The first says that 62.85% of your tokens were only ever learned once,
and another 26.36% were learned fewer than 8 times as ham and/or spam
(i.e., the ham and spam counts are each under 8). So 89.21% of your
tokens have been seen relatively infrequently.
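
For illustration only, here is a rough Python sketch of how two percentages
like these could be derived from per-token counts. It is not SpamAssassin's
actual code: the function name, the dict layout, and the exact bucketing
rule (total count of 1 for the first bucket, otherwise ham and spam counts
each below the threshold for the second) are assumptions based on the
description above.

```python
def token_frequency_stats(tokens, threshold=8):
    """Return (single_pct, low_pct) for a dict of token -> (spam_count, ham_count).

    Hypothetical helper: the bucketing rule is an assumption, not taken
    from the SpamAssassin source.
    """
    total = len(tokens)
    if total == 0:
        return (0.0, 0.0)

    single = 0  # tokens learned exactly once overall
    low = 0     # remaining tokens with ham and spam counts each under threshold
    for spam_count, ham_count in tokens.values():
        if spam_count + ham_count == 1:
            single += 1
        elif spam_count < threshold and ham_count < threshold:
            low += 1

    return (100.0 * single / total, 100.0 * low / total)


if __name__ == "__main__":
    # Made-up sample data, purely to exercise the function.
    sample = {
        "viagra": (1, 0),     # learned once, as spam
        "meeting": (0, 1),    # learned once, as ham
        "invoice": (3, 2),    # low frequency on both sides
        "unsubscribe": (10, 0),  # frequent spam token, in neither bucket
    }
    single_pct, low_pct = token_frequency_stats(sample)
    print(f"token frequency: 1-occurrence tokens: {single_pct:.2f}%")
    print(f"token frequency: less than {8} occurrences: {low_pct:.2f}%")
```

Adding the two returned values gives the share of tokens seen only rarely,
which is how the 89.21% figure above is obtained (62.85 + 26.36).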
--
Randomly Selected Tagline:
"I'm not a guitar -- stop picking on me!"
Re: Bayes expiration question
Posted by Matt Kettler <mk...@verizon.net>.
Roger Taranto wrote:
> After an sa-learn --force-expire finishes, there are a couple of
> interesting (I think) statistics printed:
> token frequency: 1-occurence tokens: 62.85%
> token frequency: less than 8 occurrences: 26.36%
> I checked the documentation but couldn't find anything on this output.
> What do these two lines mean,
It means that 62.85% of your Bayes tokens have been seen only once in
trained messages.
Another 26.36% are tokens that have been seen more than once, but
fewer than 8 times, in trained messages.
This suggests your bayes DB is probably fairly young (less than 6 months
old). However, it's not detrimental in any way.
> and what useful tweaking can be done based
> upon the output?
>
None, really.