Posted to users@spamassassin.apache.org by Roger Taranto <ro...@rogflies.com> on 2006/11/15 04:54:06 UTC

Bayes expiration question

After an sa-learn --force-expire finishes, there are a couple of
interesting (I think) statistics printed:
token frequency: 1-occurence tokens: 62.85%
token frequency: less than 8 occurrences: 26.36%
I checked the documentation but couldn't find anything on this output. 
What do these two lines mean, and what useful tweaking can be done based
upon the output?

-Roger

Re: Bayes expiration question

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Nov 14, 2006 at 07:54:06PM -0800, Roger Taranto wrote:
> token frequency: 1-occurence tokens: 62.85%
> token frequency: less than 8 occurrences: 26.36%
> What do these two lines mean ...

The first says that 62.85% of your tokens were only ever learned once,
and another 26.36% were learned fewer than 8 times as ham and/or spam
(i.e., the ham count and the spam count are each under 8).  So 89.21% of
your tokens have been seen relatively infrequently.
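
As a rough illustration of how two percentages like these could be
derived from per-token counts, here is a minimal sketch in Python. It is
not SpamAssassin's actual code: the function name, the dict layout, and
the sample counts are all hypothetical, and the bucketing simply follows
the reading given above (a token learned exactly once, versus a token
whose ham and spam counts are each under 8).

```python
def token_frequency_stats(tokens):
    """Return (pct_single, pct_low) for a token table.

    tokens: dict mapping token -> (spam_count, ham_count).
    This layout is an assumption made for illustration only.
    """
    total = len(tokens)
    if total == 0:
        return 0.0, 0.0

    single = 0  # tokens learned exactly once overall
    low = 0     # other tokens with spam and ham counts each under 8
    for spam_count, ham_count in tokens.values():
        if spam_count + ham_count == 1:
            single += 1
        elif spam_count < 8 and ham_count < 8:
            low += 1

    return 100.0 * single / total, 100.0 * low / total


# Made-up sample data, not taken from a real Bayes database.
sample = {
    "viagra": (1, 0),     # learned once, as spam
    "meeting": (0, 1),    # learned once, as ham
    "invoice": (3, 2),    # seen a few times in both
    "unsubscribe": (10, 0),  # seen often, so in neither bucket
}

pct_single, pct_low = token_frequency_stats(sample)
print(f"1-occurrence tokens: {pct_single:.2f}%")
print(f"less than 8 occurrences: {pct_low:.2f}%")
```

The two buckets do not overlap, which is why the two reported figures
can be added together to get the share of infrequently seen tokens.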

-- 
Randomly Selected Tagline:
"I'm not a guitar -- stop picking on me!"

Re: Bayes expiration question

Posted by Matt Kettler <mk...@verizon.net>.
Roger Taranto wrote:
> After an sa-learn --force-expire finishes, there are a couple of
> interesting (I think) statistics printed:
> token frequency: 1-occurence tokens: 62.85%
> token frequency: less than 8 occurrences: 26.36%
> I checked the documentation but couldn't find anything on this output. 
> What do these two lines mean,
It means that 62.85% of your bayes tokens have been seen only once in
trained messages.
Another 26.36% are tokens that have been seen more than once, but
fewer than 8 times, in trained messages.

This suggests your bayes DB is probably fairly young (less than 6 months
old). However, it's not detrimental in any way.
>  and what useful tweaking can be done based
> upon the output?
>   
None, really.