Posted to users@spamassassin.apache.org by Joe Emenaker <jo...@emenaker.com> on 2005/02/24 23:20:32 UTC

Any tools to gauge bayes accuracy?

Before I actually write this, I'll ask whether someone has already 
done it.

On my IMAP server, I've got two different trash folders: one for ham, 
one for spam. Nothing new there.

However, every hour, I've got a script that runs sa-learn on them and 
records three things for each message:
  - The overall spam score
  - The BAYES_XX number
  - Whether the user marked it as spam or ham
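
A minimal sketch of how those three values might be captured, assuming 
each message still carries the X-Spam-Status header that SpamAssassin 
added at delivery time (for example "Yes, score=7.2 required=5.0 
tests=BAYES_99,HTML_MESSAGE ..."). The names Record and parse_status 
are hypothetical, chosen only for illustration; the user's ham/spam 
verdict is assumed to come from which trash folder the message was in.

```python
import re
from typing import NamedTuple, Optional


class Record(NamedTuple):
    score: float          # overall spam score from the header
    bayes: Optional[int]  # NN from the BAYES_NN hit, None if no Bayes rule fired
    is_spam: bool         # True if the user filed the message as spam


# Older SpamAssassin releases wrote "hits=" where newer ones write "score=".
_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")
_BAYES = re.compile(r"\bBAYES_(\d+)\b")


def parse_status(header: str, is_spam: bool) -> Optional[Record]:
    """Extract score and Bayes bucket from an X-Spam-Status header value."""
    header = " ".join(header.split())  # unfold a multi-line header
    score = _SCORE.search(header)
    if score is None:
        return None  # message was never scanned, or the header was stripped
    bayes = _BAYES.search(header)
    return Record(
        float(score.group(1)),
        int(bayes.group(1)) if bayes else None,
        is_spam,
    )
```

Messages for which parse_status returns None would simply be skipped 
when building the per-user statistics.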

Originally, I was using this to fine-tune my spam threshold. However, 
since I've been building my Bayes DB for over a year now, it has become 
very accurate.

What I want now is a script that can:
 A) Find some "optimum" spam threshold based on FP or FN rate. (I've 
already got that.)
 B) Compare this with the BAYES_XX values for the various spams/hams 
and, if the Bayes values correlate more strongly with what the *user* 
considers spam/ham, suggest different scores for the BAYES_XX hits.
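
Step A could be sketched roughly as follows. This is only one possible 
reading of "optimum": the lowest threshold whose false-positive rate 
stays under a chosen limit, which also minimizes false negatives under 
that constraint. The function name best_threshold and the max_fp_rate 
parameter are assumptions for illustration, and a message is treated as 
flagged when its score is greater than or equal to the threshold.

```python
def best_threshold(records, max_fp_rate=0.01):
    """Pick the lowest threshold whose false-positive rate is acceptable.

    records is a list of (score, is_spam) pairs, where is_spam is the
    user's own verdict. Returns (threshold, fp_rate, fn_rate), or None
    if no observed score satisfies the limit.
    """
    hams = [score for score, is_spam in records if not is_spam]
    spams = [score for score, is_spam in records if is_spam]
    if not hams or not spams:
        raise ValueError("need at least one ham and one spam")

    # Only observed scores need to be tried as candidate thresholds.
    for t in sorted({score for score, _ in records}):
        fp_rate = sum(1 for s in hams if s >= t) / len(hams)
        if fp_rate <= max_fp_rate:
            fn_rate = sum(1 for s in spams if s < t) / len(spams)
            return t, fp_rate, fn_rate
    return None
```

Because the false-positive rate can only fall as the threshold rises, 
the first candidate that meets the limit is also the one with the 
fewest false negatives.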

In other words, I want a script that doesn't just auto-tune a user's 
spam threshold, but also auto-tunes the Bayes scoring as the Bayes DB 
gets better and better.
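
As a starting point for step B, something like the sketch below could 
show how well each BAYES_XX bucket agrees with the user, and whether 
the Bayes hit alone predicts the user's verdict better than the overall 
score does. The function names and the bayes_cutoff value are 
assumptions for illustration; this is a plain accuracy comparison, not 
the method SpamAssassin itself uses to assign rule scores.

```python
from collections import defaultdict


def bayes_bucket_stats(records):
    """Map each BAYES_NN bucket to (message count, fraction marked spam).

    records is a list of (bayes_nn, is_spam) pairs; entries whose
    bayes_nn is None (no Bayes rule fired) are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # nn -> [total, spam]
    for nn, is_spam in records:
        if nn is None:
            continue
        counts[nn][0] += 1
        counts[nn][1] += int(is_spam)
    return {nn: (total, spam / total)
            for nn, (total, spam) in sorted(counts.items())}


def _accuracy(pairs):
    """Fraction of (predicted, actual) pairs that agree."""
    return sum(p == a for p, a in pairs) / len(pairs)


def compare(records, threshold, bayes_cutoff=60):
    """Return (accuracy by overall score, accuracy by Bayes bucket alone).

    records is a list of (score, bayes_nn, is_spam) triples. Only
    messages with a Bayes hit are used, so both numbers cover the same
    set. bayes_cutoff is an assumed dividing line between ham-leaning
    and spam-leaning buckets.
    """
    usable = [r for r in records if r[1] is not None]
    if not usable:
        raise ValueError("no messages with a BAYES_NN hit")
    by_score = _accuracy([(s >= threshold, spam) for s, _, spam in usable])
    by_bayes = _accuracy([(nn >= bayes_cutoff, spam) for _, nn, spam in usable])
    return by_score, by_bayes
```

If by_bayes comes out consistently higher than by_score for a user, 
that would suggest the BAYES_XX scores deserve more weight for that 
user; how much more is left open here.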

Anybody done something like this?

- Joe