Posted to users@spamassassin.apache.org by R McGlue <r....@qub.ac.uk> on 2005/03/11 12:40:36 UTC

imbalance in bayes numbers

How much will the following imbalance skew the Bayes algorithms (if at all)?

bash-2.03$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      54265          0  non-token data: nspam
0.000          0     206342          0  non-token data: nham
0.000          0     250698          0  non-token data: ntokens
0.000          0 1110469760          0  non-token data: oldest atime
0.000          0 1110541102          0  non-token data: newest atime
0.000          0 1110540952          0  non-token data: last journal sync atime
0.000          0 1110513186          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     197193          0  non-token data: last expire reduction count

I take it this is a standard snapshot: more ham than spam...
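To put a number on the imbalance, here is a small Python sketch that reads the nspam and nham lines from the dump above and computes the ham:spam ratio. It assumes, based only on the output shown, that each "non-token data" value sits in the third whitespace-separated column. The helper name parse_magic is hypothetical and is not part of SpamAssassin.

```python
# Two of the magic lines from the `sa-learn --dump magic` output above.
DUMP = """\
0.000          0      54265          0  non-token data: nspam
0.000          0     206342          0  non-token data: nham
"""

PREFIX = "non-token data: "

def parse_magic(text):
    """Map each 'non-token data' label to the integer in its third column.

    Assumes the column layout seen in the dump above (hypothetical helper).
    """
    values = {}
    for line in text.splitlines():
        # Split into at most 5 fields so the label keeps its internal spaces.
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].startswith(PREFIX):
            label = fields[4][len(PREFIX):]
            values[label] = int(fields[2])
    return values

magic = parse_magic(DUMP)
ratio = magic["nham"] / magic["nspam"]
print(f"ham:spam = {ratio:.2f}:1")  # prints "ham:spam = 3.80:1"
```

So the database here has been trained on roughly 3.8 ham messages for every spam message.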

ronan

Re: imbalance in bayes numbers

Posted by Matt Kettler <mk...@comcast.net>.
At 06:40 AM 3/11/2005, R McGlue wrote:
>how much will the following imbalance skew the bayes algorithms (if at all)

Very little. It will bias the scores very slightly towards higher Bayes
scores, but chi-squared combining tends to make this effect barely
noticeable unless the training imbalance gets severe.
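For readers curious about the "chi-squared combining" mentioned above, here is a minimal Python sketch of the general technique. It uses a Robinson-style per-token estimate f(w) followed by Fisher/Robinson chi-squared combining, as popularized by Gary Robinson and SpamBayes. This is an illustration under stated assumptions and is not SpamAssassin's own Bayes code. The smoothing constants S_CONST and X_CONST are hypothetical values chosen for the example. The per-token estimate divides each count by nspam or nham, so a token seen at the same per-message rate in both corpora lands at 0.5 whatever the ham:spam ratio is. The sketch does not try to model where the slight residual bias described in the reply comes from.

```python
import math

# Hypothetical smoothing constants for illustration only; SpamAssassin's
# actual values and formulas may differ.
S_CONST = 1.0   # strength of the prior
X_CONST = 0.5   # assumed probability for a token with little or no data

def token_prob(spam_count, ham_count, nspam, nham):
    """Robinson-style f(w): normalize by corpus size, then smooth toward X_CONST."""
    spam_ratio = spam_count / nspam   # per-message rate in the spam corpus
    ham_ratio = ham_count / nham      # per-message rate in the ham corpus
    if spam_ratio + ham_ratio == 0:
        return X_CONST
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (S_CONST * X_CONST + n * p) / (S_CONST + n)

def chi2q(x2, dof):
    """P(chi-squared with an even number `dof` of degrees of freedom >= x2)."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Chi-squared combining of token probabilities (each strictly in (0, 1))."""
    if not probs:
        return 0.5
    n = len(probs)
    ln_spam = sum(math.log(1.0 - p) for p in probs)  # very negative for spammy tokens
    ln_ham = sum(math.log(p) for p in probs)         # very negative for hammy tokens
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)           # spamminess evidence
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)            # hamminess evidence
    return (s - h + 1.0) / 2.0                       # 0 = ham, 1 = spam

# Same per-message rate in both corpora despite a 3.8:1 ham:spam ratio.
print(token_prob(10, 38, nspam=100, nham=380))  # prints 0.5
print(combine([0.99, 0.95, 0.9]) > 0.5)         # prints True
print(combine([0.01, 0.05, 0.1]) < 0.5)         # prints True
```

Because the final score combines evidence from both tails, a small shift in individual token probabilities tends to be swamped by the tokens that are strongly spammy or strongly hammy.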