Posted to users@spamassassin.apache.org by R McGlue <r....@qub.ac.uk> on 2005/03/11 12:40:36 UTC
imbalance in bayes numbers
How much will the following imbalance skew the Bayes algorithms (if at all)?
bash-2.03$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 54265 0 non-token data: nspam
0.000 0 206342 0 non-token data: nham
0.000 0 250698 0 non-token data: ntokens
0.000 0 1110469760 0 non-token data: oldest atime
0.000 0 1110541102 0 non-token data: newest atime
0.000 0 1110540952 0 non-token data: last journal sync atime
0.000 0 1110513186 0 non-token data: last expiry atime
0.000 0 43200 0 non-token data: last expire atime delta
0.000 0 197193 0 non-token data: last expire reduction count
I take it this is a typical snapshot, with more ham than spam...
ronan
Re: imbalance in bayes numbers
Posted by Matt Kettler <mk...@comcast.net>.
At 06:40 AM 3/11/2005, R McGlue wrote:
>how much will the following imbalance skew the bayes algorithms (if at all)
Very little. It will bias the results very slightly towards higher Bayes
scores, but the chi-squared combining tends to make this effect hardly
noticeable unless the training imbalance gets severe.
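
To put a number on the imbalance, here is a minimal Python sketch that reads the nspam and nham lines from the dump above and computes the ham-to-spam ratio. It also illustrates one assumption that is not stated in this thread: that per-token counts are divided by the number of trained messages in each class before being compared, as Robinson-style Bayes classifiers commonly do. The helper names parse_magic and token_spam_prob are hypothetical, and the 1% token frequency is made up for illustration.

```python
import re

# First four lines of the `sa-learn --dump magic` output quoted in the thread.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 54265 0 non-token data: nspam
0.000 0 206342 0 non-token data: nham
0.000 0 250698 0 non-token data: ntokens
"""

def parse_magic(text):
    """Return {name: value} for each 'non-token data' line of the dump."""
    counts = {}
    pattern = re.compile(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+)")
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            counts[m.group(2).strip()] = int(m.group(1))
    return counts

def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Per-token spam probability with counts normalised by corpus size.

    ASSUMPTION: this normalisation is an illustrative scheme, not code
    taken from SpamAssassin or from the thread.
    """
    spam_rate = spam_hits / nspam
    ham_rate = ham_hits / nham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)

counts = parse_magic(DUMP)
nspam, nham = counts["nspam"], counts["nham"]
ham_per_spam = nham / nspam

# Hypothetical token that appears in 1% of spam and 1% of ham messages.
p_even = token_spam_prob(round(0.01 * nspam), round(0.01 * nham), nspam, nham)

print(f"ham per spam message: {ham_per_spam:.2f}")
print(f"spam probability of an evenly distributed token: {p_even:.3f}")
```

Under that assumption, a token that is equally common in both classes stays close to neutral even though the database holds nearly four times as much ham as spam, which is consistent with the small effect described in the reply.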