Posted to users@spamassassin.apache.org by ian douglas <id...@w98.us> on 2007/06/08 00:09:50 UTC

ham counter not going up?

Hi all,

I'm using SA 3.2.0 on a shared hosting account via cPanel. My Perl
script, sa-trainer.cgi, calls sa-learn with various parameters (described
below) to scan ham and spam from some Maildir folders.

After scanning, the script calls "sa-learn --dump magic" and parses out
the total numbers of spam and ham messages (nspam and nham, respectively)
that have been learned into the Bayes DBs.
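The parsing step might look like the following Python sketch. It is not the actual sa-trainer.cgi code (which is Perl and isn't included here). It assumes each counter line has four columns followed by "non-token data: <name>", with the count in the third column, as in the nham line quoted in this message. `parse_dump_magic` and `read_bayes_counts` are hypothetical names.

```python
import re
import subprocess

# Assumed layout of a "non-token data" line from `sa-learn --dump magic`:
#   "0.000          0         23          0  non-token data: nham"
# The counter value is taken to be the third whitespace-separated column.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)


def parse_dump_magic(text):
    """Return a dict mapping each non-token counter name to its integer value."""
    counts = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def read_bayes_counts(prefs_path=None):
    """Run `sa-learn [-p prefs] --dump magic` and parse its stdout.

    prefs_path is optional; passing the same -p as the learning calls keeps
    both commands reading the same preferences file.
    """
    cmd = ["sa-learn"]
    if prefs_path:
        cmd += ["-p", prefs_path]
    cmd += ["--dump", "magic"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dump_magic(result.stdout)
```

Applied to the nham line quoted in this message, `parse_dump_magic(...)["nham"]` gives 23.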

What's odd is that the number of ham messages does not increase after
scanning. Before running the script, the last dump showed something to
the effect of:

0.000          0         23          0  non-token data: nham

After scanning, it reports exactly the same thing.


The command lines the script builds for scanning look something like:

sa-learn -p /path/to/user_prefs --spam /path/to/spam/maildir/cur
sa-learn -p /path/to/user_prefs --use-ignores --ham \
   /path/to/non-spam/maildir/cur
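A script building these two calls might assemble them as argument lists, roughly as in this Python sketch. `build_learn_commands` and its parameters are hypothetical, and the paths are the placeholders from above. The `use_ignores` switch makes it easy to produce the ham command with or without --use-ignores, and `debug_areas` adds a `-D` option such as "bayes".

```python
def build_learn_commands(prefs_path, spam_dir, ham_dir,
                         use_ignores=True, debug_areas=None):
    """Return (spam_argv, ham_argv) mirroring the two sa-learn calls above.

    Hypothetical helper: only the ham call gets --use-ignores, as in the
    original commands.
    """
    base = ["sa-learn", "-p", prefs_path]
    if debug_areas:
        # e.g. debug_areas="bayes" yields "-D bayes"
        base += ["-D", debug_areas]
    spam_cmd = base + ["--spam", spam_dir]
    ham_flags = ["--use-ignores"] if use_ignores else []
    ham_cmd = base + ham_flags + ["--ham", ham_dir]
    return spam_cmd, ham_cmd
```

Each list could then be passed to `subprocess.run`; keeping them as lists avoids shell quoting problems with Maildir paths.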

Is the --use-ignores flag what keeps the ham count from going up?

I turned on Bayes debugging by adding "-D bayes" to the command line,
and saw this when scanning the ham messages a second time:

[16014] (I snipped out all references to FuzzyOCR)
[16014] dbg: bayes: tie-ing to DB file R/O \
   /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/O \
   /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB journal sync: last sync: 0
[16014] dbg: bayes: not available for scanning, only 23 ham(s) in \
   bayes DB < 200
[16014] dbg: bayes: untie-ing
[16014] dbg: learn: initializing learner
[16014] dbg: bayes: bayes journal sync starting
[16014] dbg: bayes: bayes journal sync completed
[16014] dbg: bayes: expiry starting
[16014] dbg: bayes: tie-ing to DB file R/W \
   /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/W \
   /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB expiry: tokens in DB: 30901, Expiry max size: \
   150000, Oldest atime: 1178647046, Newest atime: 1181075754, Last \
   expire: 0, Current time: 1181253067
[16014] dbg: bayes: expiry completed
[16014] dbg: learn: learning ham
[16014] dbg: bayes: \
   7f8c6a60b51d0a791288fc6bd85d8aaf481ffd4d@sa_generated already learnt \
   correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: \
   c76637c22b722cac1e6f5584d17e243a68b34805@sa_generated already learnt \
   correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: \
   97bce35eadc3de79963cb32e5043c5e99faeddb9@sa_generated already learnt \
   correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: \
   dfd859697e6e65a0589d61432ea50ea30d04434d@sa_generated already learnt \
   correctly, not learning twice
[16014] dbg: learn: learning ham

The "already learnt correctly" line repeats for all 68 or so messages,
and the output then ends with:

[16014] dbg: bayes: untie-ing
[16014] dbg: bayes: files locked, now unlocking lock
Learned tokens from 0 message(s) (68 message(s) examined)
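To check runs like this one automatically, a script could count the "already learnt correctly" debug lines and read the final summary line, as in the Python sketch below. The patterns are copied from the output quoted here; `summarize_learn_output` is a hypothetical name. It assumes the debug output (which typically goes to stderr) and the summary line have been captured into one string, unwrapped, as sa-learn writes them.

```python
import re

# Patterns taken from the sa-learn debug output and summary line above.
ALREADY_LEARNT = re.compile(r"already learnt\s+correctly, not learning twice")
SUMMARY = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def summarize_learn_output(text):
    """Return (learned, examined, skipped_as_already_learnt).

    learned and examined come from the summary line (None if it is absent);
    the third value counts the "already learnt correctly" debug lines.
    """
    skipped = len(ALREADY_LEARNT.findall(text))
    match = SUMMARY.search(text)
    if match is None:
        return None, None, skipped
    return int(match.group(1)), int(match.group(2)), skipped
```

For the summary line just above, this reports 0 messages learned out of 68 examined.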


Running "dump magic" again, I still see the '23' line:

$ sa-learn --dump magic | grep nham
0.000          0         23          0  non-token data: nham


What information, debugging or otherwise, can I provide to help determine
why the ham count is not increasing? Or is the --use-ignores flag simply
what's causing this?

Thanks,
Ian