Posted to users@spamassassin.apache.org by ian douglas <id...@w98.us> on 2007/06/11 20:41:14 UTC

ham counter not going up (2nd attempt)

(sorry if this is a dupe, had some weirdness on my end that made it look 
like my original message was never sent?)


Hi all,

I'm using SA 3.2.0 on a shared hosting account via cPanel. My
sa-trainer.cgi Perl script calls sa-learn with various parameters
(which I'll get to in a second) to scan ham and spam from some Maildir
folders.

After scanning, the Perl script calls "sa-learn --dump magic" and parses
out the total number of spam/ham messages (nspam, nham, respectively)
that have been processed through the Bayes databases.
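
To illustrate the parsing step, here is a minimal Python sketch. It is
not the actual sa-trainer.cgi code (which is Perl); the function name
parse_magic_counts is made up for this example, and the sample input is
the single nham line quoted further down in this message.

```python
import re

# Matches "--dump magic" lines such as:
# 0.000          0         23          0  non-token data: nham
# The third column holds the counter value.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(nspam|nham)\s*$"
)


def parse_magic_counts(dump_text):
    """Return a dict with whichever of 'nspam' / 'nham' appear in the text."""
    counts = {}
    for line in dump_text.splitlines():
        m = MAGIC_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


# Hypothetical usage with the one line quoted in this post.
sample = "0.000          0         23          0  non-token data: nham"
print(parse_magic_counts(sample))  # {'nham': 23}
```

Comparing the returned value before and after a training run shows
whether the counter moved.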

What's odd is that after scanning, the number of ham messages does not
increment. Before running the script, the last dump showed something
like:

0.000          0         23          0  non-token data: nham

After scanning, it reports exactly the same information.


The command-line calls built for scanning look something like:

sa-learn -p /path/to/user_prefs --spam /path/to/spam/maildir/cur
sa-learn -p /path/to/user_prefs --use-ignores --ham \
    /path/to/non-spam/maildir/cur
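
As a rough sketch of how those two calls might be assembled so that the
flag can be switched off for a test run (Python here, purely for
illustration; the real script is Perl, and build_learn_commands and its
parameters are hypothetical names). Only flags already shown above are
used.

```python
import shlex


def build_learn_commands(prefs_path, spam_dir, ham_dir, use_ignores=True):
    """Build argv lists for the spam and ham sa-learn runs (not executed)."""
    spam_cmd = ["sa-learn", "-p", prefs_path, "--spam", spam_dir]
    ham_cmd = ["sa-learn", "-p", prefs_path]
    if use_ignores:
        # Optional, so a run without it can be compared against a run with it.
        ham_cmd.append("--use-ignores")
    ham_cmd += ["--ham", ham_dir]
    return spam_cmd, ham_cmd


# Placeholder paths, same as in the commands above.
spam_cmd, ham_cmd = build_learn_commands(
    "/path/to/user_prefs",
    "/path/to/spam/maildir/cur",
    "/path/to/non-spam/maildir/cur",
    use_ignores=False,
)
print(shlex.join(spam_cmd))
print(shlex.join(ham_cmd))
```

Each list could be handed to subprocess.run() as-is; keeping the
arguments in a list avoids shell quoting problems with odd folder names.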

Is the "--use-ignores" flag causing the number of scanned messages not
to go up?

I turned on some bayes debugging by adding "-D bayes" to the command
line, and see this when scanning the ham messages a second time:

[16014] (I snipped out all references to FuzzyOCR)
[16014] dbg: bayes: tie-ing to DB file R/O \
    /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/O \
    /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB journal sync: last sync: 0
[16014] dbg: bayes: not available for scanning, only 23 ham(s) in \
    bayes DB < 200
[16014] dbg: bayes: untie-ing
[16014] dbg: learn: initializing learner
[16014] dbg: bayes: bayes journal sync starting
[16014] dbg: bayes: bayes journal sync completed
[16014] dbg: bayes: expiry starting
[16014] dbg: bayes: tie-ing to DB file R/W \
    /home/mypath/.spamassassin/bayes_toks
[16014] dbg: bayes: tie-ing to DB file R/W \
    /home/mypath/.spamassassin/bayes_seen
[16014] dbg: bayes: found bayes db version 3
[16014] dbg: bayes: DB expiry: tokens in DB: 30901, Expiry max size: \
    150000, Oldest atime: 1178647046, Newest atime: 1181075754, Last \
    expire: 0, Current time: 1181253067
[16014] dbg: bayes: expiry completed
[16014] dbg: learn: learning ham
[16014] dbg: bayes: 7f8c6a60b51d0a791288fc6bd85d8aaf481ffd4d@sa_generated \
    already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: c76637c22b722cac1e6f5584d17e243a68b34805@sa_generated \
    already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: 97bce35eadc3de79963cb32e5043c5e99faeddb9@sa_generated \
    already learnt correctly, not learning twice
[16014] dbg: learn: learning ham
[16014] dbg: bayes: dfd859697e6e65a0589d61432ea50ea30d04434d@sa_generated \
    already learnt correctly, not learning twice
[16014] dbg: learn: learning ham

The "learnt correctly" line is repeated for all 68 or so messages, and
then ends with:

[16014] dbg: bayes: untie-ing
[16014] dbg: bayes: files locked, now unlocking lock
Learned tokens from 0 message(s) (68 message(s) examined)
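
For anyone who wants to tally this kind of output, a small Python sketch
follows. It assumes the debug lines and the final summary line have been
captured together as one string, with each log entry on a single line
(the "\" wraps above are only for email). The function name
summarize_learn_output is hypothetical.

```python
import re

# Final summary line printed by sa-learn, as quoted above.
SUMMARY = re.compile(
    r"Learned tokens from (\d+) message\(s\) \((\d+) message\(s\) examined\)"
)


def summarize_learn_output(output):
    """Count skipped messages and pull the learned/examined totals."""
    skipped = output.count("already learnt")
    m = SUMMARY.search(output)
    learned = int(m.group(1)) if m else None
    examined = int(m.group(2)) if m else None
    return {"learned": learned, "examined": examined, "already_learnt": skipped}


# Sample built from two lines of the log in this post.
sample = (
    "[16014] dbg: bayes: 7f8c6a60b51d0a791288fc6bd85d8aaf481ffd4d@sa_generated "
    "already learnt correctly, not learning twice\n"
    "Learned tokens from 0 message(s) (68 message(s) examined)\n"
)
print(summarize_learn_output(sample))
# {'learned': 0, 'examined': 68, 'already_learnt': 1}
```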


Another "--dump magic" call afterwards still shows the same 23:

$ sa-learn --dump magic | grep nham
0.000          0         23          0  non-token data: nham


What information can I offer up, debugging or otherwise, to determine
why the number of counted ham messages is not increasing? Or is it just
the --use-ignores flag that's causing this?

Thanks,
Ian