Posted to users@spamassassin.apache.org by David Buttrick <db...@geekforce.com> on 2005/12/03 23:37:47 UTC
Struggling with sa-learn and scoring behavior
I have Postfix, spampd, Cyrus IMAP.
I have two user mailboxes, Learn Ham and Learn Spam, which are fed to
sa-learn via cron and fetchmail.
I pick up the message with fetchmail and tee a copy to sa-learn,
while the message itself is piped to spamc and then to deliver:
/usr/bin/fetchmail -a -s -n -p IMAP -u dbuttric --auth password \
    --folder 'INBOX.Learn Spam' \
    -m 'bash -c "/usr/bin/tee >(/usr/bin/sa-learn --spam --single > /dev/null) | /usr/bin/spamc | /usr/lib/cyrus-imapd/deliver dbuttric"' \
    localhost.localdomain
What I'm seeing is this:
I drop a false negative into the 'Learn Spam' folder, and everything seems
to go fine.
But the spam score AFTER sa-learn is LESS than before.
Isn't sa-learn --spam supposed to weight that message higher?
Does sa-learn like to pipe to spamc? Or does sa-learn take some time
to commit its work to the database?
Right now, I'm imagining a situation where sa-learn takes 200 cycles
longer to complete its task, so the data is not in the Bayes db when
spamc gets ahold of the message...
If anyone has any insight in this, it is greatly appreciated.
Thanks
David
Re: Struggling with sa-learn and scoring behavior
Posted by Matt Kettler <mk...@comcast.net>.
At 05:37 PM 12/3/2005, David Buttrick wrote:
>But the spam score AFTER sa-learn is LESS than before.
>
>Isn't sa-learn --spam supposed to weight that message higher?
NO. sa-learn --spam is supposed to weight the BAYES score higher.
Now, if you test the message, then feed it to sa-learn --spam, then test it
again, the score should be higher.
However, if any significant time has passed between the two tests, on the
order of an hour, then comparing the final scores, or even the bayes
scores, is pointless. During that hour or more of time dozens of messages
could have been learned, RBL listings changed, etc.
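A back-to-back comparison along those lines could look like the sketch below. It only looks at which BAYES_NN rule fires, not at the total score. The function names and the idea of grepping the rule name out of the scanner output are assumptions for illustration, and it is meant to be run as the account whose Bayes database is used at delivery time.

```shell
#!/bin/sh
# Sketch: compare the Bayes rule hit right before and right after learning.
# bayes_rule and compare_bayes are hypothetical helper names.

# Print the first BAYES_NN rule name found in scanner output on stdin.
# Assumes the rule name appears in the headers or report as e.g. BAYES_50.
bayes_rule() {
    grep -o 'BAYES_[0-9][0-9]*' | head -n 1
}

# Scan, learn as spam, then scan again with no delay in between.
# $1 is a file holding the raw message.
compare_bayes() {
    msg=$1
    before=$(spamassassin -t < "$msg" | bayes_rule)
    sa-learn --spam "$msg" > /dev/null
    after=$(spamassassin -t < "$msg" | bayes_rule)
    echo "before=$before after=$after"
}
```

Calling `compare_bayes message.eml` (the file name is just a placeholder) prints both rule names on one line. If the message body itself quotes an old report, the first match may come from that text, so treat the output as a rough check.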
Also, if you're comparing against delivery-time scanning scores, make sure
you're doing all this as the same user your mail gets scanned as. This is
not necessarily the same user the mail is addressed to. If you use
spamc/spamd it is definitely NOT root (spamd will setuid itself to nobody
if invoked to scan mail for root's userID).
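One way to check which Bayes database the learning actually lands in is to read the nspam counter from `sa-learn --dump magic` for a given account before and after a learn run. This is a rough sketch: the helper names are made up, it assumes sudo rights to run commands as that account, and it assumes the counter sits in the third column of the dump output.

```shell
#!/bin/sh
# Sketch: read the learned-spam counter of one account's Bayes database.
# nspam_count and nspam_for_user are hypothetical helper names.

# Extract the nspam value from `sa-learn --dump magic` output on stdin.
# Assumes lines shaped like:
#   0.000          0       4245          0  non-token data: nspam
nspam_count() {
    awk '/non-token data: nspam/ { print $3 }'
}

# Run the dump as the given account (-H points HOME at that user's home,
# so a per-user database is the one that gets read).
nspam_for_user() {
    sudo -H -u "$1" sa-learn --dump magic | nspam_count
}
```

If `nspam_for_user nobody` goes up after learning but the count for your own account does not (or the other way round), the learner and the scanner are not sharing a database. On the scanning side, `spamc -u dbuttric` asks spamd to scan as that user.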
>Does sa-learn like to pipe to spamc?
No.
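sa-learn reads the message and prints only a short status line, so it is not a filter you can chain in front of spamc. In the posted command, tee hands a copy to sa-learn through a process substitution, which runs at the same time as spamc, so nothing guarantees the learning has finished before the scan. A minimal sketch that forces the order, assuming a temporary spool file is acceptable (the function name and the two override variables are invented for illustration):

```shell
#!/bin/sh
# Sketch: learn first, scan second, using a temp file instead of tee.
# LEARN_CMD and SCAN_CMD are hypothetical overrides so the flow can be
# tried out without SpamAssassin installed.
: "${LEARN_CMD:=sa-learn --spam}"
: "${SCAN_CMD:=spamc}"

# Read one message on stdin, learn it, then write the scanned copy to stdout.
learn_then_scan() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                    # spool stdin once
    $LEARN_CMD "$tmp" > /dev/null   # unquoted on purpose: command plus flags
    $SCAN_CMD < "$tmp"              # runs only after the learner has returned
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

In a wrapper script called from fetchmail's -m option, this would be used as `learn_then_scan | /usr/lib/cyrus-imapd/deliver dbuttric`.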
>Or does sa-learn take some time to commit its work to the database?
If you're using the bayes_learn_to_journal option, yes. It will be delayed
until the next journal sync. Otherwise, no.
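If journaling is in play, a sync can be forced right after learning with `sa-learn --sync`. The sketch below checks a config file for the setting and chains the two steps. The helper names are made up, and the config path is whatever your install uses (a site-wide local.cf or a per-user user_prefs).

```shell
#!/bin/sh
# Sketch: detect bayes_learn_to_journal and force a sync after learning.
# journal_enabled and learn_and_sync are hypothetical helper names.

# Succeed if the given config file turns on bayes_learn_to_journal.
journal_enabled() {
    grep -Eq '^[[:space:]]*bayes_learn_to_journal[[:space:]]+1' "$1"
}

# Learn one message as spam, then flush the journal into the database.
# $1 is a file holding the raw message.
learn_and_sync() {
    sa-learn --spam "$1" && sa-learn --sync
}
```

For example, `journal_enabled /path/to/local.cf && echo "journal in use"` tells you whether the extra sync matters at all.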