Posted to users@spamassassin.apache.org by David Buttrick <db...@geekforce.com> on 2005/12/03 23:37:47 UTC

Struggling with sa-learn and scoring behavior

I have Postfix, spampd, Cyrus IMAP.
I have two user mailboxes, 'Learn Ham' and 'Learn Spam', which are fed to
sa-learn via cron and fetchmail.

I pick up the message with fetchmail and pipe it to sa-learn, which
pipes to spamc and then to deliver:

/usr/bin/fetchmail -a -s -n -p IMAP -u dbuttric --auth password \
    --folder 'INBOX.Learn Spam' \
    -m 'bash -c "/usr/bin/tee >(/usr/bin/sa-learn --spam --single > /dev/null) | /usr/bin/spamc | /usr/lib/cyrus-imapd/deliver dbuttric"' \
    localhost.localdomain

What I'm seeing is this:
I drop a false negative into the 'Learn Spam' folder, and everything
seems to go fine.

But the spam score AFTER sa-learn is LESS than before.

Isn't sa-learn --spam supposed to weight that message higher?

Does sa-learn like to pipe to spamc? Or does sa-learn take some time  
to commit its work to the database?
Right now, I'm imagining a situation where sa-learn takes 200 cycles  
longer to complete its task, so the data is not in the Bayes db when  
spamc gets ahold of the message...
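If the race suspected above is real, it comes from tee rather than sa-learn: the >(...) process substitution runs sa-learn concurrently with spamc, so nothing guarantees that learning finishes before the scan starts. A minimal sketch of a serialized alternative follows, assuming bash and the paths from the original command. The function name learn_then_deliver and the SA_LEARN, SPAMC and DELIVER override variables are hypothetical, not part of any tool.

```bash
# Hypothetical helper: learn the message first, then scan and deliver it.
# Paths default to the ones in the original post; SA_LEARN, SPAMC and
# DELIVER are assumed overrides, not standard variables.
learn_then_deliver() {
    local user="$1"
    local learn="${SA_LEARN:-/usr/bin/sa-learn}"
    local scan="${SPAMC:-/usr/bin/spamc}"
    local deliver="${DELIVER:-/usr/lib/cyrus-imapd/deliver}"
    local msg status

    msg="$(mktemp)" || return 1
    # Save the message once so both steps read identical bytes.
    cat > "$msg"

    # Step 1: sa-learn runs to completion before anything else happens.
    # A failure is reported on stderr but does not block delivery.
    "$learn" --spam --single < "$msg" > /dev/null ||
        echo "learn_then_deliver: sa-learn failed" >&2

    # Step 2: only now scan the message and hand it to Cyrus deliver.
    "$scan" < "$msg" | "$deliver" "$user"
    status=$?   # exit status of deliver, the last command in the pipeline

    rm -f "$msg"
    return "$status"
}
```

To call it from fetchmail, one could save the function in a file and source it in the -m command, e.g. -m 'bash -c ". /path/to/learn_then_deliver.sh && learn_then_deliver dbuttric"', where the path is a placeholder. The cost of the ordering guarantee is a short-lived temporary copy of each message on disk.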

If anyone has any insight into this, it is greatly appreciated.

Thanks

David

Re: Struggling with sa-learn and scoring behavior

Posted by Matt Kettler <mk...@comcast.net>.
At 05:37 PM 12/3/2005, David Buttrick wrote:

>But the spam score AFTER sa-learn is LESS than before.
>
>Isn't sa-learn --spam supposed to weight that message higher?

NO. sa-learn --spam is supposed to weight the BAYES score higher.

Now, if you test the message, then feed it to sa-learn --spam, then test it 
again, the score should be higher.
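The test, learn, test-again sequence can be sketched as a small bash function. It assumes spamd is running, the message is saved in a file, and the caller is the same user that delivery-time scanning uses; compare_scores and the SPAMC and SA_LEARN override variables are hypothetical. It reports the BAYES_* rule that fired alongside the total score, since sa-learn only moves the Bayes part directly.

```bash
# Hypothetical helper: score, learn as spam, re-score immediately.
# Assumes spamd is reachable by spamc and that this runs as the same user
# delivery-time scanning uses. SPAMC and SA_LEARN are assumed overrides.
compare_scores() {
    local msg="$1"
    local scan="${SPAMC:-spamc}"
    local learn="${SA_LEARN:-sa-learn}"
    local before after before_bayes after_bayes

    # spamc -c prints just "score/threshold"; spamc -R prints the full
    # report, whose rule list names the BAYES_* rule that fired (if any).
    before="$("$scan" -c < "$msg")"
    before_bayes="$("$scan" -R < "$msg" | grep -o 'BAYES_[0-9]*' | head -n 1)"

    "$learn" --spam "$msg" > /dev/null

    after="$("$scan" -c < "$msg")"
    after_bayes="$("$scan" -R < "$msg" | grep -o 'BAYES_[0-9]*' | head -n 1)"

    echo "before: $before ${before_bayes:-no BAYES rule}"
    echo "after:  $after ${after_bayes:-no BAYES rule}"
}
```

Running both scans back to back keeps the comparison meaningful; as noted below, scores taken an hour or more apart can differ for reasons unrelated to this one learning step.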

However, if any significant time has passed between the two tests, on the 
order of an hour, then comparing the final scores, or even the bayes 
scores, is pointless. During that hour or more of time dozens of messages 
could have been learned, RBL listings changed, etc.

Also, if you're comparing against delivery-time scanning scores, make sure
you're doing all this as the same user your mail gets scanned as. This is
not necessarily the same user the mail is addressed to. If you use
spamc/spamd, it is definitely NOT root (spamd will setuid itself to nobody
if invoked to scan mail for root's user ID).
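A minimal sketch of training as the scanning user, assuming sudo is available and the message file is readable by that user. learn_as_scan_user, SUDO and SA_LEARN are hypothetical names, and the account you pass must be the one your delivery-time scanning actually runs as.

```bash
# Hypothetical helper: run sa-learn as the account delivery-time scanning
# uses, so the tokens land in that account's Bayes database.
# Assumptions: sudo is available; SUDO and SA_LEARN are assumed overrides;
# the message file must be readable by scan_user.
learn_as_scan_user() {
    local scan_user="$1" msg="$2"
    local sudo_cmd="${SUDO:-sudo}"
    local learn="${SA_LEARN:-sa-learn}"

    # spamd will not scan as root (it switches to nobody), so training
    # root's database would not affect delivery-time scores.
    if [ "$scan_user" = root ]; then
        echo "learn_as_scan_user: spamd does not scan as root" >&2
        return 1
    fi

    # -H sets HOME to the target user's home directory, so sa-learn finds
    # that user's default ~/.spamassassin Bayes files.
    "$sudo_cmd" -H -u "$scan_user" "$learn" --spam "$msg"
}
```

When re-testing by hand, spamc -u with the same user name asks spamd to scan under that account as well.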

>Does sa-learn like to pipe to spamc?

No.

>Or does sa-learn take some time  to commit its work to the database?

If you're using the learn-to-journal option (bayes_learn_to_journal), yes:
the update is delayed until the next journal sync. Otherwise, no.
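If learn-to-journal is on, forcing a sync before re-testing removes that delay. A sketch, assuming the option is enabled in local.cf as "bayes_learn_to_journal 1"; learn_and_sync and the SA_LEARN override are hypothetical names.

```bash
# Sketch, assuming bayes_learn_to_journal is enabled in local.cf:
#   bayes_learn_to_journal 1
# Learned tokens then wait in the journal until the next sync, so force
# one before re-testing. SA_LEARN is an assumed override.
learn_and_sync() {
    local msg="$1"
    local learn="${SA_LEARN:-sa-learn}"

    "$learn" --spam "$msg" || return 1
    # --sync merges pending journal entries into the Bayes database.
    "$learn" --sync
}
```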