Posted to users@spamassassin.apache.org by Nels Lindquist <nl...@maei.ca> on 2005/10/11 22:51:01 UTC

Problems with sa-learn and Fetchmail

Hi there.

I'm trying to set up an IMAP based Bayesian training system using 
fetchmail as per the RemoteIMAPFolder and SingleUserUNIXInstall 
sections of the SpamAssassin wiki.

I'm running into difficulty with messages which have been marked up 
with report_safe by spamassassin.  When I retrieve such messages with 
Fetchmail and feed them to sa-learn directly, the markup is not 
detected and removed and the messages are learned improperly.

Since I'm testing this with Dovecot using Maildir-based folders, I 
was able to change into the appropriate Maildir directory and run
sa-learn on the mail messages directly from the filesystem.  When I do
that, the SA markup is properly detected and removed prior to 
learning.  Similarly, if I tell fetchmail to dump a message to a 
textfile instead of directly to sa-learn, then the resulting textfile 
is identical to the Maildir mail message (assuming I use the
--invisible option for fetchmail), and sending it to sa-learn results
in the SA markup once again being properly detected and removed prior 
to learning.
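
The identity check described above can be reproduced with a byte-for-byte comparison. This is only a sketch: the function name same_message and both file paths are hypothetical, and it assumes the fetchmail dump was made with --invisible so that no extra Received header is added.

```shell
# Compare a fetchmail dump against the Maildir copy of the same message.
# same_message is a hypothetical helper name; both arguments are file paths.
same_message() {
    # cmp -s is silent and exits 0 only when the files are byte-identical
    if cmp -s "$1" "$2"; then
        echo identical
    else
        echo different
    fi
}
```

It would be called as same_message /path/to/fetchmail-dump.txt /path/to/Maildir/cur/message-file, with both paths replaced by real ones.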

The problem then seems to be caused by the fetchmail process.  I 
noticed when using the "-v" (verbose) option with fetchmail and
"sa-learn -D --spam" that the message header and body are retrieved
separately, and sa-learn seems to start its processing before the 
message body is retrieved from the IMAP server:

fetchmail: IMAP> A0010 FETCH 3 RFC822.HEADER
fetchmail: IMAP< * 3 FETCH (RFC822.HEADER {1262}
reading message defang@rapier.smilodon.ca:3 of 3 (1262 header octets)
fetchmail: about to deliver with: sa-learn -D --spam
#
fetchmail: IMAP< )
fetchmail: IMAP< A0010 OK Fetch completed.
fetchmail: IMAP> A0011 FETCH 3 BODY[TEXT]
[6637] dbg: logger: adding facilities: all
[6637] dbg: logger: logging level is DBG
[6637] dbg: generic: SpamAssassin version 3.1.0
[6637] dbg: config: score set 0 chosen.

[ .... lots more SA dbg lines here ... ]

*.****************.*****************.****************.****************
*.****************.*****************.*****************.***************
**
fetchmail: IMAP< )
fetchmail: IMAP< A0011 OK Fetch completed.
[6626] dbg: learn: learning spam
[6626] dbg: dns: dns_available set to yes in config file, skipping test
[6626] dbg: metadata: X-Spam-Relays-Trusted:
[6626] dbg: metadata: X-Spam-Relays-Untrusted:
[6626] dbg: message: ---- MIME PARSER START ----
[6626] dbg: message: main message type: text/plain
[6626] dbg: message: parsing normal part
[6626] dbg: message: added part, type: text/plain
[6626] dbg: message: ---- MIME PARSER END ----
[6626] dbg: message: no encoding detected

[ .... SA processing continues .... ]

At no point is there a "dbg: markup: removing markup" line as there 
is when I run sa-learn on the message files directly.  My theory is 
that fetchmail is feeding the message header and body as two separate 
events, and sa-learn isn't detecting them as a single message.
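
One way to test that theory would be to put a small buffering step between fetchmail and sa-learn, so that sa-learn only ever sees a complete file, as it does in the textfile case that already works. The following is a minimal sketch, not a confirmed fix: the function name spool_and_run is hypothetical, and it assumes fetchmail delivers the whole message on the MDA's standard input.

```shell
# Buffer a complete message from stdin into a temporary file, then run the
# given command with that file as its last argument.
# spool_and_run is a hypothetical helper name, not part of fetchmail or SA.
spool_and_run() {
    spool_tmp=$(mktemp) || return 1
    cat > "$spool_tmp"          # blocks until fetchmail closes the pipe (EOF)
    "$@" "$spool_tmp"           # e.g. sa-learn --spam /tmp/tmp.XXXXXX
    spool_status=$?
    rm -f "$spool_tmp"
    return "$spool_status"
}
```

Saved in a script whose final line is "spool_and_run sa-learn -D --spam", it could be named in fetchmail's -m option in place of sa-learn itself. If the "removing markup" debug line then appears, that would support the theory; if not, the cause lies elsewhere.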

Any ideas?

----
Nels Lindquist <*>
Information Systems Manager
Morningstar Air Express Inc.

Re: Problems with sa-learn and Fetchmail

Posted by Michael Monnerie <m....@zmi.at>.

On Tuesday, 11 October 2005 22:51 Nels Lindquist wrote:
> Any ideas?

I use:
sudo -H -u $user fetchmail -a -s -n -p IMAP --folder 'SPAM_yes' \
  --auth 'password' \
  -m "formail -d -I \"From \" -a \"From \" -s >>$checkspam" \
  $imapserver

Possibly you need the "From " header? Anyway, afterwards I do:

sudo -H -u $user spamassassin -r --mbox $checkspam

and I *could* do:

formail <$checkspam -n 3 -s "tee >(spamc -u $user -L spam)|spamc -u $user -C report"

but that doesn't train the per-user Bayes database, while calling
spamassassin does. Works nicely and as expected.
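
Since this approach depends on each message in the collected file starting with a "From " separator line, a quick sanity check before the spamassassin run is to count those lines. This is a rough sketch; the function name count_mbox_messages is hypothetical, and the count will be too high if a message body contains an unescaped line beginning with "From ".

```shell
# Count mbox "From " separator lines in the file given as $1.
# count_mbox_messages is a hypothetical helper name.
count_mbox_messages() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status clean.
    grep -c '^From ' "$1" || true
}
```

For example, count_mbox_messages "$checkspam" should print the number of messages fetched from the folder.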

Kind regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at           Tel: 0660/4156531          Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net                 Key-ID: 0x70545879