Posted to users@spamassassin.apache.org by Gregory Zornetzer <ga...@nmrfam.wisc.edu> on 2004/09/25 06:46:16 UTC

sa-learn and spam checker headers

Hi all,

Another question:
Short version: will spam report header lines from other spam filters
confuse sa-learn?

Long version:
I have a second email account that I have set to forward into my main
email account.  It turns out that this second email account has a
spam-checker on it. I've included two representative headers from the
spam filter below (with x'ed out IP addresses).

X-Spam-Report: TrustedSender=yes, SenderIP=xxx.xxx.xxx.xxx
X-Spam-PmxInfo: Server=avs-1, Version=4.7.0.111621, Antispam-Engine: 2.0.0.0,
 Antispam-Data: 2004.9.21.8, SenderIP=xxx.xxx.xxx.xxx
X-Spam-Score:
X-Spam-Report: IsSpam=no, Probability=7%, Hits=__HAS_MSGID 0, __SANE_MSGID 0
X-Spam-PmxInfo: Server=avs-6, Version=4.7.0.111621, Antispam-Engine: 2.0.0.0,
 Antispam-Data: 2004.9.21.8, SenderIP=xxx.xxx.xxx.xxx


That's an example of ham coming through the server.  Here are some headers
from a spam message:

X-Spam-Report: TrustedSender=yes, SenderIP=xxx.xxx.xxx.xxx
X-Spam-PmxInfo: Server=avs-7, Version=4.7.0.111621, Antispam-Engine: 2.0.0.0,
 Antispam-Data: 2004.9.24.4, SenderIP=xxx.xxx.xxx.xxx
X-Spam-Score: *******
X-Spam-Report: IsSpam=yes, Probability=99%, Hits=RELAY_IN_CBL 8,
 URI_CLASS_FINANCIAL_DOMAIN 8, OBFU_CLASS_FINANCIAL_MED 4,
 CTYPE_JUST_HTML 0.848, __CT 0, __CTE 0, __CTYPE_HTML 0, __CTYPE_IS_HTML 0,
 __HAS_MSGID 0, __MIME_HTML 0, __MIME_HTML_ONLY 0, __SANE_MSGID 0,
 __TAG_EXISTS_BODY 0, __TAG_EXISTS_HTML 0
X-Spam-PmxInfo: Server=avs-3, Version=4.7.0.111621, Antispam-Engine: 2.0.0.0,
 Antispam-Data: 2004.9.24.4, SenderIP=xxx.xxx.xxx.xxx


If the email has already been tagged as spam by the account's filter, I
see no reason that I should waste CPU cycles running spamassassin to check
it.  OTOH, I would like to run it through the Bayesian learner.  I've
put the following lines into my .procmailrc above the invocation of
spamassassin:

:0 c : uwspam.lock
* ^X-Spam-Score: \*\*\*\*\*\*
| sa-learn --spam

:0 A
mail/spam-UW
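
To spell out what that condition matches, here is a rough Python
equivalent. It is only an illustration (the function name is made up),
and it assumes procmail's defaults: conditions are matched against the
header, case-insensitively.

```python
import re

# Same pattern as the recipe's condition: the header name followed by
# at least six literal asterisks. IGNORECASE and MULTILINE mimic
# procmail's default, case-insensitive match against the header lines.
TAGGED = re.compile(r"^X-Spam-Score: \*\*\*\*\*\*", re.IGNORECASE | re.MULTILINE)


def is_tagged_spam(header_block: str) -> bool:
    """Return True if the upstream filter scored the message at 6+ stars."""
    return TAGGED.search(header_block) is not None
```

So the seven-star spam example above matches, while the ham example with
an empty X-Spam-Score: does not.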


This way, I've got a copy (just in case the filter blows it), and I've
run it through the Bayesian learner.

Will the X-Spam-Report: lines from the other filter confuse the Bayesian
learner?  The spamassassin documentation mentions that sa-learn will
strip spamassassin's own markup, but I wouldn't expect it to strip
these markings.
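
If those headers did turn out to matter, one fallback would be to strip
them before the message reaches sa-learn. A minimal sketch in Python,
assuming LF line endings and only the three header names shown in the
examples above; the function name is hypothetical:

```python
# Header names from the upstream filter, lowercased for comparison.
FOREIGN_HEADERS = (b"x-spam-report:", b"x-spam-pmxinfo:", b"x-spam-score:")


def strip_foreign_headers(raw: bytes) -> bytes:
    """Return the message with the upstream filter's headers removed."""
    # The header block ends at the first blank line.
    head, sep, body = raw.partition(b"\n\n")
    kept = []
    dropping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: shares the fate of its header.
            if not dropping:
                kept.append(line)
            continue
        dropping = line.lower().startswith(FOREIGN_HEADERS)
        if not dropping:
            kept.append(line)
    return b"\n".join(kept) + sep + body
```

A small wrapper that reads the message from sys.stdin.buffer and writes
the result to sys.stdout.buffer could then sit in the pipe in front of
sa-learn --spam.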

Thanks,
-Greg