Posted to users@spamassassin.apache.org by Carlo Wood <ca...@alinoe.com> on 2005/05/04 17:48:11 UTC

Problem with changing the Markup

Hi, I am having some problems with false positives and
want to know why spamassassin is giving the rating it
gives.

In order to achieve that, I added the rule:

add_header all Scores _TESTSSCORES(,)_

to my config file.
When that didn't work, I found out that many other
things in that config file didn't work either. Most
specifically, no 'add_header' line and no
'rewrite_header Subject  [SA-score:_SCORE()_]' line has any effect.

The config *is* processed however: required_hits and
the whitelist_from in it are working.

Moreover, I get the following results for the following commands:

(I am using fedora core 3, and spamassassin-3.0.3-3.fc3)

jolan:/etc/mail/spamassassin>spamassassin --lint
jolan:/etc/mail/spamassassin>

(No errors or warnings)

jolan:/etc/mail/spamassassin>spamassassin --lint --debug 2>&1 | grep 'config:'
debug: config: read file /etc/mail/spamassassin/init.pre
debug: config: read file /usr/share/spamassassin/10_misc.cf
debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/share/spamassassin/20_compensate.cf
debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/share/spamassassin/20_drugs.cf
debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/share/spamassassin/20_phrases.cf
debug: config: read file /usr/share/spamassassin/20_porn.cf
debug: config: read file /usr/share/spamassassin/20_ratware.cf
debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/share/spamassassin/23_bayes.cf
debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/share/spamassassin/25_spf.cf
debug: config: read file /usr/share/spamassassin/25_uribl.cf
debug: config: read file /usr/share/spamassassin/30_text_de.cf
debug: config: read file /usr/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/share/spamassassin/50_scores.cf
debug: config: read file /usr/share/spamassassin/60_whitelist.cf
debug: config: read file /etc/mail/spamassassin/20_head_tests.cf
debug: config: read file /etc/mail/spamassassin/20_phrases_tests.cf
debug: config: read file /etc/mail/spamassassin/backhair.cf
debug: config: read file /etc/mail/spamassassin/local.cf
debug: config: read file /etc/mail/spamassassin/popcorn.cf
debug: config: read file /etc/mail/spamassassin/weed.cf
debug: config: read file /etc/mail/spamassassin/zz_local.cf
debug: config: read file /root/.spamassassin/user_prefs
jolan:/etc/mail/spamassassin>

Note that '/root/.spamassassin/user_prefs' is empty (everything
is commented out) and '/etc/mail/spamassassin/zz_local.cf'
is the config file that I am using in order to try to change
the markup.

The markup that I *do* get is just this:

In ham:

X-Spam-Status: No, hits=-1.1 required=4.0

And in spam:

X-Spam-Status: Yes, hits=7.4 required=4.0
X-Spam-Level: +++++++

There is no way I can seem to change this!

I am running spamassassin on the firewall (which then passes the mail on
to an internal machine).  The following is running there:

jolan:/etc/mail/spamassassin>ps aux | grep spam
nobody   26571  0.4 21.4 31564 27264 ?       Ss   16:48   0:09 /usr/bin/spamd --daemonize --max-children 4 --username=nobody
nobody   26583  1.0 22.9 33380 29172 ?       S    16:49   0:22 spamd child
nobody   26584  0.6 22.5 32772 28588 ?       S    16:49   0:14 spamd child
nobody   26585  0.8 23.1 33556 29364 ?       S    16:49   0:18 spamd child
nobody   26586  1.0 23.5 34188 29896 ?       S    16:49   0:22 spamd child
root     28268  0.0  0.1  1484  152 pts/36   R+   17:33   0:00 grep spam

jolan:/etc/mail/spamassassin>rpm -qf /usr/bin/spamd
spamassassin-3.0.3-3.fc3

spamd is started as: spamd -d -c -m5 -H
as far as I can see, though :/

I am using qmail, which uses qmail-scanner, which uses
/usr/bin/spamc; /usr/bin/qmail-scanner-queue.pl contains:

my $spamc_binary='/usr/bin/spamc';
my $spamc_options=' -c -f';

And note that,

jolan:/etc/mail/spamassassin>rpm -qf /usr/bin/spamc
spamassassin-3.0.3-3.fc3


It seems that /usr/bin/qmail-scanner-queue.pl is
responsible for the addition of the X-Spam-Status
and X-Spam-Level header... it contains:

    print QMQ " Processed in $elapsed_time secs); $findate\n";
    print QMQ "X-Spam-Status: $sa_comment\n" if ($sa_comment ne "");
    print QMQ "X-Spam-Level: $sa_level\n" if ($sa_level ne "");

which is what I see back in my mail.
It even removes other 'X-Spam-Status:' and 'X-Spam-Level:'
lines, but it shouldn't touch a 'X-Spam-Scores:' header line?!

jolan:/etc/mail/spamassassin>grep -n 'X-Spam' /usr/bin/qmail-scanner-queue.pl
1236:    print QMQ "X-Spam-Status: $sa_comment\n" if ($sa_comment ne "");
1237:    print QMQ "X-Spam-Level: $sa_level\n" if ($sa_level ne "");
1253:   #remove any X-Spam-Status/Level IFF we've set a SA value ourselves
1254:   if (($sa_comment ne "" && /^X-Spam-Status:/i) || ($sa_level ne "" && /^X-Spam-Level:/i) ) {
1629:      #X-Spam-Checker-Version: SpamAssassin 2.01
2168:   #X-Spam-Status: No, hits=2.8 required=5.0
2169:   if (/^X-Spam-Status: (Yes|No), hits=(-?[\d\.]*) required=([\d\.]*)/) {
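
For readers who want to experiment with the pattern on line 2169 outside
of Perl, here is a minimal Python sketch. The regular expression is copied
from the grep output above; the function name parse_status is made up for
this example and is not part of qmail-scanner.

```python
import re

# Pattern copied from line 2169 of qmail-scanner-queue.pl as quoted above.
STATUS_RE = re.compile(
    r"^X-Spam-Status: (Yes|No), hits=(-?[\d\.]*) required=([\d\.]*)"
)

def parse_status(line):
    """Return (is_spam, hits, required), or None if the line does not match.

    parse_status is a hypothetical helper used only for illustration.
    """
    m = STATUS_RE.match(line)
    if m is None:
        return None
    try:
        return m.group(1) == "Yes", float(m.group(2)), float(m.group(3))
    except ValueError:
        # The pattern also accepts empty or malformed numbers such as "."
        return None

print(parse_status("X-Spam-Status: Yes, hits=7.4 required=4.0"))
print(parse_status("X-Spam-Status: Yes, score=33.3 required=4.0"))
```

The second call returns None, because the pattern as quoted only accepts
the "hits=" spelling and not "score=".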


Can someone please tell me how it is possible that the
'add_header' config lines in /etc/mail/spamassassin/zz_local.cf
have no effect?

-- 
Carlo Wood <ca...@alinoe.com>

Re: Problem with changing the Markup

Posted by Matt Kettler <mk...@evi-inc.com>.

Carlo Wood wrote:

>On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
>  
>
>>>In -well- every mail.  That is not too weird, since
>>>this is my domain!  Why does rate 'alinoe.com' and 'com'
>>>and 'carlo' as spammy tokens?  Is that normal?
>>>
>>>      
>>>
>>No, it's not normal.
>>
>>Have you been training your bayes using forwarded messages? 
>>
>>In general it looks like your bayes has been very heavily trained on
>>spam that was addressed To: you, and almost no nonspam messages
>>addressed To: you. This is something that could happen if you were
>>forwarding mail for training, or if you used someone elses nonspam for
>>training (and little or none of your own), but did use your own spam.
>>    
>>
>
>Yeah... the point is, I receive mail on my firewall machine.
>There are no accounts there, but I want to run spamassassin
>there so that it's cpu cycles don't bother me on my working
>machine.  However, I don't want the bayesian database to autolearn:
>I want it to only learn correctly.  So, I have auto-learn off.
>The tagged mail is then sent to another machine that sorts it
>into mailboxes with procmail.  All mail is THERE decided to be
>REALLY ham or spam (under my guidance) and is then forwarded
>back to the firewall machine (two special accounts there)
>which is then fed to the bayes.  I didn't realize that this
>didn't work.
>

That should work the way you are doing it if you're careful. My warnings
about forwarding were intended for those forwarding using a mail
client's "forward" feature, which deletes the headers and creates new ones.

However, you have to be a little careful to make sure you train both ham
and spam. In general make sure the training ratio isn't too wildly off.
1:9 should be the worst ham:spam ratio you should use with this kind of
setup.

If your training is 99% spam on the To:carlo@domain.com address, then
SA's bayes is going to assume that carlo gets nothing but spam, and all
mail sent there will be biased a bit by this.
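
The 1:9 guideline above can be checked mechanically before a training
run. This is only a rough Python sketch: the function name and the message
counts are invented for the example, and how you count the messages in
your training mailboxes is up to you.

```python
def training_ratio_ok(n_ham, n_spam, worst_spam_per_ham=9):
    """Check a ham:spam training ratio against a 1:N limit.

    worst_spam_per_ham=9 encodes the 1:9 ham:spam guideline from the
    message above; the function itself is a hypothetical helper.
    """
    if n_ham <= 0:
        # No ham at all is always worse than 1:9.
        return False
    return n_spam / n_ham <= worst_spam_per_ham

print(training_ratio_ok(100, 900))  # exactly 1:9, still acceptable
print(training_ratio_ok(10, 990))   # 1:99, far too lopsided
```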

Re: Problem with changing the Markup

Posted by Carlo Wood <ca...@alinoe.com>.

On Wed, May 04, 2005 at 01:03:18PM -0400, Matt Kettler wrote:
> >In -well- every mail.  That is not too weird, since
> >this is my domain!  Why does rate 'alinoe.com' and 'com'
> >and 'carlo' as spammy tokens?  Is that normal?
> >
> No, it's not normal.
> 
> Have you been training your bayes using forwarded messages? 
> 
> In general it looks like your bayes has been very heavily trained on
> spam that was addressed To: you, and almost no nonspam messages
> addressed To: you. This is something that could happen if you were
> forwarding mail for training, or if you used someone elses nonspam for
> training (and little or none of your own), but did use your own spam.

Yeah... the point is, I receive mail on my firewall machine.
There are no accounts there, but I want to run spamassassin
there so that its CPU cycles don't bother me on my working
machine.  However, I don't want the bayesian database to autolearn:
I want it to only learn correctly.  So, I have auto-learn off.
The tagged mail is then sent to another machine that sorts it
into mailboxes with procmail.  THERE, all mail is classified as
REALLY ham or spam (under my guidance) and is then forwarded
back to the firewall machine (two special accounts there),
where it is fed to the bayes.  I didn't realize that this
didn't work.

How can I solve this?  My .procmail (using a lot of custom rules)
decides whether something is spam or ham and sends it
to ham@192.168.2.1 or spam@192.168.2.1; on the firewall this
is just stored in mailboxes and nothing further happens.
Manually I can react to what was tagged as 'ham' by saying
that it really is spam, or vice versa, or I can tell it
to 'forget' it (I send it to forget@192.168.2.1).

These mailboxes (ham, spam, really_ham, really_spam, forget)
are then processed once per day from a cron job.
I suppose I should first filter the headers before processing them?
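
If filtering the headers turns out to be necessary, it could look
roughly like the Python sketch below. Which headers actually need to go
is not settled in this thread: the prefixes "resent-" and "x-spam-" are
an assumption made for the example, and strip_headers is a made-up name.

```python
from email import message_from_string

# Assumed set of header-name prefixes to drop; adjust as needed.
DROP_PREFIXES = ("resent-", "x-spam-")

def strip_headers(raw, drop_prefixes=DROP_PREFIXES):
    """Return the message text without headers matching drop_prefixes."""
    msg = message_from_string(raw)
    # set() because a header name can occur several times; deleting by
    # name removes every occurrence at once.
    for name in set(msg.keys()):
        if name.lower().startswith(drop_prefixes):
            del msg[name]
    return msg.as_string()

raw = (
    "From: someone@example.com\n"
    "Resent-To: spam@192.168.2.1\n"
    "X-Spam-Flag: YES\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
print(strip_headers(raw))
```

The cleaned text could then be written to a separate mailbox that the
daily cron job feeds to the bayes instead of the forwarded copy.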

Here is an example of the header of a spam as it finally ends
up on the firewall in the 'spambox' mailbox:

>From hunuxyvef@yyhmail.com Wed May 04 21:58:09 2005
Return-Path: <hu...@yyhmail.com>
Delivered-To: carlo@192.168.2.1
Received: (qmail 16434 invoked by alias); 4 May 2005 21:58:09 -0000
Delivered-To: spam@192.168.2.1
Received: (qmail 16430 invoked from network); 4 May 2005 21:58:09 -0000
Received: from ansset.ansset-jolan (HELO mail.alinoe.com) (192.168.2.2)
  by alinoe.com with SMTP; 4 May 2005 21:58:09 -0000
Received: (qmail 17609 invoked by uid 500); 4 May 2005 21:58:05 -0000
Resent-Date: 4 May 2005 21:58:05 -0000
Resent-Message-ID: <20...@mail.alinoe.com>
Resent-From: carlo@alinoe.com
Delivered-To: carlo@alinoe.com
Received: (qmail 17589 invoked from network); 4 May 2005 21:58:05 -0000
Received: from jolan.jolan-alinoe (HELO alinoe.com) (192.168.2.1)
  by mail.alinoe.com with SMTP; 4 May 2005 21:58:05 -0000
Received: (qmail 16424 invoked by uid 109); 4 May 2005 21:58:08 -0000
Received: from 24.22.13.76 by alinoe.com (envelope-from <hu...@yyhmail.com>, uid 102) with qmail-scanner-1.25
 (spamassassin: 3.0.3.
 Clear:RC:0(24.22.13.76):SA:1(33.3/4.0):.
 Processed in 4.019034 secs); 04 May 2005 21:58:08 -0000
X-Envelope-From: hunuxyvef@yyhmail.com
Received: from c-24-22-13-76.hsd1.or.comcast.net (24.22.13.76)
  by alinoe.com with SMTP; 4 May 2005 21:58:03 -0000
Received: from yyhmail.com (yyhmail-com-bk.mr.outblaze.com [205.158.62.177])
        by c-24-22-13-76.hsd1.or.comcast.net (Postfix) with ESMTP id 77P4H5V2OR
        for <ca...@alinoe.com>; Wen, 4 May 2005 23:59:03 +0000
From: Aisha Rice <hu...@yyhmail.com>
To: carlo@alinoe.com
Subject: [SA-score:33.3] She humped, and ground her body...
Date: Wen, 4 May 2005 23:59:03 +0000
MIME-Version: 1.0
Content-Type: text/plain;
        charset="Windows-1251"
Content-Transfer-Encoding: 7bit
X-Qmail-Scanner-Message-ID: <11...@alinoe.com>
X-Spam-Prev-Subject: She humped, and ground her body...
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on alinoe.com
X-Spam-Scores: BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,
        HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,
        RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,
        RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,
        URIBL_WS_SURBL=1.462
X-Spam-Level: *********************************
X-Spam-Status: Yes, score=33.3 required=4.0 tests=BAYES_99,DNS_FROM_RFC_POST,
        FAKE_OUTBLAZE_RCVD,HELO_DYNAMIC_IPADDR,INVALID_DATE,RATWARE_RCVD_PF,
        RCVD_IN_DSBL,RCVD_IN_NJABL_PROXY,RCVD_IN_SORBS_MISC,RCVD_IN_XBL,
        URIBL_OB_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
        version=3.0.3
X-Spam-Report:
        Bayes score: 1.0000 (Tokens: new, 21; hammy, 3; neutral, 9; spammy, 51.)
        Hammy tokens: 0.096-+--H*r:Postfix, 0.146-+--HContent-Transfer-Encoding:7bit, 0.152-+--H*c:plain
        Spammy tokens: 1.000-+--H*c:Windows-1251, 0.999-+--H*r:ip*205.158.62.177, 0.999-8--H*RU:205.158.62.177, 0.999-6--pussy, 0.999-5--babe, 0.998-5--fucked, 0.998-4--H*r:Wen, 0.995-2--chick
        Test scores: BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,URIBL_WS_SURBL=1.462
Resent-To: spam@192.168.2.1
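
As a cross-check, the per-test values in the X-Spam-Scores header above
can be parsed and summed. A minimal Python sketch (parse_scores is a
made-up helper name; the header value is copied from the message above):

```python
def parse_scores(value):
    """Turn 'NAME=1.0,OTHER=2.5,...' (possibly folded) into a dict."""
    scores = {}
    for item in value.replace("\n", "").split(","):
        item = item.strip()
        if not item:
            continue
        name, _, score = item.partition("=")
        scores[name] = float(score)
    return scores

X_SPAM_SCORES = """BAYES_99=3.5,DNS_FROM_RFC_POST=1.614,FAKE_OUTBLAZE_RCVD=3.1,
        HELO_DYNAMIC_IPADDR=4.4,INVALID_DATE=0.236,RATWARE_RCVD_PF=3.867,
        RCVD_IN_DSBL=3.805,RCVD_IN_NJABL_PROXY=0.438,RCVD_IN_SORBS_MISC=0.338,
        RCVD_IN_XBL=3.076,URIBL_OB_SURBL=3.213,URIBL_SC_SURBL=4.263,
        URIBL_WS_SURBL=1.462"""

scores = parse_scores(X_SPAM_SCORES)
total = sum(scores.values())
print(round(total, 1))  # 33.3, matching score=33.3 in X-Spam-Status
```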


Can you confirm that these headers would cause the Bayes to learn
MY domain (alinoe.com) as spammy as shown in the previous post?
And if so, which headers should I remove before feeding it to 
the Bayes classifier?

-- 
Carlo Wood <ca...@alinoe.com>

Re: Problem with changing the Markup

Posted by Matt Kettler <mk...@evi-inc.com>.

Carlo Wood wrote:

>On Wed, May 04, 2005 at 12:03:15PM -0400, Matt Kettler wrote:
>  
>
>>If you're using qmail-scanner's fast_spamassassin option, that's your
>>problem.
>>    
>>
>
>Thank you!  I had a qmail-scanner 1.21 which didn't have
>a verbose_spamassassin option.  I got 1.25 and configured it
>with --scanners=verbose_spamassassin and now things work as
>expected.
>
>
>A problem with the filtering seems to be Bayesian classifier,
>I get:
>
>Spammy tokens: 0.970-+--H*Ad:D*alinoe.com, 0.965-+--HTo:D*alinoe.com, 0.960-+--H*Ad:U*carlo, 0.957-+--H*Ad:D*com, 0.953-+--HTo:U*carlo, 
>0.949-+--HTo:D*com, 0.899-+--st0ck, 0.891-+--H*RU:alinoe.com
>
>In -well- every mail.  That is not too weird, since
>this is my domain!  Why does rate 'alinoe.com' and 'com'
>and 'carlo' as spammy tokens?  Is that normal?
>
>  
>
No, it's not normal.

Have you been training your bayes using forwarded messages? 

In general it looks like your bayes has been very heavily trained on
spam that was addressed To: you, and almost no nonspam messages
addressed To: you. This is something that could happen if you were
forwarding mail for training, or if you used someone else's nonspam for
training (and little or none of your own), but did use your own spam.
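
To see why lopsided training makes a token like HTo:D*alinoe.com look
spammy, consider the simplified per-token estimate below. This is a
textbook-style sketch, not SpamAssassin's actual computation (which is
more involved), and the counts are invented for the example.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified estimate of how spammy a token looks.

    spam_hits / ham_hits: trained spam / ham messages containing the token
    n_spam / n_ham:       total trained spam / ham messages
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("need at least one trained spam and one ham")
    s = spam_hits / n_spam
    h = ham_hits / n_ham
    if s + h == 0:
        return 0.5  # token never seen: no evidence either way
    return s / (s + h)

# Token present in 900 of 1000 spam but only 5 of 100 ham messages:
print(token_spam_probability(900, 5, 1000, 100))
# Same token, but the ham corpus is also addressed to the same user:
print(token_spam_probability(900, 90, 1000, 100))
```

With the first set of counts the token scores close to 0.95; once the
trained ham contains the address about as often as the spam does, the
estimate drops to 0.5 and the token stops carrying any weight.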

Re: Problem with changing the Markup

Posted by Carlo Wood <ca...@alinoe.com>.

On Wed, May 04, 2005 at 12:03:15PM -0400, Matt Kettler wrote:
> If you're using qmail-scanner's fast_spamassassin option, that's your
> problem.

Thank you!  I had a qmail-scanner 1.21 which didn't have
a verbose_spamassassin option.  I got 1.25 and configured it
with --scanners=verbose_spamassassin and now things work as
expected.

A problem with the filtering seems to be the Bayesian classifier.
I get:

Spammy tokens: 0.970-+--H*Ad:D*alinoe.com, 0.965-+--HTo:D*alinoe.com, 0.960-+--H*Ad:U*carlo, 0.957-+--H*Ad:D*com, 0.953-+--HTo:U*carlo, 
0.949-+--HTo:D*com, 0.899-+--st0ck, 0.891-+--H*RU:alinoe.com

in, well, every mail.  That these tokens show up is not too weird, since
this is my domain!  But why does it rate 'alinoe.com', 'com'
and 'carlo' as spammy tokens?  Is that normal?

-- 
Carlo Wood <ca...@alinoe.com>

Re: Problem with changing the Markup

Posted by Matt Kettler <mk...@evi-inc.com>.

Carlo Wood wrote:

>Hi, I am having some problems with false-positives and
>want to know why spamassassin is giving the rating it
>gives.
>
>In order to achieve that, I added the rule:
>
>add_header all Scores _TESTSSCORES(,)_
>
>To my config file.
>When that didn't work, I found out that many other
>things in that config file didn't work either; 
>
<snip>

>
>I am using qmail, which uses qmail-scanner, which uses
>/usr/bin/spamc (/usr/bin/qmail-scanner-queue.pl contains:
>  
>
<snip>

If you're using qmail-scanner's fast_spamassassin option, that's your
problem.

In this mode qmail-scanner does its own markup, so whatever add_header
settings you make in spamassassin are completely irrelevant.

You'll also need to ditch the -c option to spamc. If you pass -c, spamc
only returns the score, and does not mark the message or return the
hitlist. This is useful to fast_spamassassin, but defeats your desire of
getting a list of rules.
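
To illustrate why nothing from add_header can survive in this mode:
with -c, spamc prints only a "score/threshold" pair, so the caller has
nothing else to build a header from. The Python sketch below mimics that
situation. It is not qmail-scanner's code; status_from_check is a made-up
name, and treating score >= threshold as spam is an assumption of the
example.

```python
def status_from_check(check_output):
    """Build an X-Spam-Status value from 'score/threshold' text.

    check_output is the kind of string 'spamc -c' prints, e.g. '7.4/4.0'.
    """
    score_text, _, threshold_text = check_output.strip().partition("/")
    score = float(score_text)
    threshold = float(threshold_text)
    verdict = "Yes" if score >= threshold else "No"
    return f"{verdict}, hits={score:.1f} required={threshold:.1f}"

print("X-Spam-Status:", status_from_check("7.4/4.0"))
print("X-Spam-Status:", status_from_check("-1.1/4.0"))
```

The two results have the same shape as the headers shown in the original
post, and the input contains no rule names or per-rule scores to report.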