Posted to users@spamassassin.apache.org by Jeff Aitken <ja...@aitken.com> on 2008/05/15 16:19:13 UTC

inconsistent scoring issue?

Hello,

Apologies if this is a FAQ or old news, but I did a bit of searching
yesterday and didn't find an answer to this one.

I'm using SA (3.2.4) site-wide on a FreeBSD-6.3 box in conjunction with
postfix, using procmail as the LDA.  I'm using spamd/spamc, so the individual
spamc processes are run as the recipient's userid (since they're spawned
by procmail).  I know this has implications for which bayes db gets
consulted (versus a true "sitewide" with shared bayes db) but I don't
think that's the issue I'm seeing here.  Anyway...

It seems like a lot more spam has been getting through in the last couple
of weeks.  This prompted me to enable Pyzor, which I had not done in my
initial install.  While that seems to work, I noticed that I'm getting
inconsistent scoring results on messages that should be tagged as spam but
which are not.

For example, a message that was just delivered to my inbox contained the
following report from SA:

    X-Spam-Status: No, score=4.4 required=5.0 tests=BAYES_99,DATE_IN_FUTURE_03_06,
        RAZOR2_CHECK,RDNS_DYNAMIC autolearn=no version=3.2.4
    X-Spam-Report:
            *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
            *      [score: 1.0000]
            *  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
            *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
            *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
            *      dynamic-looking rDNS

If I save the original message and run SA manually (spamassassin -t < msg)
I get the following:

    X-Spam-Status: Yes, score=7.3 required=5.0 tests=AWL,BAYES_99, 
            DATE_IN_FUTURE_03_06,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,
            RAZOR2_CHECK,RCVD_IN_DSBL,RCVD_IN_SORBS_DUL,RDNS_DYNAMIC,URIBL_BLACK
            autolearn=no version=3.2.4
    X-Spam-Report:
            *  0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
            *      [88.73.238.103 listed in dnsbl.sorbs.net]
            *  1.0 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
            *      [<http://dsbl.org/listing?88.73.238.103>]
            *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
            *      [score: 1.0000]
            *  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
            *  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
            *      above 50%
            *      [cf: 100]
            *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
            *  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
            *      [cf: 100]
            *  2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
            *      [URIs: win-todayoo.com.cn]
            *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
            *      dynamic-looking rDNS
            * -2.9 AWL AWL: From: address is in the auto white-list

I'm going to assume that the score being wrong by 0.1 (should be 7.4, not
7.3) is due to a rounding error or other similar issue.  However, I can't
figure out why the results are so different.  What's even more interesting
is that if I turn on debugging (spamassassin -D -t < msg) then I get a
*third* different result:

    X-Spam-Status: Yes, score=8.7 required=5.0 tests=AWL,BAYES_99,
            DATE_IN_FUTURE_03_06,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E8_51_100,
            RAZOR2_CHECK,RCVD_IN_DSBL,RCVD_IN_SORBS_DUL,RDNS_DYNAMIC,URIBL_BLACK
            autolearn=no version=3.2.4
    X-Spam-Report:
            *  0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
            *      [88.73.238.103 listed in dnsbl.sorbs.net]
            *  1.0 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
            *      [<http://dsbl.org/listing?88.73.238.103>]
            *  2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
            *      [URIs: win-todayoo.com.cn]
            *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
            *      [score: 1.0000]
            *  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
            *  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
            *      above 50%
            *      [cf: 100]
            *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
            *  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
            *      [cf: 100]
            *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
            *      dynamic-looking rDNS
            * -1.4 AWL AWL: From: address is in the auto white-list

The two commands were run on the same host, by the same user, within
seconds of one another, and yet the scores for the AWL test are 1.5
different.
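To sanity-check the 0.1 discrepancy, here is a small Python sketch that adds up the per-rule scores exactly as displayed in the second report (they sum to 7.4, while X-Spam-Status says 7.3). It then uses made-up numbers to show how rounding each term before summing can drift from rounding the total. This assumes, without having checked SpamAssassin's source, that the header total is computed from unrounded rule scores and that each value is formatted to one decimal independently. The function names are hypothetical.

```python
import re

# Per-rule scores as displayed in the second X-Spam-Report above
# (the one produced by "spamassassin -t < msg"), plus one continuation line.
REPORT = """\
*  0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
*  1.0 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
*  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*      [score: 1.0000]
*  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
*  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
*  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
*  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*  2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
*  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
* -2.9 AWL AWL: From: address is in the auto white-list
"""

# Matches "*  <score> <RULE_NAME>"; continuation lines such as
# "*      [score: 1.0000]" do not match because no number follows the "*".
RULE_LINE = re.compile(r"^\s*\*\s+(-?\d+\.\d+)\s+([A-Z0-9_]+)\b")


def displayed_scores(report: str) -> dict[str, float]:
    """Return {rule name: displayed score} for each rule line in a report."""
    scores = {}
    for line in report.splitlines():
        m = RULE_LINE.match(line)
        if m:
            scores[m.group(2)] = float(m.group(1))
    return scores


def rounded_sum_vs_sum_of_rounded(true_scores):
    """Compare rounding the total with summing individually rounded terms."""
    total_then_round = round(sum(true_scores), 1)
    round_then_total = round(sum(round(s, 1) for s in true_scores), 1)
    return total_then_round, round_then_total


if __name__ == "__main__":
    shown = displayed_scores(REPORT)
    print(f"{len(shown)} rules, displayed scores sum to {sum(shown.values()):.1f}")
    # Hypothetical unrounded scores (NOT SpamAssassin's real values):
    # three rules that would each be displayed as 1.1.
    print(rounded_sum_vs_sum_of_rounded([1.14, 1.14, 1.14]))
```

With those hypothetical inputs the rounded total is 3.4 while the sum of the displayed terms is 3.3, the same kind of 0.1 gap seen in the report.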

Any thoughts on what I'm missing or doing wrong?

Thanks!


--Jeff


Re: inconsistent scoring issue?

Posted by John Hardin <jh...@impsec.org>.
On Fri, 16 May 2008, Jeff Aitken wrote:

> I'm thinking you're probably right that this is a timing issue.  I just 
> checked another message that had different scoring results.  The initial 
> message was received on 5/15 at 1156UTC and did not hit URIBL_BLACK.  I 
> fed it to SA manually at 1203UTC and it DID hit URIBL_BLACK.  I looked 
> up the URI in question and it was listed on 5/15 at 1153UTC.

One argument for implementing greylisting?
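The reasoning behind the greylisting suggestion: a greylisting MTA temporarily rejects the first delivery attempt from an unknown (client IP, envelope sender, recipient) triplet, so a legitimate server retries some minutes later, by which time a freshly listed URI or IP is more likely to be visible to SpamAssassin's DNS lookups. The following is a hypothetical toy model in Python, not any real greylisting daemon; the 300-second delay and the class and method names are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Greylist:
    """Toy greylist keyed on the (client IP, envelope sender, recipient) triplet."""

    delay: float = 300.0  # hypothetical minimum retry delay, in seconds
    first_seen: dict = field(default_factory=dict)

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Return "DEFER" (temporary 4xx reply) or "ACCEPT" for an attempt at time `now`."""
        key = (client_ip, sender, recipient)
        if key not in self.first_seen:
            # First attempt from this triplet: remember it and ask the client to retry.
            self.first_seen[key] = now
            return "DEFER"
        if now - self.first_seen[key] < self.delay:
            # Retried before the delay expired: still defer.
            return "DEFER"
        # Retried after the delay: accept and hand the message on for scanning.
        return "ACCEPT"


if __name__ == "__main__":
    gl = Greylist()
    triplet = ("88.73.238.103", "sender@example.com", "user@example.org")
    for t in (0, 60, 400):
        print(t, gl.check(*triplet, now=t))
```

In the URIBL_BLACK case described in this thread (URI listed at 1153UTC, message scanned at 1156UTC), a delay of this order would have moved the scan to a point where the listing had a better chance of being seen.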

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   You do not examine legislation in the light of the benefits it
   will convey if properly administered, but in the light of the
   wrongs it would do and the harms it would cause if improperly
   administered.                                  -- Lyndon B. Johnson
-----------------------------------------------------------------------
  5 days until the 4th anniversary of SpaceshipOne winning the X-prize

Re: inconsistent scoring issue?

Posted by Jeff Aitken <ja...@aitken.com>.
On Thu, May 15, 2008 at 08:53:57PM +0200, Karsten Bräckelmann wrote:
> Yes. Hence my question about mail hitting URIBL_BLACK on the first run,
> unlike that one example.
> 
> The point is, whether *no* mail hits URIBL_BLACK, or at least *some*
> mail does. Do you get any URIBL_BLACK hits at all? Is that one example
> you pasted exemplary for all your incoming mail, never hitting
> URIBL_BLACK -- or is this an isolated case not triggering the BL?
> 
> The answer to this might hint where to look next...

Ah, gotcha.  Yes, some messages do hit URIBL_BLACK; all examples that I've
found so far are also (properly) identified as spam.

I'm thinking you're probably right that this is a timing issue.  I just
checked another message that had different scoring results.  The initial
message was received on 5/15 at 1156UTC and did not hit URIBL_BLACK.  I
fed it to SA manually at 1203UTC and it DID hit URIBL_BLACK.  I looked up
the URI in question and it was listed on 5/15 at 1153UTC.


--Jeff


Re: inconsistent scoring issue?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2008-05-15 at 16:20 +0000, Jeff Aitken wrote:
> On Thu, May 15, 2008 at 05:35:52PM +0200, Karsten Bräckelmann wrote:

> > Do you see URIBL_BLACK hits in the incoming stream at all?
> 
> Not sure exactly what you're asking here... but I included the entire
> X-Spam-Status and X-Spam-Report headers, without removing any lines.  So
> there was no URIBL_BLACK hit in the message as it was delivered to my
> inbox, but the same message, when run through SA manually a few minutes
> later, did trigger it.

Yes. Hence my question about mail hitting URIBL_BLACK on the first run,
unlike that one example.

The point is, whether *no* mail hits URIBL_BLACK, or at least *some*
mail does. Do you get any URIBL_BLACK hits at all? Is that one example
you pasted exemplary for all your incoming mail, never hitting
URIBL_BLACK -- or is this an isolated case not triggering the BL?

The answer to this might hint where to look next...

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: inconsistent scoring issue?

Posted by Jeff Aitken <ja...@aitken.com>.
On Thu, May 15, 2008 at 05:35:52PM +0200, Karsten Bräckelmann wrote:
> No DNSBLs in the original result... This *may* be due to the BLs
> catching up, and the second run being done later. This specifically
> seems to be the case for Razor (which hit in both runs, just differently)
> and likely for URIBL_BLACK, too. Maybe DNS timeout issues.

Perhaps... don't see any evidence in the logs, but I might not without
lots of extra debugging enabled.  Whatever is happening, it's a definite
change from as recently as two weeks ago because other users on this 
system have reported a massive increase in spam not being properly
classified as such.

However, regarding the comment about the BLs catching up, if I'm reading it
right this host has been listed in at least DSBL since last year.  Still
could have been a timeout on my end, of course, but it seems unlikely that
I've been having timeouts for the last two weeks or so.
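To check by hand whether a given IP or URI was listed at a particular moment, it helps to know what SpamAssassin actually asks DNS. The Python sketch below builds the query names for the two kinds of list seen in this thread: IP-based DNSBLs (octets reversed, then the list zone, e.g. dnsbl.sorbs.net) and URI blocklists (the domain, then the list zone). The zone multi.uribl.com for URIBL_BLACK is an assumption based on SpamAssassin's stock URIBL rules; check your own rule files. The function names are hypothetical, and the network lookup is defined but not run.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name looked up for an IPv4 address in an IP-based DNSBL.

    The octets are reversed and the list zone appended, e.g.
    1.2.3.4 in zone "bl.example" -> "4.3.2.1.bl.example".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone.rstrip(".")


def uribl_query_name(domain: str, zone: str) -> str:
    """Build the name looked up for a domain in a URI (right-hand-side) blocklist."""
    return domain.rstrip(".").lower() + "." + zone.rstrip(".")


def lookup(name: str) -> list[str]:
    """Return the A records for `name`, or [] if it does not resolve.

    NXDOMAIN normally means "not listed", but a timeout or other resolver
    failure also ends up here as an OSError, so [] is not proof of absence.
    """
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


if __name__ == "__main__":
    print(dnsbl_query_name("88.73.238.103", "dnsbl.sorbs.net"))
    print(uribl_query_name("win-todayoo.com.cn", "multi.uribl.com"))
```

Running such a lookup from the mail host at delivery time, and again a few minutes later, would help separate "not yet listed" from "lookup timed out", which is the distinction discussed in this thread.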


> Do you see URIBL_BLACK hits in the incoming stream at all?

Not sure exactly what you're asking here... but I included the entire
X-Spam-Status and X-Spam-Report headers, without removing any lines.  So
there was no URIBL_BLACK hit in the message as it was delivered to my
inbox, but the same message, when run through SA manually a few minutes
later, did trigger it.


> AWL is a score averager. It *will* change by run, unless the difference
> between the current score and the previous average is about 0.

Ah, right... didn't think about the effect of running SA manually
influencing the AWL on subsequent runs.  The initial email scored 4.4, but
when I ran SA manually it scored 10.3 or so, which means the AWL should
have subtracted about 3 on the second exposure to that sender... and then
down from there, etc.  Sorry, my fault for not thinking that one through.
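The arithmetic above can be reproduced with the formula from SpamAssassin's documentation for auto_whitelist_factor (default 0.5): finalscore = score + (mean - score) * factor, where mean is the sender's long-term mean score. The Python sketch below assumes, as we understand the 3.2 AWL plugin, that the pre-AWL score is what gets added to the sender's history, and it uses the rounded scores from this thread (4.4, then 10.3 twice), so the results only approximately match the -2.9 and -1.4 shown in the reports. AwlHistory and score_with_awl are hypothetical names, not SpamAssassin APIs.

```python
from dataclasses import dataclass

AWL_FACTOR = 0.5  # documented default of auto_whitelist_factor


@dataclass
class AwlHistory:
    """Hypothetical stand-in for one sender's AWL entry: running total and count."""

    total: float = 0.0
    count: int = 0

    def mean(self):
        """Mean of previously recorded scores, or None for an unseen sender."""
        return self.total / self.count if self.count else None


def score_with_awl(history: AwlHistory, score: float, factor: float = AWL_FACTOR):
    """Return (awl_delta, final_score) and record this message's pre-AWL score.

    finalscore = score + (mean - score) * factor; no adjustment for a new sender.
    """
    mean = history.mean()
    delta = 0.0 if mean is None else (mean - score) * factor
    # Assumption: the pre-AWL score is what is added to the sender's history.
    history.total += score
    history.count += 1
    return delta, score + delta


if __name__ == "__main__":
    sender = AwlHistory()
    runs = [("spamc delivery", 4.4), ("spamassassin -t", 10.3), ("spamassassin -D -t", 10.3)]
    for label, raw in runs:
        delta, final = score_with_awl(sender, raw)
        print(f"{label:<20} pre-AWL={raw:5.2f}  AWL={delta:+.3f}  final={final:5.2f}")
```

Under this model the AWL adjustment is -2.95 on the second run and -1.475 on the third, and each further manual run pulls the mean toward 10.3, so the adjustment keeps shrinking toward 0, matching the point that AWL only stops changing once the current score is about equal to the previous average.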


--Jeff


Re: inconsistent scoring issue?

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Thu, 2008-05-15 at 14:19 +0000, Jeff Aitken wrote:

> For example, a message that was just delivered to my inbox contained the
> following report from SA:
> 
>     X-Spam-Report:
>             *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>             *      [score: 1.0000]
>             *  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
>             *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
>             *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
>             *      dynamic-looking rDNS
> 
> If I save the original message and run SA manually (spamassassin -t < msg)
> I get the following:
> 
>     X-Spam-Report:
>             *  0.9 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
>             *      [88.73.238.103 listed in dnsbl.sorbs.net]
>             *  1.0 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
>             *      [<http://dsbl.org/listing?88.73.238.103>]
>             *  3.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>             *      [score: 1.0000]
>             *  0.3 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
>             *  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
>             *      above 50%
>             *      [cf: 100]
>             *  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
>             *  0.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
>             *      [cf: 100]
>             *  2.0 URIBL_BLACK Contains an URL listed in the URIBL blacklist
>             *      [URIs: win-todayoo.com.cn]
>             *  0.1 RDNS_DYNAMIC Delivered to trusted network by host with
>             *      dynamic-looking rDNS
>             * -2.9 AWL AWL: From: address is in the auto white-list

No DNSBLs in the original result... This *may* be due to the BLs
catching up, and the second run being done later. This specifically
seems to be the case for Razor (which hit in both runs, just differently)
and likely for URIBL_BLACK, too. Maybe DNS timeout issues.

Do you see URIBL_BLACK hits in the incoming stream at all?


> I'm going to assume that the score being wrong by 0.1 (should be 7.4, not
> 7.3) is due to a rounding error or other similar issue.  However, I can't
> figure out why the results are so different.  What's even more interesting
> is that if I turn on debugging (spamassassin -D -t < msg) then I get a
> *third* different result:
[...]
>             * -1.4 AWL AWL: From: address is in the auto white-list
> 
> The two commands were run on the same host, by the same user, within
> seconds of one another, and yet the scores for the AWL test are 1.5
> different.

AWL is a score averager. It *will* change by run, unless the difference
between the current score and the previous average is about 0.

Please see these:
  http://wiki.apache.org/spamassassin/AutoWhitelist
  http://wiki.apache.org/spamassassin/AwlWrongWay

  guenther

