Posted to users@spamassassin.apache.org by Dan Barker <db...@visioncomm.net> on 2007/02/08 14:17:42 UTC

Scoring strangely

I received a spam yesterday with two different scores (one directly to me,
one to a webmaster account that forwards to me).

This was very odd, because the scores were quite different. I understand
differences in the AWL and Bayes scores, due to being processed with
different user directories (actually, domain directories in this
implementation of 3.1.7). However, there are some other tests that are
coming out differently, depending on the user specified. All have identical
user_prefs (completely null) with the exception of some whitelist_from's.

The setup is spamc/spamd, so the following analysis data were obtained by
four calls to spamc with different parameters:

>spamc -p 785 -s 500000 -u bydanjohnson   < \tmp\Dtestnews.smd > \tmp\testnewsB.smd
>spamc -p 785 -s 500000 -u kitepilot      < \tmp\Dtestnews.smd > \tmp\testnewsK.smd
>spamc -p 785 -s 500000 -u LMFP           < \tmp\Dtestnews.smd > \tmp\testnewsL.smd
>spamc -p 785 -s 500000 -u visioncomm.net < \tmp\Dtestnews.smd > \tmp\testnewsV.smd
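
If it helps, the same comparison can also be pulled as just the report
text (rule names, scores and descriptions) rather than the rewritten
message; assuming this spamc build supports -R, something like:

>spamc -p 785 -s 500000 -u kitepilot -R < \tmp\Dtestnews.smd > \tmp\testnewsK.rpt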

Where do I begin to look to understand what's happening? I understand the
lines with an asterisk in column 1.
 spamassassin --lint is fine, as is
 spamassassin -p <various user directories> --lint, if that is even supposed
to work<g>.

Summaries from the spamc output; columns B/K/L/V correspond to the -u
values bydanjohnson, kitepilot, LMFP and visioncomm.net. A "-" indicates
the test did not fire; it's included as a placeholder so fonts and/or
line wraps don't mess up the table:

*-u                      B      K      L      V
*Score                   3.3    5.2    2.9    7.3
*AWL                     0.712  -      0.390  0.629
*BAYES_80                -      2      -      -
*BAYES_99                -      -      -      3.5
 FORGED_RCVD_HELO        -      0.135  -      0.135
 HTML_50_60              -      0.134  -      0.134
*HTML_MESSAGE            0.001  0.001  0.001  0.001
 HTML_TAG_EXIST_TBODY    0.126  -      0.126  -
 MIME_HTML_MOSTLY        0.699  1.102  0.699  1.102
 MPART_ALT_DIFF          0.137  -      0.137  -
 RCVD_IN_BL_SPAMCOP_NET  1.332  1.558  1.332  1.558
*URIBL_GREY              0.25   0.25   0.25   0.25

Notice that some of the rules use different scores for the same tests
(MIME_HTML_MOSTLY and RCVD_IN_BL_SPAMCOP_NET). That's got to be a hint to
somebody<g>.

Thanks in advance,

Dan Barker



Re: Scoring strangely

Posted by jdow <jd...@earthlink.net>.
From: "Dan Barker" <db...@visioncomm.net>

> Where do I begin to look to understand what's happening? I understand the
> lines with an asterisk in column 1.
> spamassassin --lint is fine, as is
> spamassassin -p <various user directories> --lint, if that is even
> supposed to work<g>.

In general, on a Red Hat or Red Hat-derived distro, that will not work.
You must log in as an entity that has access to the "various user
directories". (Running it as root might work, and it might not; I'd not
trust it myself.)
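
For example (a rough sketch; the account name and prefs path below are
guesses for this setup, and note that -p expects the user_prefs file
itself, not a directory):

  su - bydanjohnson -c 'spamassassin --lint -D'

  spamassassin --lint -D -p /path/to/bydanjohnson/user_prefs

Either form only checks that the configuration parses cleanly for that
user; it won't show the scoreset differences by itself.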

The general rule is to run spamassassin lint tests as the intended user.
{^_^} 


Re: Scoring strangely

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Feb 08, 2007 at 08:17:42AM -0500, Dan Barker wrote:
> This was very odd, because the scores were quite different. I understand
> differences in the AWL and Bayes scores, due to being processed with
> different user directories (actually, domain directories in this
> implementation of 3.1.7). However, there are some other tests that are
> coming out differently, depending on the user specified. All have identical
> user_prefs (completely null) with the exception of some whitelist_from's.

If the users don't have a usable Bayes DB, they will get a different scoreset.
See "perldoc Mail::SpamAssassin::Conf" and the "score" section. :)

-- 
Randomly Selected Tagline:
Dark Ages: Knight Time.

Re: Scoring strangely

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Feb 08, 2007 at 09:08:37AM -0500, Dan Barker wrote:
> If the Bayes counts are too low for Bayes scoring, then some of the other
> tests don't work. I guess it's turning off some text collection (that it

Well, the scores are different, which may enable/disable other rules.

> Should it be considered a bug? (FORGED_RCVD_HELO only works if Bayes is
> trained, MPART_ALT_DIFF only works if Bayes isn't, etc.).

Not a bug.  In fact, it's a deliberate optimization: if Bayes isn't
available, you get a scoreset that's tuned not to expect Bayes.  I'm
pretty sure there's a wiki doc for it, but I'm travelling on a bus right
now, so I can't go searching for it.
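
In the meantime, one way to see which side of that a given user lands on
is to check whether that user's Bayes DB has reached the minimum training
counts (bayes_min_spam_num / bayes_min_ham_num, 200 apiece by default).
A rough sketch, assuming per-user databases in a .spamassassin directory
(the path is a guess; adjust for the virtual-user layout here):

  sa-learn --dbpath /path/to/user/.spamassassin --dump magic

The magic dump includes the nspam and nham counts; if either is below the
minimum, Bayes isn't used for that user and the non-Bayes scoreset
applies.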

-- 
Randomly Selected Tagline:
"Which one is oral?"
 
 	--Ralph Wiggum
 	  The Principal and the Pauper (Episode 4F23)

RE: Scoring strangely

Posted by Dan Barker <db...@visioncomm.net>.
Never mind. I figured it out. I'm not sure I like it, but I figured it out.
If the Bayes counts are too low for Bayes scoring, then some of the other
tests don't work. I guess it's turning off some text collection (that it
thinks it won't need) that later rules have come to depend on, since Bayes
usually drives that collection.

Can anyone confirm this?

Should it be considered a bug? (FORGED_RCVD_HELO only works if Bayes is
trained, MPART_ALT_DIFF only works if Bayes isn't, etc.).

Further, as Theo points out, net/non-Bayes is scoreset 1, while net/Bayes
is scoreset 3.
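
For anyone else chasing this: running the saved message through
spamassassin with debugging on (on the spamd host), using the prefs for
the user in question, should show which score set was chosen. The prefs
path below is a guess for this setup, and the exact debug wording varies
by version:

  spamassassin -D -p /path/to/bydanjohnson/user_prefs < Dtestnews.smd > /dev/null

The debug output goes to stderr and includes the Bayes and config
messages around the score-set choice.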

Dan Barker
