Posted to users@spamassassin.apache.org by Chad M Stewart <cm...@balius.com> on 2004/11/11 13:23:33 UTC

different scores - spamd vs spamassassin

Hi all,

I'm using SpamAssassin 3.0.1 (2004-10-22). SA is running on an OpenBSD  
3.5 i386 machine.  I'm starting it up using the following

/usr/local/bin/spamd -u spamd -a --allowed-ips=192.168.1.0/24  
--siteconfigpath=/etc/mail/spamassassin/ -d --listen-ip=192.168.1.4 -D

my local.cf file is below, the spamd user has a user_prefs file that is  
all comments.

lock_method                     flock
bayes_auto_learn_threshold_spam         3.2
use_auto_whitelist      0
required_hits           3.2
report_safe             1
use_bayes               1
bayes_path /home/spamd/
bayes_auto_learn        0
bayes_learn_to_journal  1
skip_rbl_checks         1
use_razor2              1
use_dcc                 1
use_pyzor               1
ok_languages            en
ok_locales              en



My MTA talks to spamd over the network, this is all working nicely.   
I'm running SA for the site, not just me.  This morning I'm looking  
into why messages that I think should be clearly spam are not getting  
tagged that way.  My user agent is catching them but SA is not. :(

Here is the oddity that I found.  If I put the raw message into a file  
and then run 'spamassassin -D < a' as the spamd user on the system  
where the spamd process is running I get

Message-Id: <11...@xs4all.nl>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on bia.amotken.com
X-Spam-Level: *****************
X-Spam-Status: Yes, score=18.0 required=3.2 tests=ALL_TRUSTED,DCC_CHECK,
         DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,HTML_90_100,HTML_IMAGE_ONLY_04,
         HTML_MESSAGE,HTML_SHORT_CENTER,MANY_EXCLAMATIONS,MIME_HTML_ONLY,
         SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,URIBL_AB_SURBL,URIBL_OB_SURBL,
         URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
         version=3.0.1

When my MTA passed this same message to the running spamd process, it
got



Nov 11 06:39:44 bia spamd[19025]: logmsg: checking message <11...@xs4all.nl> for (unknown):1002.
Nov 11 06:39:44 bia spamd[19025]: checking message <11...@xs4all.nl> for (unknown):1002.
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 tie-ing to DB file R/O /home/spamd/_toks
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 tie-ing to DB file R/O /home/spamd/_seen
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: found bayes db version 3
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: Not available for scanning, only 82 spam(s) in Bayes DB < 200
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing db_toks
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing db_seen
Nov 11 06:39:44 bia spamd[19025]: debug: metadata: X-Spam-Relays-Trusted:
Nov 11 06:39:44 bia spamd[19025]: debug: metadata: X-Spam-Relays-Untrusted:
Nov 11 06:39:44 bia spamd[19025]: debug: ---- MIME PARSER START ----
Nov 11 06:39:44 bia spamd[19025]: debug: main message type: content-transfer-encoding8bit
Nov 11 06:39:44 bia spamd[19025]: debug: parsing normal part
Nov 11 06:39:44 bia spamd[19025]: debug: added part, type: content-transfer-encoding8bit
Nov 11 06:39:44 bia spamd[19025]: debug: ---- MIME PARSER END ----
Nov 11 06:39:44 bia spamd[19025]: debug: Message too short for language analysis
Nov 11 06:39:44 bia spamd[19025]: debug: URIDNSBL: domains to query:
Nov 11 06:39:44 bia spamd[19025]: debug: Running tests for priority: 0
Nov 11 06:39:44 bia spamd[19025]: debug: running header regexp tests; score so far=0
Nov 11 06:39:44 bia spamd[19025]: debug: SPF: message was delivered entirely via trusted relays, not required
Nov 11 06:39:44 bia spamd[19025]: debug: all '*From' addrs: dickinsonvl@xs4all.nl
Nov 11 06:39:44 bia spamd[19025]: debug: all '*To' addrs: cms@balius.com
Nov 11 06:39:44 bia spamd[19025]: debug: SPF: message was delivered entirely via trusted relays, not required
Nov 11 06:39:44 bia spamd[19025]: debug: running body-text per-line regexp tests; score so far=-2.82
Nov 11 06:39:44 bia spamd[19025]: debug: running uri tests; score so far=-2.82
Nov 11 06:39:44 bia spamd[19025]: debug: Razor2 is available
Nov 11 06:39:44 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:44 bia spamd[19025]: debug: Using results from Razor v2.61
Nov 11 06:39:44 bia spamd[19025]: debug: Found Razor2 part: part=0 noresponse skipme=1
Nov 11 06:39:44 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:45 bia spamd[19025]: debug: Razor2 results: spam? 0  highest cf score: 0
Nov 11 06:39:45 bia spamd[19025]: debug: running raw-body-text per-line regexp tests; score so far=-1.662
Nov 11 06:39:45 bia spamd[19025]: debug: running full-text regexp tests; score so far=-1.662
Nov 11 06:39:45 bia spamd[19025]: debug: Razor2 is available
Nov 11 06:39:45 bia spamd[19025]: debug: Pyzor is available: /usr/local/bin/pyzor
Nov 11 06:39:45 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:45 bia spamd[621]: debug: setuid: helper proc 621: ruid=1002 euid=1002
Nov 11 06:39:51 bia spamd[19025]: debug: Pyzor: got response: 217.160.253.84:24441      TimeoutError:
Nov 11 06:39:51 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:51 bia spamd[19025]: debug: Pyzor: couldn't grok response "217.160.253.84:24441    TimeoutError: "
Nov 11 06:39:51 bia spamd[19025]: debug: DCCifd is not available: no r/w dccifd socket found.
Nov 11 06:39:51 bia spamd[19025]: debug: DCC is available: /usr/local/bin/dccproc
Nov 11 06:39:51 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:51 bia spamd[11502]: debug: setuid: helper proc 11502: ruid=1002 euid=1002
Nov 11 06:39:51 bia spamd[19025]: debug: DCC: got response: X-DCC-xmailer-Metrics: bia.amotken.com 1192; Body=1 Fuz1=many Fuz2=many\^M
Nov 11 06:39:51 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:51 bia spamd[19025]: debug: DCC: Listed! BODY: 1 of 999999 FUZ1: 999999 of 999999 FUZ2: 999999 of 999999
Nov 11 06:39:51 bia spamd[19025]: debug: Running tests for priority: 500
Nov 11 06:39:51 bia spamd[19025]: debug: running meta tests; score so far=-0.289
Nov 11 06:39:51 bia spamd[19025]: debug: running header regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running body-text per-line regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running uri tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running raw-body-text per-line regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running full-text regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: Running tests for priority: 1000
Nov 11 06:39:51 bia spamd[19025]: debug: running meta tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running header regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 created /home/spamd/.spamassassin/auto-whitelist.mutex
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 trying to get lock on /home/spamd/.spamassassin/auto-whitelist with 30 timeout
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 link to /home/spamd/.spamassassin/auto-whitelist.mutex: link ok
Nov 11 06:39:51 bia spamd[19025]: debug: Tie-ing to DB file R/W in /home/spamd/.spamassassin/auto-whitelist
Nov 11 06:39:51 bia spamd[19025]: debug: auto-whitelist (db-based): dickinsonvl@xs4all.nl|ip=none scores 0/0
Nov 11 06:39:51 bia spamd[19025]: debug: AWL active, pre-score: 2.181, autolearn score: 2.181, mean: undef, IP: undef
Nov 11 06:39:51 bia spamd[19025]: debug: add_score: New count: 1, new totscore: 2.181
Nov 11 06:39:51 bia spamd[19025]: debug: DB addr list: untie-ing and unlocking.
Nov 11 06:39:51 bia spamd[19025]: debug: DB addr list: file locked, breaking lock.
Nov 11 06:39:51 bia spamd[19025]: debug: unlock: 19025 unlocked /home/spamd/.spamassassin/auto-whitelist.mutex
Nov 11 06:39:51 bia spamd[19025]: debug: Post AWL score: 2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running body-text per-line regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running uri tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running raw-body-text per-line regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running full-text regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: is spam? score=2.181 required=3.2
Nov 11 06:39:51 bia spamd[19025]: debug: tests=ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ
Nov 11 06:39:51 bia spamd[19025]: debug: subtests=__0_TZ_3,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CTYPE_HTML,__HAS_MSGID,__HAS_SUBJECT,__HAS_X_MAILER,__MIME_HTML,__MIME_VERSION,__MSGID_OK_DIGITS,__RATWARE_0_TZ_DATE,__SANE_MSGID,__UNUSABLE_MSGID
Nov 11 06:39:51 bia spamd[19025]: logmsg: clean message (2.2/3.2) for (unknown):1002 in 8.1 seconds, 576 bytes.


I'm at a complete loss as to why the scores differ.  Is there
something I've done wrong here?


Something else is going wrong with my Bayes db learning as well.  I  
restarted spamd this morning.  By restart I mean I found the running  
process ID, sent it a kill -TERM and then started it again using the  
above string.  Before the restart I had 2K+ entries in the db.  After  
restarting I'm now seeing

$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0         82          0  non-token data: nspam
0.000          0        161          0  non-token data: nham


Again I'm at a loss as to why this might have happened.  I'd really  
like to hear from some experts as to what it is that is going wrong  
here or might be.

Thank you for your time,
Chad

Re: different scores - spamd vs spamassassin

Posted by Brook Humphrey <ba...@webmedic.net>.
On Thursday 11 November 2004 04:23 am, Chad M Stewart wrote:
> Something else is going wrong with my Bayes db learning as well.  I  
> restarted spamd this morning.  By restart I mean I found the running  
> process ID, sent it a kill -TERM and then started it again using the  
> above string.  Before the restart I had 2K+ entries in the db.  After  
> restarting I'm now seeing

On my system, with a site-wide bayes database using spamd, I had issues with
bayes learning a few times. A daemon running on the system would change
permissions on anything it considered insecure, and it decided that the 666
permissions on the bayes database were just not right. I've seen other
reports of bayes not learning correctly and can't say for sure this is the
issue, but you might want to check the permissions on both the bayes
database and the whitelist, if you use that as well.
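
A minimal sketch of such a permissions check, in Python. The three paths are the ones that appear in the spamd debug log earlier in the thread; the function name describe is made up for illustration, and the script is meant to be run as the same user spamd runs as.

```python
import os
import stat

# Paths taken from the spamd debug log in this thread; adjust for your site.
DB_FILES = [
    "/home/spamd/_toks",
    "/home/spamd/_seen",
    "/home/spamd/.spamassassin/auto-whitelist",
]

def describe(path):
    """Return permission bits, owner uid, and whether the current user can read and write the file."""
    st = os.stat(path)
    return {
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "uid": st.st_uid,
        "read_write": os.access(path, os.R_OK | os.W_OK),
    }

if __name__ == "__main__":
    for path in DB_FILES:
        try:
            print(path, describe(path))
        except OSError as err:
            # Covers missing files as well as directories we may not enter.
            print(path, "not accessible:", err.strerror)
```

If read_write comes back False for the user spamd runs as, that would be consistent with the permissions problem described above, though it does not by itself explain a drop in the nspam/nham counts.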

>
> $ sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0         82          0  non-token data: nspam
> 0.000          0        161          0  non-token data: nham
>
>
> Again I'm at a loss as to why this might have happened.  I'd really  
> like to hear from some experts as to what it is that is going wrong  
> here or might be.
>
> Thank you for your time,
> Chad

-- 
 -~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-
                                      Brook Humphrey           
        Mobile PC Medic, 420 1st, Cheney, WA 99004, 509-235-9107        
http://www.webmedic.net, bah@webmedic.net, bah@linux-mandrake.com   
                                 Holiness unto the Lord
 -~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-

Re: different scores - spamd vs spamassassin

Posted by Matt Kettler <mk...@comcast.net>.
At 07:23 AM 11/11/2004 -0500, Chad M Stewart wrote:
>Hi all,
>
>I'm using SpamAssassin 3.0.1 (2004-10-22). SA is running on an OpenBSD
>3.5 i386 machine.  I'm starting it up using the following
>
>/usr/local/bin/spamd -u spamd -a --allowed-ips=192.168.1.0/24
>--siteconfigpath=/etc/mail/spamassassin/ -d --listen-ip=192.168.1.4 -D
>
>my local.cf file is below, the spamd user has a user_prefs file that is
>all comments.
>

<SNIP>

>skip_rbl_checks         1

<snip>



>  If I put the raw message into a file
>and then run 'spamassassin -D < a' as the spamd user on the system
>where the spamd process is running I get

<snip>

>X-Spam-Status: Yes, score=18.0 required=3.2 tests=ALL_TRUSTED,DCC_CHECK,
>         DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,HTML_90_100,HTML_IMAGE_ONLY_04,
>         HTML_MESSAGE,HTML_SHORT_CENTER,MANY_EXCLAMATIONS,MIME_HTML_ONLY,
>         SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,URIBL_AB_SURBL,URIBL_OB_SURBL,
>         URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
>         version=3.0.1
>
>When my MTA passed this same message to the running spamd process, it
>got

<snip>

>Nov 11 06:39:51 bia spamd[19025]: debug: is spam? score=2.181 required=3.2
>Nov 11 06:39:51 bia spamd[19025]: debug: tests=ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ

<snip>


>I'm at a complete loss as to why the scores differ.  Is there
>something I've done wrong here?

First a comment: You seem to have a major problem with your 
trusted_networks setting. ALL_TRUSTED should not fire for spam.  If you've 
got a NATed mailserver, you need to manually set trusted_networks to only 
contain your servers, otherwise SA will assume the outside relay is a part 
of your network.

As for the different scores, that's tough to account for. Are you sure those 
are the same messages? spamassassin is seeing a very different message 
body and subject line than spamd is. It looks very much like some HTML 
normalizer ripped out a bunch of stuff SA was matching on.
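
One way to see how far apart the two runs are is to diff the rule names each one reported. The sketch below is plain Python and not part of SpamAssassin; the two lists are copied from the X-Spam-Status header and the spamd debug line quoted in this thread, and the variable and function names are made up for illustration.

```python
# Rule names from the X-Spam-Status header produced by 'spamassassin -D < a'
cli_tests = """ALL_TRUSTED,DCC_CHECK,DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,
HTML_90_100,HTML_IMAGE_ONLY_04,HTML_MESSAGE,HTML_SHORT_CENTER,
MANY_EXCLAMATIONS,MIME_HTML_ONLY,SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,
URIBL_AB_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL"""

# Rule names from the spamd debug log for the same Message-Id
spamd_tests = "ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ"

def parse_tests(text):
    """Split a comma-separated tests= value into a set of rule names."""
    return {name.strip() for name in text.split(",") if name.strip()}

cli = parse_tests(cli_tests)
spamd = parse_tests(spamd_tests)

common = sorted(cli & spamd)       # rules that fired in both runs
only_cli = sorted(cli - spamd)     # rules only 'spamassassin' hit
only_spamd = sorted(spamd - cli)   # rules only spamd hit

print("both:      ", ", ".join(common))
print("only cli:  ", ", ".join(only_cli))
print("only spamd:", ", ".join(only_spamd))
```

Only three rules are shared. Every body, subject and URI rule is on the spamassassin side, while spamd alone hits HTML_MIME_NO_HTML_TAG and RATWARE_ZERO_TZ, which is what you would expect if the two programs were not handed the same bytes.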

You can even see in the debug output that, although spamassassin found 
several URIBL matches, spamd found no URLs at all, so its list of domains 
to query is empty:
         Nov 11 06:39:44 bia spamd[19025]: debug: URIDNSBL: domains to query:


However, the subject-line change is almost completely inexplicable. 
SUBJECT_DRUG_GAP_C matched for spamassassin, but not for spamd.

Does this message have two subject lines?

Does some other tool in your MTA chain mangle messages?
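
A quick sanity check on that last question: the spamd log line above ends with "576 bytes", so the size of the saved file can be compared against it. The following is a rough Python sketch, not a SpamAssassin tool; the function names and the script name check_size.py are hypothetical. Line-ending conversion in transit can shift the count by a few bytes, so only a large difference is meaningful.

```python
import hashlib
import sys
from pathlib import Path

def describe_message(path):
    """Return (size in bytes, SHA-256 hex digest) of a raw message file."""
    data = Path(path).read_bytes()
    return len(data), hashlib.sha256(data).hexdigest()

def same_size_as_logged(path, logged_bytes):
    """True if the file is exactly as large as the size spamd logged."""
    size, _ = describe_message(path)
    return size == logged_bytes

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python check_size.py a 576
    path, logged = sys.argv[1], int(sys.argv[2])
    size, digest = describe_message(path)
    print(f"{path}: {size} bytes, sha256 {digest}")
    if size == logged:
        print("size matches the spamd log")
    else:
        print(f"size differs from the {logged} bytes spamd logged")
```

The digest is there so the saved file can also be compared with a copy captured at some other point in the MTA chain, assuming such a copy can be obtained.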