Posted to users@spamassassin.apache.org by Chad M Stewart <cm...@balius.com> on 2004/11/11 13:23:33 UTC
different scores - spamd vs spamassassin
Hi all,
I'm using SpamAssassin 3.0.1 (2004-10-22). SA is running on an OpenBSD
3.5 i386 machine. I'm starting it up using the following
/usr/local/bin/spamd -u spamd -a --allowed-ips=192.168.1.0/24
--siteconfigpath=/etc/mail/spamassassin/ -d --listen-ip=192.168.1.4 -D
my local.cf file is below, the spamd user has a user_prefs file that is
all comments.
lock_method flock
bayes_auto_learn_threshold_spam 3.2
use_auto_whitelist 0
required_hits 3.2
report_safe 1
use_bayes 1
bayes_path /home/spamd/
bayes_auto_learn 0
bayes_learn_to_journal 1
skip_rbl_checks 1
use_razor2 1
use_dcc 1
use_pyzor 1
ok_languages en
ok_locales en
My MTA talks to spamd over the network, and this is all working nicely.
I'm running SA for the site, not just me. This morning I'm looking
into why messages that I think should be clearly spam are not getting
tagged that way. My user agent is catching them but SA is not. :(
Here is the oddity that I found. If I put the raw message into a file
and then run 'spamassassin -D < a' as the spamd user on the system
where the spamd process is running I get
Message-Id: <11...@xs4all.nl>
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.0.1 (2004-10-22) on
bia.amotken.com
X-Spam-Level: *****************
X-Spam-Status: Yes, score=18.0 required=3.2 tests=ALL_TRUSTED,DCC_CHECK,
DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,HTML_90_100,HTML_IMAGE_ONLY_04,
HTML_MESSAGE,HTML_SHORT_CENTER,MANY_EXCLAMATIONS,MIME_HTML_ONLY,
SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,URIBL_AB_SURBL,URIBL_OB_SURBL,
URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
version=3.0.1
Now this same message when asked via my MTA to the running spamd
process got
Nov 11 06:39:44 bia spamd[19025]: logmsg: checking message
<11...@xs4all.nl> for (unknown):1002.
Nov 11 06:39:44 bia spamd[19025]: checking message
<11...@xs4all.nl> for (unknown):1002.
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 tie-ing to DB
file R/O /home/spamd/_toks
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 tie-ing to DB
file R/O /home/spamd/_seen
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: found bayes db version 3
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: Not available for
scanning, only 82 spam(s) in Bayes DB < 200
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing db_toks
Nov 11 06:39:44 bia spamd[19025]: debug: bayes: 19025 untie-ing db_seen
Nov 11 06:39:44 bia spamd[19025]: debug: metadata:
X-Spam-Relays-Trusted:
Nov 11 06:39:44 bia spamd[19025]: debug: metadata:
X-Spam-Relays-Untrusted:
Nov 11 06:39:44 bia spamd[19025]: debug: ---- MIME PARSER START ----
Nov 11 06:39:44 bia spamd[19025]: debug: main message type:
content-transfer-encoding8bit
Nov 11 06:39:44 bia spamd[19025]: debug: parsing normal part
Nov 11 06:39:44 bia spamd[19025]: debug: added part, type:
content-transfer-encoding8bit
Nov 11 06:39:44 bia spamd[19025]: debug: ---- MIME PARSER END ----
Nov 11 06:39:44 bia spamd[19025]: debug: Message too short for language
analysis
Nov 11 06:39:44 bia spamd[19025]: debug: URIDNSBL: domains to query:
Nov 11 06:39:44 bia spamd[19025]: debug: Running tests for priority: 0
Nov 11 06:39:44 bia spamd[19025]: debug: running header regexp tests;
score so far=0
Nov 11 06:39:44 bia spamd[19025]: debug: SPF: message was delivered
entirely via trusted relays, not required
Nov 11 06:39:44 bia spamd[19025]: debug: all '*From' addrs:
dickinsonvl@xs4all.nl
Nov 11 06:39:44 bia spamd[19025]: debug: all '*To' addrs: cms@balius.com
Nov 11 06:39:44 bia spamd[19025]: debug: SPF: message was delivered
entirely via trusted relays, not required
Nov 11 06:39:44 bia spamd[19025]: debug: running body-text per-line
regexp tests; score so far=-2.82
Nov 11 06:39:44 bia spamd[19025]: debug: running uri tests; score so
far=-2.82
Nov 11 06:39:44 bia spamd[19025]: debug: Razor2 is available
Nov 11 06:39:44 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:44 bia spamd[19025]: debug: Using results from Razor v2.61
Nov 11 06:39:44 bia spamd[19025]: debug: Found Razor2 part: part=0
noresponse skipme=1
Nov 11 06:39:44 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:45 bia spamd[19025]: debug: Razor2 results: spam? 0
highest cf score: 0
Nov 11 06:39:45 bia spamd[19025]: debug: running raw-body-text per-line
regexp tests; score so far=-1.662
Nov 11 06:39:45 bia spamd[19025]: debug: running full-text regexp
tests; score so far=-1.662
Nov 11 06:39:45 bia spamd[19025]: debug: Razor2 is available
Nov 11 06:39:45 bia spamd[19025]: debug: Pyzor is available:
/usr/local/bin/pyzor
Nov 11 06:39:45 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:45 bia spamd[621]: debug: setuid: helper proc 621:
ruid=1002 euid=1002
Nov 11 06:39:51 bia spamd[19025]: debug: Pyzor: got response:
217.160.253.84:24441 TimeoutError:
Nov 11 06:39:51 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:51 bia spamd[19025]: debug: Pyzor: couldn't grok response
"217.160.253.84:24441 TimeoutError: "
Nov 11 06:39:51 bia spamd[19025]: debug: DCCifd is not available: no
r/w dccifd socket found.
Nov 11 06:39:51 bia spamd[19025]: debug: DCC is available:
/usr/local/bin/dccproc
Nov 11 06:39:51 bia spamd[19025]: debug: entering helper-app run mode
Nov 11 06:39:51 bia spamd[11502]: debug: setuid: helper proc 11502:
ruid=1002 euid=1002
Nov 11 06:39:51 bia spamd[19025]: debug: DCC: got response:
X-DCC-xmailer-Metrics: bia.amotken.com 1192; Body=1 Fuz1=many
Fuz2=many\^M
Nov 11 06:39:51 bia spamd[19025]: debug: leaving helper-app run mode
Nov 11 06:39:51 bia spamd[19025]: debug: DCC: Listed! BODY: 1 of 999999
FUZ1: 999999 of 999999 FUZ2: 999999 of 999999
Nov 11 06:39:51 bia spamd[19025]: debug: Running tests for priority: 500
Nov 11 06:39:51 bia spamd[19025]: debug: running meta tests; score so
far=-0.289
Nov 11 06:39:51 bia spamd[19025]: debug: running header regexp tests;
score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running body-text per-line
regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running uri tests; score so
far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running raw-body-text per-line
regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running full-text regexp
tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: Running tests for priority:
1000
Nov 11 06:39:51 bia spamd[19025]: debug: running meta tests; score so
far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running header regexp tests;
score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 created
/home/spamd/.spamassassin/auto-whitelist.mutex
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 trying to get lock
on /home/spamd/.spamassassin/auto-whitelist with 30 timeout
Nov 11 06:39:51 bia spamd[19025]: debug: lock: 19025 link to
/home/spamd/.spamassassin/auto-whitelist.mutex: link ok
Nov 11 06:39:51 bia spamd[19025]: debug: Tie-ing to DB file R/W in
/home/spamd/.spamassassin/auto-whitelist
Nov 11 06:39:51 bia spamd[19025]: debug: auto-whitelist (db-based):
dickinsonvl@xs4all.nl|ip=none scores 0/0
Nov 11 06:39:51 bia spamd[19025]: debug: AWL active, pre-score: 2.181,
autolearn score: 2.181, mean: undef, IP: undef
Nov 11 06:39:51 bia spamd[19025]: debug: add_score: New count: 1, new
totscore: 2.181
Nov 11 06:39:51 bia spamd[19025]: debug: DB addr list: untie-ing and
unlocking.
Nov 11 06:39:51 bia spamd[19025]: debug: DB addr list: file locked,
breaking lock.
Nov 11 06:39:51 bia spamd[19025]: debug: unlock: 19025 unlocked
/home/spamd/.spamassassin/auto-whitelist.mutex
Nov 11 06:39:51 bia spamd[19025]: debug: Post AWL score: 2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running body-text per-line
regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running uri tests; score so
far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running raw-body-text per-line
regexp tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: running full-text regexp
tests; score so far=2.181
Nov 11 06:39:51 bia spamd[19025]: debug: is spam? score=2.181
required=3.2
Nov 11 06:39:51 bia spamd[19025]: debug:
tests=ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ
Nov 11 06:39:51 bia spamd[19025]: debug:
subtests=__0_TZ_3,__CT,__CTE,__CTYPE_CHARSET_QUOTED,__CTYPE_HTML,__HAS_MSGID,
__HAS_SUBJECT,__HAS_X_MAILER,__MIME_HTML,__MIME_VERSION,__MSGID_OK_DIGITS,
__RATWARE_0_TZ_DATE,__SANE_MSGID,__UNUSABLE_MSGID
Nov 11 06:39:51 bia spamd[19025]: logmsg: clean message (2.2/3.2) for
(unknown):1002 in 8.1 seconds, 576 bytes.
I'm at a complete loss as to why the different scores? Is there
something I've done wrong here?
Something else is going wrong with my Bayes db learning as well. I
restarted spamd this morning. By restart I mean I found the running
process ID, sent it a kill -TERM and then started it again using the
above string. Before the restart I had 2K+ entries in the db. After
restarting I'm now seeing
$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 82 0 non-token data: nspam
0.000 0 161 0 non-token data: nham
Again I'm at a loss as to why this might have happened. I'd really
like to hear from some experts as to what it is that is going wrong
here or might be.
Thank you for your time,
Chad
Re: different scores - spamd vs spamassassin
Posted by Brook Humphrey <ba...@webmedic.net>.
On Thursday 11 November 2004 04:23 am, Chad M Stewart wrote:
> Something else is going wrong with my Bayes db learning as well. I
> restarted spamd this morning. By restart I mean I found the running
> process ID, sent it a kill -TERM and then started it again using the
> above string. Before the restart I had 2K+ entries in the db. After
> restarting I'm now seeing
On my system, with a site-wide Bayes database using spamd, I had issues with
Bayes learning a few times. There was a daemon running on the system that would
change permissions if it found things that were not secure, and it decided
that the 666 permissions on the Bayes database were just not right. I've seen
other reports of Bayes not learning correctly and can't say for sure this is
the issue, but you might want to check your permissions on both the Bayes
database and the whitelist, if you use that also.
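
One way to run that permission check is sketched below. It is only an illustration: the helper name describe_access is made up for this example, and the three paths are the ones that appear in the spamd debug output earlier in this thread, so they may not match other installations. Run it as the spamd user, since that is the account whose access matters.

```python
import os
import stat


def describe_access(path):
    """Report the mode, owner, and current-user read/write access for a file."""
    st = os.stat(path)
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),   # e.g. "-rw-rw-rw-" for 666
        "owner_uid": st.st_uid,
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# Paths copied from the debug log in this thread (assumed, not universal).
CANDIDATES = [
    "/home/spamd/_toks",
    "/home/spamd/_seen",
    "/home/spamd/.spamassassin/auto-whitelist",
]

for candidate in CANDIDATES:
    if os.path.exists(candidate):
        print(describe_access(candidate))
    else:
        print(candidate, "does not exist")
```

If a file shows up as not writable for the spamd user, or its mode has changed since the last check, that would be consistent with the permission problem described above, though it does not prove it is the cause.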
>
> $ sa-learn --dump magic
> 0.000 0 3 0 non-token data: bayes db version
> 0.000 0 82 0 non-token data: nspam
> 0.000 0 161 0 non-token data: nham
>
>
> Again I'm at a loss as to why this might have happened. I'd really
> like to hear from some experts as to what it is that is going wrong
> here or might be.
>
> Thank you for your time,
> Chad
--
-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-
Brook Humphrey
Mobile PC Medic, 420 1st, Cheney, WA 99004, 509-235-9107
http://www.webmedic.net, bah@webmedic.net, bah@linux-mandrake.com
Holiness unto the Lord
-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-~`'~-
Re: different scores - spamd vs spamassassin
Posted by Matt Kettler <mk...@comcast.net>.
At 07:23 AM 11/11/2004 -0500, Chad M Stewart wrote:
>Hi all,
>
>I'm using SpamAssassin 3.0.1 (2004-10-22). SA is running on an OpenBSD
>3.5 i386 machine. I'm starting it up using the following
>
>/usr/local/bin/spamd -u spamd -a --allowed-ips=192.168.1.0/24
>--siteconfigpath=/etc/mail/spamassassin/ -d --listen-ip=192.168.1.4 -D
>
>my local.cf file is below, the spamd user has a user_prefs file that is
>all comments.
>
<SNIP>
>skip_rbl_checks 1
<snip>
> If I put the raw message into a file
>and then run 'spamassassin -D < a' as the spamd user on the system
>where the spamd process is running I get
<snip>
>X-Spam-Status: Yes, score=18.0 required=3.2 tests=ALL_TRUSTED,DCC_CHECK,
>
>DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,HTML_90_100,HTML_IMAGE_ONLY_04,
> HTML_MESSAGE,HTML_SHORT_CENTER,MANY_EXCLAMATIONS,MIME_HTML_ONLY,
>
>SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,URIBL_AB_SURBL,URIBL_OB_SURBL,
> URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=disabled
> version=3.0.1
>
>Now this same message when asked via my MTA to the running spamd
>process got
<snip>
>Nov 11 06:39:51 bia spamd[19025]: debug: is spam? score=2.181
>required=3.2
>Nov 11 06:39:51 bia spamd[19025]: debug:
>tests=ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ
<snip>
>I'm at a complete loss as to why the different scores? Is there
>something I've done wrong here?
First a comment: You seem to have a major problem with your
trusted_networks setting. ALL_TRUSTED should not fire for spam. If you've
got a NATed mailserver, you need to manually set trusted_networks to only
contain your servers, otherwise SA will assume the outside relay is a part
of your network.
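
As a rough illustration of that setting, a local.cf entry might look like the fragment below. The address is a placeholder: 192.168.1.4 is simply the --listen-ip from the original post, and the real value should be the address(es) of your own mail relay(s), which this thread does not state.

```
# /etc/mail/spamassassin/local.cf (directory taken from --siteconfigpath above)
# Placeholder address, assumed for illustration only; list your own
# mail server(s) here rather than the whole NATed network.
trusted_networks 192.168.1.4
```

spamd reads its configuration at startup, so it would need a restart before a change like this takes effect.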
As for the different scores, that's tough to account for. Are you sure those
are the same messages? spamassassin is seeing a very different message
body and subject line than spamd. It looks very much like some HTML
normalizer ripped out a bunch of stuff SA was matching on.
You can even see in the debug that, despite spamassassin finding several
URIBL list matches, spamd finds no URLs at all, so its query list is empty:
Nov 11 06:39:44 bia spamd[19025]: debug: URIDNSBL: domains to query:
However, the subject-line change is almost completely inexplicable.
SUBJECT_DRUG_GAP_C matched for spamassassin, but not for spamd.
Does this message have two subject lines?
Does some other tool in your MTA chain mangle messages?
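
To see exactly which rules differ between the two runs, the two test lists quoted earlier in this thread can be compared as sets. This is a minimal sketch; the function name compare_tests is made up for the example, and the rule names are copied from the X-Spam-Status header and the spamd debug line above.

```python
# Rule names from the 'spamassassin -D < a' result quoted in this thread.
SPAMASSASSIN_TESTS = (
    "ALL_TRUSTED,DCC_CHECK,DRUGS_ERECTILE,DRUGS_ERECTILE_OBFU,HTML_90_100,"
    "HTML_IMAGE_ONLY_04,HTML_MESSAGE,HTML_SHORT_CENTER,MANY_EXCLAMATIONS,"
    "MIME_HTML_ONLY,SUBJECT_DRUG_GAP_C,SUBJECT_DRUG_GAP_L,URIBL_AB_SURBL,"
    "URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL"
).split(",")

# Rule names from the spamd debug log quoted in this thread.
SPAMD_TESTS = (
    "ALL_TRUSTED,DCC_CHECK,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,RATWARE_ZERO_TZ"
).split(",")


def compare_tests(first, second):
    """Split two rule lists into rules seen only in one run and rules seen in both."""
    first_set, second_set = set(first), set(second)
    return {
        "only_first": sorted(first_set - second_set),
        "only_second": sorted(second_set - first_set),
        "both": sorted(first_set & second_set),
    }


result = compare_tests(SPAMASSASSIN_TESTS, SPAMD_TESTS)
for label, rules in result.items():
    print(label, len(rules), ", ".join(rules))
```

Every body, subject, and URIBL rule lands in the spamassassin-only group, while HTML_MIME_NO_HTML_TAG and RATWARE_ZERO_TZ appear only for spamd, which fits the suspicion that spamd received a different copy of the message.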