Posted to users@spamassassin.apache.org by Greg Earle <ea...@isolar.DynDNS.ORG> on 2005/06/22 20:24:44 UTC

Why do I get different scores from "spamd" than manually?

I keep getting these Via*/Cial*/Val* "and many other" SPAMs (you know the
ones: they start with "Hello, Welcome to <link to their shop>" and have all
those obfuscating "DISPLAY:" "none"s embedded in them).

(I'm still using 2.63 on my production mail server, btw. Please don't shoot
me.)

What I don't understand is why I get this behavior: in the message as I
received it, it scored as

X-Spam-Status: 	No, hits=2.8 required=5.0 tests=BAYES_10,HTML_50_60,
		HTML_FONT_BIG,HTML_MESSAGE,PRIORITY_NO_NAME,SPAMCOP_URI_RBL_JP
		autolearn=no version=2.63

However, if I take the message and extract it into a file and feed it
to "spamassassin -D < spam", I get

X-Spam-Status: Yes, hits=14.8 required=5.0 tests=BAYES_30,HTML_50_60,
		HTML_FONT_BIG,HTML_MESSAGE,MSGID_FROM_MTA_SHORT,PRIORITY_NO_NAME,
		RCVD_IN_BL_SPAMCOP_NET,SPAMCOP_URI_RBL_AB,SPAMCOP_URI_RBL_JP,
		SPAMCOP_URI_RBL_OB,SPAMCOP_URI_RBL_SC,SPAMCOP_URI_RBL_WS
		autolearn=spam version=2.63

Why do I only get one SPAMCOP_URI_RBL_* hit when it's fed to "spamd"
as it comes in, yet I get 5 of them when I run it manually?  Why is
"autolearn=no" set when "spamd" gets it, but "autolearn=spam" is set
when "spamassassin" gets it?

"spamd" is running (on NetBSD 1.6.1) as

/usr/pkg/bin/spamd -H -c -a -d -r /var/run/spamd.pid

And, last but not least, these things all contain those obfuscating
lines with

	<DIV><FONT face=3DArial>Have a nice d<SPAN style=3D"DISPLAY: none"> =
	pestilential </SPAN>ay!</FONT></DIV></DIV></BODY></HTML>

embedded in them.  I've tried every rule I can think of to catch the
"DISPLAY: none" stuff, yet it never matches.  I took a rule someone
posted here and pared it down to nothing more than

body    SENET_DISPNONE  /DISPLAY: none/
describe        SENET_DISPNONE  Hidden text via CSS attributes
score   SENET_DISPNONE  7.5

and it *still* doesn't match it.  Why not?  Do "body" matches not work
on HTML in 2.63?

(Edit: my own server rejected my sending this as-is; it matched the
"SENET_DISPNONE" rule on the above text!  Thus proving that it finds
"Display:" and "none" just fine when it's in the body as Plain Text ...
so why doesn't it find them when they're inside HTML?)

Thanks,

	- Greg


Re: Why do I get different scores from "spamd" than manually?

Posted by Matt Kettler <mk...@evi-inc.com>.
Greg Earle wrote:
> (I'm still using 2.63 on my production mail server, btw. Please don't
> shoot me.)

I'll avoid shooting you, but I will warn you that you have a DoS vulnerability.

2.64 and higher are immune to this particular DoS.

3.0.1-3.0.3 are also subject to a separate DoS that's fixed in 3.0.4. 3.0.0 and
earlier (including 2.64) aren't subject to this one.

Upgrade to 3.0.4 if you can, or at least upgrade to 2.64.

Doing 2.64 should be easy; the Perl module requirements and config options
are all the same. The only hangup with 2.64 is that you'll need to remove
spamcop_uri.cf and re-install Mail::SpamcopURI after the upgrade. (SpamcopURI
is technically a patch and gets overwritten in the upgrade. Also, if the .cf
file remains, make test will blow up, as baseline 2.64 doesn't understand
those commands.)

3.0.4 has some config option and requirement changes, but would be very
worthwhile if you can spare the effort.


> 
> Why do I only get one SPAMCOP_URI_RBL_* hit when it's fed to "spamd"
> as it comes in, yet I get 5 of them when I run it manually?

Time. Unless both tests happen simultaneously, or within a very short time of
each other, it's easy for the URIBL to "catch up" and add more listings.

Compare by running the message through spamc and spamassassin one right after
the other. You should get the same URIBL hits:
	$ spamassassin < spam
	$ spamc < spam
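To see exactly which rules differ between two runs, the X-Spam-Status headers can be diffed rule by rule. The Python sketch below is a hypothetical helper (rule_hits and compare are made-up names, not part of SpamAssassin). It assumes the header layout shown in this thread, where the rule list sits between "tests=" and "autolearn=", and it uses the two headers Greg posted as input.

```python
import re

# X-Spam-Status headers copied from the two runs quoted in this thread.
SPAMD_STATUS = """X-Spam-Status: No, hits=2.8 required=5.0 tests=BAYES_10,HTML_50_60,
    HTML_FONT_BIG,HTML_MESSAGE,PRIORITY_NO_NAME,SPAMCOP_URI_RBL_JP
    autolearn=no version=2.63"""

MANUAL_STATUS = """X-Spam-Status: Yes, hits=14.8 required=5.0 tests=BAYES_30,HTML_50_60,
    HTML_FONT_BIG,HTML_MESSAGE,MSGID_FROM_MTA_SHORT,PRIORITY_NO_NAME,
    RCVD_IN_BL_SPAMCOP_NET,SPAMCOP_URI_RBL_AB,SPAMCOP_URI_RBL_JP,
    SPAMCOP_URI_RBL_OB,SPAMCOP_URI_RBL_SC,SPAMCOP_URI_RBL_WS
    autolearn=spam version=2.63"""


def rule_hits(status: str) -> set[str]:
    """Return the rule names listed between 'tests=' and 'autolearn='."""
    match = re.search(r"tests=(.*?)\s+autolearn=", status, re.DOTALL)
    if match is None:
        return set()
    # Folded header lines leave newlines and indentation around the commas.
    return {name.strip() for name in match.group(1).split(",") if name.strip()}


def compare(first: str, second: str) -> tuple[set[str], set[str]]:
    """Return (rules only in the first header, rules only in the second)."""
    a, b = rule_hits(first), rule_hits(second)
    return a - b, b - a


if __name__ == "__main__":
    only_spamd, only_manual = compare(SPAMD_STATUS, MANUAL_STATUS)
    print("only via spamd:       ", sorted(only_spamd))
    print("only via spamassassin:", sorted(only_manual))
```

With Greg's two headers, BAYES_10 is the only rule unique to the spamd run. The manual run adds BAYES_30, MSGID_FROM_MTA_SHORT, RCVD_IN_BL_SPAMCOP_NET and four more SPAMCOP_URI_RBL_* hits. Feeding it the headers from two back-to-back runs, as suggested above, should show whether the URIBL differences go away.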


> Why is
> "autolearn=no" set when "spamd" gets it, but "autolearn=spam" is set
> when "spamassassin" gets it?

The first factor is score differences, as above.

The second factor, 99.99% of the time with results like this, is that you are
using different users for the two tests. SpamAssassin stores its Bayes
database in the user's home directory by default. However, this is the user
EXECUTING spamassassin, which is not necessarily the recipient of the message.

Very often spamc gets called as root when mail arrives, and spamd will then
scan the mail as "nobody" to avoid scanning mail as root. It will pick up the
Bayes DB and user_prefs from nobody's home directory.

However, when you run spamassassin by hand, it uses the current user, even if
that is root.
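To check which Bayes files each account would use, the default location can be expanded per user. This Python sketch assumes the stock bayes_path of ~/.spamassassin/bayes (the actual files are that prefix plus suffixes such as _toks and _seen) and a POSIX system with a password database. default_bayes_prefix is a hypothetical helper name, and a site-wide bayes_path setting would override all of this.

```python
import os
import pwd


def default_bayes_prefix(username: str) -> str:
    """Expand the assumed default '~/.spamassassin/bayes' prefix for one account."""
    home = pwd.getpwnam(username).pw_dir  # raises KeyError for unknown users
    return os.path.join(home, ".spamassassin", "bayes")


if __name__ == "__main__":
    # spamc invoked as root means spamd scans as "nobody"; a manual run as
    # root uses root's own home directory instead.
    for user in ("nobody", "root"):
        try:
            print(f"{user:>6}: {default_bayes_prefix(user)}*")
        except KeyError:
            print(f"{user:>6}: no such account on this system")
```

If the two printed paths differ, the spamd scan and the manual run are consulting different Bayes databases, which alone can explain BAYES_10 in one result and BAYES_30 in the other.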

> Do "body" matches not work on HTML in 2.63? 

As someone else pointed out, HTML tags are stripped from the body before
"body" rules run. Use "rawbody" instead.

See the entries for "body" and "rawbody" in the Mail::SpamAssassin::Conf
manpage (man Mail::SpamAssassin::Conf) for details on what is done to the
message to create the "body" and "rawbody" text.
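The difference between the two renderings can be approximated in a few lines of Python. This is only a rough model, not SpamAssassin's code. It assumes "rawbody" means the body with its quoted-printable encoding undone but tags left in place, and "body" means the same text with all HTML markup (including attributes such as style=) stripped. The pattern is a hypothetical case-insensitive variant of Greg's /DISPLAY: none/. In SpamAssassin terms, the equivalent would be a rawbody line rather than a body line.

```python
import quopri
import re
from html.parser import HTMLParser

# The quoted-printable HTML fragment quoted in Greg's message.
RAW_QP = (
    b'<DIV><FONT face=3DArial>Have a nice d<SPAN style=3D"DISPLAY: none"> =\n'
    b"pestilential </SPAN>ay!</FONT></DIV></DIV></BODY></HTML>\n"
)

# Hypothetical case-insensitive variant of the /DISPLAY: none/ rule.
RULE = re.compile(r"DISPLAY:\s*none", re.IGNORECASE)


class TextOnly(HTMLParser):
    """Collect character data; tags and their attributes are dropped."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def render_rawbody(raw_qp: bytes) -> str:
    """Approximate 'rawbody': transfer encoding undone, HTML tags kept."""
    return quopri.decodestring(raw_qp).decode("ascii")


def render_body(raw_qp: bytes) -> str:
    """Approximate 'body': decoded, then HTML markup stripped."""
    parser = TextOnly()
    parser.feed(render_rawbody(raw_qp))
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print("body text:    ", repr(render_body(RAW_QP)))
    print("body match:   ", RULE.search(render_body(RAW_QP)) is not None)
    print("rawbody match:", RULE.search(render_rawbody(RAW_QP)) is not None)
```

The style attribute disappears along with the SPAN tag in the "body" view, so the pattern has nothing to match there, while the "rawbody" view still contains it.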





Re: Why do I get different scores from "spamd" than manually?

Posted by Kai Schaetzl <ma...@conactive.com>.
Greg Earle wrote on Wed, 22 Jun 2005 11:24:44 -0700:

> Why do I only get one SPAMCOP_URI_RBL_* hit when it's fed to "spamd" 
> as it comes in, yet I get 5 of them when I run it manually? 

Your spamd either uses different rules or gets a different message (from 
whatever feeds the message to it).

> Why is
> "autolearn=no" set when "spamd" gets it, but "autolearn=spam" is set
> when "spamassassin" gets it?

Because it cannot autolearn a ham message (hits=2.8 !) as spam.
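Kai's point can be illustrated with the two auto-learn thresholds. The sketch assumes the stock defaults of 0.1 for bayes_auto_learn_threshold_nonspam and 12.0 for bayes_auto_learn_threshold_spam (verify them against the Mail::SpamAssassin::Conf manpage for your version). It is a simplification: real SpamAssassin computes the score used for learning somewhat differently (it leaves out the BAYES_* rules, for one) and may apply additional conditions. autolearn_decision is a hypothetical name.

```python
# Assumed stock thresholds; check Mail::SpamAssassin::Conf for your version.
HAM_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam


def autolearn_decision(score: float,
                       ham_threshold: float = HAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> str:
    """Simplified threshold test returning 'ham', 'spam', or 'no' (not learned)."""
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "no"


if __name__ == "__main__":
    for hits in (2.8, 14.8):
        print(f"hits={hits}: autolearn={autolearn_decision(hits)}")
```

With Greg's numbers, 2.8 falls between the two thresholds, so nothing is learned, while 14.8 clears the spam threshold. That matches the autolearn=no and autolearn=spam results he saw.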


> (Edit: my own server rejected my sending this as-is; it matched the
> "SENET_DISPNONE" rule on the above text!  Thus proving that it finds
> "Display:" and "none" just fine when it's in the body as Plain Text ...
> so why doesn't it find them when they're inside HTML?)

I *think* HTML is stripped away before "body" rules are evaluated; you
probably have to use a "rawbody" rule or similar.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org