Posted to users@spamassassin.apache.org by Tim Boyer <ti...@denmantire.com> on 2006/11/01 14:14:39 UTC
Inconsistent scoring
I've been using SA for years. I'm running 3.1.6 on a Red Hat box, and 99%
of the time, all is well.
Last week I added a rule to tag those annoying .gif pump-and-dump emails.
Nothing fancy:
rawbody IMG_SRC_CID /src\=(\"c|c)id\:/i
score IMG_SRC_CID 2.0
Most of the time it works fine. However, occasionally, I'll get an email
that ONLY sees that rule. I'm using MimeDefang to rewrite the headers, and
all it shows is
X-Spam-Score: 2 (**) IMG_SRC_CID
But when I run spamassassin --debug < test with the message, it finds all
kinds of fun things:
Content analysis details:   (6.6 points, 9.0 required)
 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
 1.5 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
-0.3 BAYES_40               BODY: Bayesian spam probability is 20 to 40%
                            [score: 0.2631]
 1.9 HTML_IMAGE_ONLY_28     BODY: HTML: images with 2400-2800 bytes of words
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.4 HTML_10_20             BODY: Message is 10% to 20% HTML
 0.0 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 2.0 IMG_SRC_CID            RAW: cid in body
The very next message is the same kind of scam, but sees everything:
X-Spam-Score: 7.967 (*******)
BAYES_00,DNS_FROM_RFC_ABUSE,FORGED_RCVD_HELO,HTML_00_10,HTML_MESSAGE,
IMG_SRC_CID,MIME_HTML_ONLY,RCVD_NUMERIC_HELO
So what obvious mistake am I making? Thanks for any help...
--
tim boyer
tim@denmantire.com
RE: Inconsistent scoring
Posted by Tim Boyer <ti...@denmantire.com>.
>
> This seems rather odd. I suppose you did lint your rules to
> make sure that you don't have a problem somewhere? It is
> known that SA can do things like dropping most of the rules
> file following a rule with an error in it.
>
Yup; no lint problems at all.
> Maybe you are using amavisd-new or one of the other tools that
> does its own header rewriting in at least some cases?
>
MIMEDefang, but I can't see it doing this.
> I do have a suggestion for improving your rule though. There
> are several things that aren't as efficient as they should
> be. Instead of
>
> > rawbody IMG_SRC_CID /src\=(\"c|c)id\:/i
>
> do
>
> > rawbody IMG_SRC_CID /src="?cid:/i
>
Thanks much - I need all the perl help I can get. :)
-- tim --
RE: Inconsistent scoring
Posted by "John D. Hardin" <jh...@impsec.org>.
On Wed, 1 Nov 2006, Mark wrote:
> > > rawbody IMG_SRC_CID /src\s*=\s*"?cid:/i
>
> Well, that matches newlines, too (really, even without /m). So, you want:
>
> rawbody IMG_SRC_CID /src[ \t]*=[ \t]*"?cid:/i
Why? Newlines there are syntactically valid, are they not?
--
John Hardin KA7OHZ ICQ#15735746 http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
If someone has a gun and is trying to kill you, it would be
reasonable to shoot back with your own gun.
-- the Dalai Lama, May 15, 2001
-----------------------------------------------------------------------
6 days until the campaign ads stop
RE: Inconsistent scoring
Posted by Mark <ad...@asarian-host.net>.
> -----Original Message-----
> From: Loren Wilton [mailto:lwilton@earthlink.net]
> Sent: Wednesday, 1 November 2006 15:11
> To: users@spamassassin.apache.org
> Subject: Re: Inconsistent scoring
>
>
> Also, while I've never seen it done, I think it is
> theoretically possible to have spaces on either side
> of the equal sign. So the regex really should
> probably be:
>
> > rawbody IMG_SRC_CID /src\s*=\s*"?cid:/i
Well, that matches newlines, too (really, even without /m). So, you want:
rawbody IMG_SRC_CID /src[ \t]*=[ \t]*"?cid:/i
And if we're really nitpicky, we want to match "src" on a boundary:
rawbody IMG_SRC_CID /\bsrc[ \t]*=[ \t]*"?cid:/i
- Mark
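The difference between the three variants above can be checked with a small sketch. It uses Python's re module as a stand-in for Perl's regex engine, on the assumption that \s, [ \t] and \b behave the same way in both for these patterns; the sample strings, including the "datasrc" attribute, are made up for illustration.

```python
import re

# The three rule bodies from this message, compiled case-insensitively (/i).
with_s = re.compile(r'src\s*=\s*"?cid:', re.I)
spaces_tabs = re.compile(r'src[ \t]*=[ \t]*"?cid:', re.I)
bounded = re.compile(r'\bsrc[ \t]*=[ \t]*"?cid:', re.I)

# Hypothetical sample inputs, not taken from real mail.
wrapped = '<img src=\n"cid:image001.gif">'  # newline after the equal sign
spaced = '<img src = "cid:image001.gif">'   # spaces around the equal sign
glued = '<a datasrc="cid:image001.gif">'    # "src" not at a word boundary


def hits(text):
    """Return which of the three patterns match, in the order defined above."""
    return tuple(bool(p.search(text)) for p in (with_s, spaces_tabs, bounded))


wrapped_hits = hits(wrapped)  # only \s* can cross the newline
spaced_hits = hits(spaced)    # all three accept plain spaces
glued_hits = hits(glued)      # only the \b variant rejects "datasrc"
```

Whether matching across a newline is wanted is a separate question; the sketch only shows that \s* allows it and [ \t]* does not.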
Re: Inconsistent scoring
Posted by Loren Wilton <lw...@earthlink.net>.
This seems rather odd. I suppose you did lint your rules to make sure that
you don't have a problem somewhere? It is known that SA can do things like
dropping most of the rules file following a rule with an error in it.
Maybe you are using amavisd-new or one of the other tools that does its own
header rewriting in at least some cases?
I do have a suggestion for improving your rule though. There are several
things that aren't as efficient as they should be. Instead of
> rawbody IMG_SRC_CID /src\=(\"c|c)id\:/i
do
> rawbody IMG_SRC_CID /src="?cid:/i
You don't need the alternation in there; all you really want is an optional
quote mark, and following the quote with a question mark does that. Even if
you needed an alternation, it would be better to use the "non-capturing" form
of grouping: (?:blah) rather than just (blah). This saves perl the overhead
of storing the string that matches inside the parentheses in case you
want to use it later in the regex for some reason.
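As a rough check of the point about the optional quote and non-capturing groups, here is a minimal sketch. It uses Python's re module as a stand-in for Perl, assuming the two engines treat these constructs alike; the sample strings are invented for illustration.

```python
import re

# Original rule body, the simplified form, and a non-capturing alternation.
original = re.compile(r'src\=(\"c|c)id\:', re.I)
simplified = re.compile(r'src="?cid:', re.I)
non_capturing = re.compile(r'src=(?:"c|c)id:', re.I)

# Hypothetical sample inputs, not taken from real mail.
samples = [
    '<img src="cid:part1.0123@example.com">',  # quoted cid reference
    '<IMG SRC=cid:part1.0123@example.com>',    # unquoted, upper case
    '<img src="http://example.com/a.gif">',    # ordinary remote image
]

# One (original, simplified, non_capturing) tuple of booleans per sample.
results = [
    (bool(original.search(s)),
     bool(simplified.search(s)),
     bool(non_capturing.search(s)))
    for s in samples
]

# The capturing group stores the text it matched; the (?:...) form stores nothing.
captured = original.search(samples[0]).groups()
not_captured = non_capturing.search(samples[0]).groups()
```

All three patterns agree on every sample here, so the simpler form loses nothing on these inputs.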
Also, while I've never seen it done, I think it is theoretically possible to
have spaces on either side of the equal sign. So the regex really should
probably be:
> rawbody IMG_SRC_CID /src\s*=\s*"?cid:/i
Loren
Re: Inconsistent scoring
Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, Nov 01, 2006 at 08:14:39AM -0500, Tim Boyer wrote:
> Last week I added a rule to tag those annoying .gif pump-and-dump emails.
> Nothing fancy:
> rawbody IMG_SRC_CID /src\=(\"c|c)id\:/i
There are several issues with this rule IMO, but there's already a very
similar rule available via sa-update:
16.856 20.0630 0.3170 0.984 0.77 1.00 __TVD_INT_CID
which shows that it hits a lot of ham (0.32%), but also hits 20% of spam.
It's good enough for a meta dependency, but not necessarily as a rule by
itself, though YMMV.
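For readers unfamiliar with the statistics line quoted above, the following sketch shows how its numbers relate. It assumes the columns follow the usual mass-check layout (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME) and that S/O is the spam share of all hits; the message itself only confirms the spam and ham percentages.

```python
# Figures copied from the __TVD_INT_CID line quoted above.
spam_pct = 20.0630  # percent of spam messages the rule hit
ham_pct = 0.3170    # percent of ham messages the rule hit

# Assumed definition: S/O = spam% / (spam% + ham%).
s_o = spam_pct / (spam_pct + ham_pct)
s_o_rounded = round(s_o, 3)
```

Under that assumption the result agrees with the 0.984 shown in the fourth column.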
--
Randomly Selected Tagline:
"It is sometimes fun to scare people... Especially Matt." - Michelle