Posted to dev@spamassassin.apache.org by Karsten Bräckelmann <gu...@rudersport.de> on 2009/01/09 00:15:00 UTC

Re: eval function comparing the matches of two regular expression? - AWL is fooled without

Ah, nice... Apparently you sent the very same message here, too. I
already deleted the thread on the users list...


On Thu, 2009-01-08 at 13:24 +0100, Harald Binkle wrote:
> What about a new eval function comparing the matches of two regular
> expressions?
> If there were a function like
> 
> eval:Equals(/regex1/,/regex2/) and eval:NOTEquals(/regex1/,/regex2/) 
> 
> it would be easy to define rules like: 
> 
> a rule scoring, say, 0.8 points if there is only one recipient address,
> it equals the sender's address, but the two have different 'name
> parts'. Like: TO: "User Name" <us...@domain.com> FROM: "viagra offer"
> <us...@domain.com>

Err, given that example -- what about a rule that punishes any mail sent
"from" your domain with a real name referencing pills? Other options
have been mentioned on the users list already, IIRC.

I seriously wonder *why* these are a problem in the first place. They
are quite spammy, and SA shouldn't have any problem assigning a high
score out of the box. (If you want help on how to better identify a
particular class of spam, provide a link to samples and ask on the users
list. I have a feeling the "To equals From" is just a pattern you
spotted, but there are better ways and other issues.)


> There are a lot of spam mails with that structure trying to get
> through because many people have their own domain on the whitelist.

This is NOT an excuse for implementing such a plugin.

Plain whitelist_from for your own domain is a gross misconfiguration. Do
not use it unless there is no other option. Use the whitelist_from_rcvd
or whitelist_auth variants instead.
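To illustrate why the plain variant is so easy to abuse, here is a rough Python model (not SpamAssassin's actual Perl code) of the two checks. It assumes that whitelist_from compares only the From address against a glob pattern, while whitelist_from_rcvd additionally requires the reverse DNS name of the relay that handed the mail over to lie within a given domain; SpamAssassin's real matching has more detail. The function names, parameters, and sample values are made up for this sketch.

```python
from fnmatch import fnmatchcase


def whitelist_from(from_addr: str, pattern: str) -> bool:
    """Model of plain whitelist_from: only the (trivially forged) From address is checked."""
    return fnmatchcase(from_addr.lower(), pattern.lower())


def whitelist_from_rcvd(from_addr: str, relay_rdns: str,
                        pattern: str, rcvd_domain: str) -> bool:
    """Model of whitelist_from_rcvd: the From address must match AND the
    relay's reverse DNS name must be rcvd_domain or a subdomain of it."""
    relay = relay_rdns.lower().rstrip(".")
    domain = rcvd_domain.lower().rstrip(".")
    relay_ok = relay == domain or relay.endswith("." + domain)
    return whitelist_from(from_addr, pattern) and relay_ok


# A forged mail "from" your own domain, delivered by a relay without matching rDNS
# (hypothetical values).
print(whitelist_from("marder@jam-software.com", "*@jam-software.com"))  # True
print(whitelist_from_rcvd("marder@jam-software.com", "",
                          "*@jam-software.com", "jam-software.com"))   # False
```

The forger controls the From header but not the relay's reverse DNS, which is why the rcvd check is much harder to fool.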


> I tried to set this up as a rule but with no luck. I fear it is not
> possible to do this with a regular expression as it is not possible to
> compare results of a regular expression in a regular expression.

It *is* possible to do a generic "To equals From" rule using a single
RE. A few weeks ago when this topic came up, I hacked on this locally
as an exercise. I didn't polish it, but got a proof of concept.
Granted, the RE is quite ugly. :)

Also, it absolutely *is* possible to "compare results of a regular
expression in a regular expression". It's called back references.
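A minimal sketch of the back-reference idea, written in Python rather than the Perl regex dialect SpamAssassin rules use (the \1 syntax is the same in both). It assumes unfolded headers, addresses in angle brackets, and a single To recipient; the pattern, function name, and sample headers are illustrative and are not the proof of concept mentioned above.

```python
import re

# The address captured from one header must reappear, via a back reference,
# in the other header. Two alternatives cover either header order.
TO_EQUALS_FROM = re.compile(
    r"^To:[^\n]*<([^<>@\s]+@[^<>\s]+)>.*^From:[^\n]*<\1>"
    r"|^From:[^\n]*<([^<>@\s]+@[^<>\s]+)>.*^To:[^\n]*<\2>",
    re.MULTILINE | re.DOTALL | re.IGNORECASE,
)


def to_equals_from(headers: str) -> bool:
    """True if the To and From addresses in the header block are identical."""
    return TO_EQUALS_FROM.search(headers) is not None


spam = ('To: "User Name" <someone@example.com>\nSubject: hi\n'
        'From: "viagra offer" <someone@example.com>\n')
ham = 'From: "Alice" <alice@example.org>\nTo: "Bob" <bob@example.net>\n'
print(to_equals_from(spam), to_equals_from(ham))  # True False
```

MULTILINE lets ^ anchor at each header line, while DOTALL lets the middle .* skip over any headers between To and From.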


> And the AWL does not work for mails with this structure. If the sender
> address is set to the recipient's address, the AWL is fooled and the
> mail gets a negative score.

No. AWL does not work on the address alone; it also takes the sending IP
block (/24 IIRC) into account. A spam forged to be sent by you does not
get a negative score, because the mail does not originate from the same
network you are using.
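A rough sketch of that keying, assuming the history is stored per lower-cased sender address plus the network block of the sending IP. The /24 prefix length follows the "IIRC" above and is a parameter here because SpamAssassin's actual mask may differ; awl_key is a hypothetical helper, not SpamAssassin's code, and the addresses are sample values.

```python
import ipaddress


def awl_key(address: str, ip: str, prefix_len: int = 24) -> tuple[str, str]:
    """Hypothetical AWL-style key: sender address plus the network block of the sending IP."""
    block = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return address.lower(), str(block)


# Your own mail from the internal network and a forgery from elsewhere share
# the address but land in different buckets, so the forgery cannot inherit
# your good history.
mine = awl_key("me@example.com", "192.168.123.96")
forged = awl_key("me@example.com", "41.209.78.136")
print(mine)            # ('me@example.com', '192.168.123.0/24')
print(forged)          # ('me@example.com', '41.209.78.0/24')
print(mine == forged)  # False
```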

The Subject ("AWL is fooled") is wrong just the same.


> Could someone implement this?

Such a plugin has been posted to the users list before as a proof of
concept. However, again -- IMHO you are trying to solve your issue by
throwing more code at it, instead of nailing the real problem. Why are
they slipping through in the first place?

  guenther


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: eval function comparing the matches of two regular expression? - AWL is fooled without

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Mon, 2009-01-19 at 12:59 +0100, Harald Binkle wrote:
> Thanks for getting back to me on this.
> I hadn't set up the trusted and internal networks yet.

Ah -- that briefly crossed my mind, too, though I didn't think they were
involved in AWL and the associated network ranges at all. Also, they tend
to be guessed correctly by default.

> I have done so since your last mail, and as far as I can tell so far,
> it seems to have solved the problem.

Good to hear the problem is fixed. Thanks for the feedback. :)

  guenther



RE: [SpamAssassin] RE: [SpamAssassin] Re: eval function comparing the matches of two regular expression? - AWL is fooled without

Posted by Harald Binkle <bi...@jam-software.com>.
Hi,
Thanks for getting back to me on this.
I hadn't set up the trusted and internal networks yet.
I have done so since your last mail, and as far as I can tell so far, it seems to have solved the problem.

Greetings

Harry

> -----Original Message-----
> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> Sent: Monday, January 19, 2009 12:55 PM
> To: dev@spamassassin.apache.org
> Subject: [SpamAssassin] RE: [SpamAssassin] Re: eval function comparing the
> matches of two regular expression? - AWL is fooled without




----------------------------------------------------
JAM Software GmbH
Geschäftsführer: Joachim Marder
Max-Planck-Str. 22 * 54296 Trier * Germany
Tel: 0700-70707050 * Fax: 0700-70707059
(max. 12,4 ct/min, Preise aus Mobilfunknetzen können abweichen)
Handelsregister Nr. HRB 4920 (AG Wittlich)  http://www.jam-software.de

RE: [SpamAssassin] Re: eval function comparing the matches of two regular expression? - AWL is fooled without

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2009-01-09 at 08:25 +0100, Harald Binkle wrote:
> Hi,
> Here is the header of one of those spam mails coming through:
> 
> X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) * on
>         hermes.intranet.jam-software.com * at Wed, 07 Jan 2009 14:56:26 +0100
> X-Spam-Status: No, hits=2.0, required= 5.0, autolearn=no, shortcircuit=no
> X-Spam-Report: *  0.3 JAM_DO_STH_HERE BODY: Body contains Click/Order/Press... Here
>         *  0.2 HTML_IMAGE_RATIO_04 BODY: HTML has a low ratio of text to image area
>         *  1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
>         *  0.0 HTML_MESSAGE BODY: HTML included in message
>         *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
>         *      [score: 0.9875]
>         *  1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>         *  0.9 SARE_UN7 RAW: SARE_UN7
>         *  0.9 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
>         *      [41.209.78.136 listed in zen.spamhaus.org]
>         * -6.3 AWL AWL: From: address is in the auto white-list

> Received: from hacos (41.209.78.136) by Hermes.intranet.jam-software.com
>  (192.168.123.96) with Microsoft SMTP Server id 8.1.291.1; Wed, 7 Jan 2009
>  14:55:37 +0100

Assuming that's the IP used for AWL, your AWL database seems to be dirty
or broken. Unless you actually are physically located in Sudan...
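The size of that AWL adjustment can be made concrete. SpamAssassin's AWL documentation describes the final score as finalscore = score + (mean - score) * factor, where factor is auto_whitelist_factor (default 0.5) and mean is the stored long-term average for this sender. The sketch below inverts that formula using the non-AWL rule scores from the report quoted above, assuming the default factor. Those scores are displayed rounded to one decimal, which is presumably why 8.4 - 6.3 = 2.1 rather than the reported hits=2.0.

```python
def implied_awl_mean(score: float, awl_delta: float, factor: float = 0.5) -> float:
    """Invert finalscore = score + (mean - score) * factor for the stored mean.

    awl_delta is (mean - score) * factor, i.e. the score shown for the AWL rule.
    """
    return score + awl_delta / factor


# Non-AWL rule scores from the X-Spam-Report above (rounded as displayed).
rule_scores = [0.3, 0.2, 1.6, 0.0, 3.0, 1.5, 0.9, 0.9]
pre_awl = sum(rule_scores)
mean = implied_awl_mean(pre_awl, -6.3)
print(f"pre-AWL score {pre_awl:.1f}, implied stored mean {mean:.1f}")
# pre-AWL score 8.4, implied stored mean -4.2
```

Under these assumptions, the entry for that address/IP pair has a history of scoring well below zero, like ham, which a spam source should not have.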


> X-Originating-IP: [20.447.77.419]

This is just plain wrong. :)
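Each octet of an IPv4 address must lie in the range 0-255, so the octets 447 and 419 rule this address out. A quick check with Python's standard ipaddress module (the helper name is made up for this sketch):

```python
import ipaddress


def is_valid_ipv4(text: str) -> bool:
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return False
    return True


print(is_valid_ipv4("20.447.77.419"))  # False: octets 447 and 419 exceed 255
print(is_valid_ipv4("41.209.78.136"))  # True
```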


> So as you can see, the AWL alone is what let this spam through.
> And of course our own addresses are not on the whitelist.

I guess I'd carefully check the AWL database. Or maybe just start over
fresh. Any chance of wrongly (possibly auto-)learned messages?

  guenther




RE: [SpamAssassin] Re: eval function comparing the matches of two regular expression? - AWL is fooled without

Posted by Harald Binkle <bi...@jam-software.com>.
Hi,
Here is the header of one of those spam mails coming through:

X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) * on
        hermes.intranet.jam-software.com * at Wed, 07 Jan 2009 14:56:26 +0100
X-Spam-Status: No, hits=2.0, required= 5.0, autolearn=no, shortcircuit=no
X-Spam-Report: *  0.3 JAM_DO_STH_HERE BODY: Body contains Click/Order/Press... Here
        *  0.2 HTML_IMAGE_RATIO_04 BODY: HTML has a low ratio of text to image area
        *  1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
        *  0.0 HTML_MESSAGE BODY: HTML included in message
        *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
        *      [score: 0.9875]
        *  1.5 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
        *  0.9 SARE_UN7 RAW: SARE_UN7
        *  0.9 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
        *      [41.209.78.136 listed in zen.spamhaus.org]
        * -6.3 AWL AWL: From: address is in the auto white-list
X-Backup: ESTBackupDone
Received: from hacos (41.209.78.136) by Hermes.intranet.jam-software.com
 (192.168.123.96) with Microsoft SMTP Server id 8.1.291.1; Wed, 7 Jan 2009
 14:55:37 +0100
X-Originating-IP: [20.447.77.419]
X-Originating-Email: [marder@jam-software.com]
X-Sender: marder@jam-software.com
Return-Path: marder@jam-software.com
To: <ma...@jam-software.com>
Subject: RE:ci.Doctor Rosen
From: <ma...@jam-software.com>
MIME-Version: 1.0
Importance: High
Content-Type: text/html
Message-ID: <bf...@hermes.intranet.jam-software.com>
Date: Wed, 7 Jan 2009 14:55:37 +0100


So as you can see, the AWL alone is what let this spam through.
And of course our own addresses are not on the whitelist.


Harry

> -----Original Message-----
> From: Karsten Bräckelmann [mailto:guenther@rudersport.de]
> Sent: Friday, January 09, 2009 12:15 AM
> To: dev@spamassassin.apache.org
> Cc: Harald Binkle
> Subject: [SpamAssassin] Re: eval function comparing the matches of two regular
> expression? - AWL is fooled without
>
> Ah, nice... Apparently you sent the very same message here, too. I
> already deleted the thread on the users list...
>
>
> On Thu, 2009-01-08 at 13:24 +0100, Harald Binkle wrote:
> > What about a new eval function comparing the matches of two regular
> > expression?
> > If there would be a function like
> >
> > eval:Equals(/regex1/,/regex2/) and eval:NOTEquals(/regex1/,/regex2/)
> >
> > it would be easy to define rules like:
> >
> > a rule scoring, say with 0.8 points, if there is only one recipients
> > address and
> > that one equals the senders address but they have different 'name
> > parts'? Like: TO: "User Name" <us...@domain.com> FROM: "viagra offer"
> > <us...@domain.com>
>
> Err, given that example -- what about a rule that punishes any mail sent
> "from" your domain with a real name referencing pills? Other options
> have been mentioned on the users list already, IIRC.
>
> I seriously wonder *why* these are a problem in the first place. They
> are quite spammy, and SA shouldn't have any problem assigning a high
> score out of the box.  (If you want help how to better identify a
> particular class of spam, provide a link to samples and ask on the users
> list. I got a feeling the "To equals From" is just a pattern you
> spotted, but there are better ways and other issues.)
>
>
> > There are a lot of spam mails with that structure trying to get
> > through because many people have their own domain on the whitelist.
>
> This is NOT an excuse for implementing such a plugin.
>
> Plain whitelist_from your own domain is a gross mis-configuration. Do
> not use it, unless there is no other option. Use the rcvd or auth
> variants.
>
>
> > I tried to set this up as rule but with no luck. I fear it is not
> > possible to do this with a regular expression as it is not possible to
> > compare results of a regular expression in a regular expression.
>
> It *is* possible to do a generic "To equals From" rule using a single
> RE. A few weeks ago when this topic came up, I hacked on this locally
> for some exercise. Didn't polish it, but got a proof of concept.
> Granted, the RE is quite ugly. :)
>
> Also, it absolutely *is* possible to "compare results of a regular
> expression in a regular expression". It's called back references.
>
>
> > And the AWL does not work for mails with this structure. If the sender
> > address was set to the recipients address the AWL is fooled and the
> > mail gets a negative score.
>
> No.  AWL does not work on the address alone, but adds the sending IP
> block (/24 IIRC) into account. A spam forged to be sent by you does not
> get a negative score, because the mail does not originate from the same
> network you are using.
>
> The Subject is wrong just the same.
>
>
> > Could someone implement this?
>
> Such a plugin has been posted to the users list before as a proof of
> concept. However, again -- IMHO you are trying to solve your issue by
> throwing more code at it, instead of nailing the real problem. Why are
> they slipping through in the first place?
>
>   guenther
>
>



