Posted to users@spamassassin.apache.org by Niamh Holding <ni...@fullbore.co.uk> on 2011/04/22 17:07:31 UTC

Spamassassin regex oddity

Hello

I have a custom rule-

header   NH_TDIALIN X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=.*dip\.t-dialin\.net/i
score    NH_TDIALIN 1.61
describe NH_TDIALIN Received directly from dynamic t-dialin.net address

Now the regex should only match within the first [...] block of the
X-Spam-Relays-Untrusted header. The page at
http://wiki.apache.org/spamassassin/TrustedRelays says: "Note the
^[^\]]+ part of the pattern; that skips anything but "]" characters,
ensuring that the match will only happen within the first [...] block
of the pseudo-header string." However, the rule is being triggered by
this-

[ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de 
by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] [ 
ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de 
by=mrelayeu.kundenserver.de ident= envfrom= intl=0 
id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]

where the t-dialin.net rdns is clearly in the second [...] block, and indeed
if I put the regex and the header into my regex tester I get no match, as
expected.

So why does Spamassassin find a match?


-- 
Best regards,
 Niamh                          mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Friday, April 22, 2011, 7:09:06 PM, you wrote:

KB> Not permitting square brackets will indeed prevent a relay border
KB> between the rdns= key and the matched value, but since spaces are
KB> allowed, it happily matches e.g. helo= or by= values.

That did click eventually    :)

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2011-04-22 at 18:55 +0100, Niamh Holding wrote:
> KB> What you want instead is to match anything BUT a space /[^ ]+/. That
> KB> will prevent this part from matching beyond borders. More specifically,
> KB> it prevents matching any other data point and ensures the right hand
> KB> part actually is the rDNS as desired.
> 
> I think what I want is actually-
> /rdns=[^\]]*\dip\.t-dialin\.net/i

No.

Not permitting square brackets will indeed prevent a relay border
between the rdns= key and the matched value, but since spaces are
allowed, it happily matches e.g. helo= or by= values.

The Relay pseudo-headers are a list of key=value pairs per relay,
delimited by spaces. There is no space in the key=value pair, ever. So,
to ensure matching the key's value (and not possibly other values later
in the relay entry), a wildcard matching must exclude spaces.
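
The structure described above can be made visible by splitting the
pseudo-header from the original post into its relays and key=value
pairs. This is only a rough sketch in Python, not anything SA itself
does; parse_relays is a hypothetical helper name, and the header is the
one from the original post joined onto a single line.

```python
import re

# Relay pseudo-header from the original post, joined onto one line.
HEADER = (
    "[ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de "
    "by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] "
    "[ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de "
    "by=mrelayeu.kundenserver.de ident= envfrom= intl=0 "
    "id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]"
)


def parse_relays(pseudo_header):
    """Split a Relays pseudo-header into one dict of key=value pairs per relay."""
    relays = []
    # Each relay is wrapped in "[ ... ]"; the pairs inside are space-delimited.
    for block in re.findall(r"\[ ([^\]]*)\]", pseudo_header):
        pairs = {}
        for token in block.split():
            key, _, value = token.partition("=")
            pairs[key] = value  # empty values such as "ident=" become ""
        relays.append(pairs)
    return relays


relays = parse_relays(HEADER)
for number, relay in enumerate(relays, start=1):
    print(number, "rdns:", relay["rdns"], "helo:", relay["helo"])
```

Because no value ever contains a space, splitting on whitespace is
enough here, and the same property is what makes /[^ ]+/ a safe
wildcard for a single value.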


> Which will stop trying to match at the first ]
> 
> I think

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Friday, April 22, 2011, 4:31:25 PM, you wrote:

KB> What you want instead is to match anything BUT a space /[^ ]+/. That
KB> will prevent this part from matching beyond borders. More specifically,
KB> it prevents matching any other data point and ensures the right hand
KB> part actually is the rDNS as desired.

I think what I want is actually-
/rdns=[^\]]*\dip\.t-dialin\.net/i

Which will stop trying to match at the first ]

I think

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Martin Gregorie <ma...@gregorie.org>.
On Sat, 2011-04-23 at 18:49 +0100, Niamh Holding wrote:
> Hello Karsten,
> 
> Saturday, April 23, 2011, 6:23:51 PM, you wrote:
> 
> KB> Besides, X-Spam-Relays-* are pseudo-headers, not part of the
> KB> email unless you specifically add_header them.
> 
> I guess I must have done that to get them into every email :)
> 

Try these (best suggestion first)

1) Install a copy of Spamassassin on a development host, edit the
   rule you want to test in /etc/mail/spamassassin/local.cf and run:

   spamassassin <test_message.txt

   Benefits: you're testing the rule you'll run in the environment it
   will run in, can also test meta-rules and can easily test against a 
   real mail message. If you organise things correctly, you can keep
   all SA settings and local rules on the development machine and export
   them to the production server via a script and/or scp. 

   I work this way and IME the time needed to set it up and write
   scripts to make rule development and testing simple is time well
   spent.

2) Try this online tester: http://www.solmetra.com/scripts/regex/
   At least it defaults to Perl regexes, though it won't swallow a
   whole mail message easily.

3) grep -P 'regex to test' <test_message.txt

   Benefits: easy to test against a real mail message. 
   Runs on your own box.

   Demerit: the -P option is described as 'experimental' and may not
   yet implement all PCRE features.



Martin
 


Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sat, 2011-04-23 at 18:49 +0100, Niamh Holding wrote:
> KB> Besides, X-Spam-Relays-* are pseudo-headers, not part of the
> KB> email unless you specifically add_header them.
> 
> I guess I must have done that to get them into every email :)

Oh, you really got that from the mail's headers? Yeah, then your site
config should contain something like

  add_header all  Relays-Untrusted  _RELAYSUNTRUSTED_

and its variations. Always adding them probably just wastes headers,
though. It's perfectly fine to enable it temporarily while chasing some
issues, or to generate it on demand when debugging or developing rules
like the one in your OP. You do not need to add_header them for the
rules to work -- these pseudo-headers are always available to rules as
metadata, without effectively duplicating each and every Received
header.

To get these pseudo-headers on demand for rule development or debugging
your trustpath, just feed a sample through 'spamassassin -D' and grep
the STDERR output for the headers.
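
If grep is not at hand, the same filtering can be sketched in a few
lines of Python. This assumes the debug output has already been saved
to a file (for example by redirecting STDERR with 2> debug.log);
relay_lines is a hypothetical helper name, and the sample text below is
made up for illustration, so the real debug output may look different.

```python
def relay_lines(debug_output):
    """Return only the lines that mention a Relays pseudo-header."""
    return [
        line
        for line in debug_output.splitlines()
        if "X-Spam-Relays-" in line
    ]


# Hypothetical excerpt, NOT copied from a real 'spamassassin -D' run.
SAMPLE = (
    "dbg: some unrelated line\n"
    "dbg: X-Spam-Relays-Trusted: \n"
    "dbg: X-Spam-Relays-Untrusted: [ ip=192.0.2.1 rdns=mail.example.org ]\n"
    "dbg: another unrelated line\n"
)

for line in relay_lines(SAMPLE):
    print(line)
```

For real use, read the saved file and pass its contents to
relay_lines() instead of SAMPLE.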




Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Saturday, April 23, 2011, 6:23:51 PM, you wrote:

KB> Besides, X-Spam-Relays-* are pseudo-headers, not part of the
KB> email unless you specifically add_header them.

I guess I must have done that to get them into every email :)

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sat, 2011-04-23 at 17:49 +0100, Niamh Holding wrote:
> KB> Other than the tool being broken, it is of course entirely possible you
> KB> simply typo'ed either the RE or the Relay pseudo-header -- a newline
> KB> easily would have done that.
> 
> Cut'n'paste from locals.cf and the relevant header from an email in
> each case.

Exactly. Line breaks love to sneak in during copy-n-paste of long
lines. :)  Besides, X-Spam-Relays-* are pseudo-headers, not part of the
email unless you specifically add_header them.


> KB> Perl.
> 
> Doesn't give me a text box to enter the text to test and another one
> to enter the regex being tested with an output showing the match or
> that there is no match.

Don't think "text box", think "file" or "variable" instead.

If you keep on searching for a GUI tool, keep in mind you want one with
the tasty PCRE flavor of regex, not a cheap imitation with additives.
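
To illustrate the "variable instead of text box" idea, here is a
minimal sketch of a local tester. It is written in Python rather than
Perl, so it is not the PCRE flavor as such; for the simple constructs
used in this thread (anchors, negated character classes, greedy
quantifiers, case-insensitive matching) the two should agree. try_regex
is a hypothetical helper name.

```python
import re


def try_regex(pattern, text, ignore_case=False):
    """Report whether pattern matches text and, if so, where."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(pattern, text, flags)
    if match is None:
        return "no match"
    return "match at %d-%d: %r" % (match.start(), match.end(), match.group(0))


# The "text box" is just a variable; it could equally be a file's contents.
text = "[ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de ]"
print(try_regex(r"rdns=[^ ]+\.dip\.t-dialin\.net ", text, ignore_case=True))
print(try_regex(r"rdns=[^ ]+\.example\.org ", text))
```

To test against a saved message or header, read the file into a string
first and pass that as the text argument.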




Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Saturday, April 23, 2011, 5:30:58 PM, you wrote:

KB> Other than the tool being broken, it is of course entirely possible you
KB> simply typo'ed either the RE or the Relay pseudo-header -- a newline
KB> easily would have done that.

Cut'n'paste from locals.cf and the relevant header from an email in
each case.

KB> Perl.

Doesn't give me a text box to enter the text to test and another one
to enter the regex being tested with an output showing the match or
that there is no match.

Something like  http://www.lumadis.be/regex/test_regex.php but running
locally.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Sat, 2011-04-23 at 11:05 +0100, Niamh Holding wrote:
> KB> The regex tester is broken.

To be honest, it is not necessarily broken. I don't even know which tool
you used.

That comment should be understood as emphasizing my previous detailed
explanation of the RE and the issues with it. The point is, the RE should
match exactly as SA did, even though the regex tester tool confirmed
your expectation to the contrary.

Other than the tool being broken, it is of course entirely possible you
simply typo'ed either the RE or the Relay pseudo-header -- a newline
easily would have done that.
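
One way a stray newline could produce exactly this disagreement is
sketched below. It is only a possible explanation, not something
confirmed in this thread: without the /s modifier, "." does not match a
newline, so a header pasted with its line wraps intact stops /.*/ at
the end of the first line. Python is used as a stand-in for Perl here;
both treat "." this way by default. The line breaks are the ones from
the original post.

```python
import re

# Header as it appeared in the original post, line wraps included.
WRAPPED = (
    "[ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de \n"
    "by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] [ \n"
    "ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de \n"
    "by=mrelayeu.kundenserver.de ident= envfrom= intl=0 \n"
    "id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]"
)
# The same header as one unbroken string.
ONE_LINE = WRAPPED.replace("\n", "")

rule = re.compile(r"^[^\]]+ rdns=.*dip\.t-dialin\.net", re.I)

wrapped_hit = rule.search(WRAPPED) is not None
one_line_hit = rule.search(ONE_LINE) is not None

print("wrapped copy matches: ", wrapped_hit)
print("one-line copy matches:", one_line_hit)
```

The wrapped copy gives no match while the one-line copy does, which
would make a tester appear to disagree with SA.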


> Anyone care to suggest a good tester that runs locally under XP

Perl.




Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Friday, April 22, 2011, 4:31:25 PM, you wrote:

KB> The regex tester is broken.

Anyone care to suggest a good tester that runs locally under XP?

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Michael,

Friday, April 22, 2011, 5:55:49 PM, you wrote:

MS> how about 'msa=0 \] \[ ip=.*rdns=.*dip\.t-dialin\.net/i'

But that'll definitely match into the second block, what I need is to
end the matching at the end of the first block ie the first ]

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Michael Scheidell <mi...@secnap.com>.
On 4/22/11 12:51 PM, Niamh Holding wrote:
> Hello Karsten,
>
> Friday, April 22, 2011, 4:31:25 PM, you wrote:
>
> KB>  What you want instead is to match anything BUT a space /[^ ]+/. That
> KB>  will prevent this part from matching beyond borders. More specifically,
> KB>  it prevents matching any other data point and ensures the right hand
> KB>  part actually is the rDNS as desired.
>
> Well it happily appears to go beyond borders according to the tester at
> http://www.spaweditor.com/scripts/regex/index.php
>
> The regex
> /[^ ]+ rdns=.*dip\.t-dialin\.net/i
>
> matches with
> ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] [ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net
>
how about 'msa=0 \] \[ ip=.*rdns=.*dip\.t-dialin\.net/i'

(and do you need the /i? isn't it expensive?)



-- 
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation

    * Best Intrusion Prevention Product, Networks Product Guide
    * Certified SNORT Integrator
    * Hot Company Award, World Executive Alliance
    * Best in Email Security, 2010 Network Products Guide
    * King of Spam Filters, SC Magazine

______________________________________________________________________
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.secnap.com/products/spammertrap/
______________________________________________________________________  

Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Friday, April 22, 2011, 6:53:28 PM, you wrote:

KB> /^[^\]]+ rdns=[^ ]+\.no\.space\.there /

Ah, I see what you meant now, and that wouldn't match where the
dip.t-dialin.net was in the helo and not the rdns.

Thanks.

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2011-04-22 at 17:51 +0100, Niamh Holding wrote:
> KB> What you want instead is to match anything BUT a space /[^ ]+/. That
> KB> will prevent this part from matching beyond borders. More specifically,
> KB> it prevents matching any other data point and ensures the right hand
> KB> part actually is the rDNS as desired.
> 
> Well it happily appears to go beyond borders [...]
> 
> The regex
> /[^ ]+ rdns=.*dip\.t-dialin\.net/i

Please re-read my previous post, you just changed the wrong part.

The leading /^[^\]]+ rdns=/ was correct, as you even cited yourself from
the docs. It is bound to the beginning of the string, and ensures there
is no closing square bracket (a relay border) before the rdns value.

The /.*/ regex part you are using is wrong. It permits everything,
*including* whitespace and relay borders. Between the rdns key and
value, that's where no space must be allowed.

  /^[^\]]+ rdns=[^ ]+\.no\.space\.there /
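
A quick check of that template, with the placeholder replaced by the
t-dialin.net suffix (my substitution, so treat it as an assumption
rather than the final rule). Python stands in for Perl here. The first
sample is the second relay from the original post on its own, and the
third uses made-up example.org hosts.

```python
import re

# Template from above with ".dip.t-dialin.net" filled in (an assumption).
RULE = re.compile(r"^[^\]]+ rdns=[^ ]+\.dip\.t-dialin\.net ", re.I)

SAMPLES = {
    # Dial-up host delivers directly: it is the first untrusted relay.
    "first relay": (
        "[ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de "
        "by=mrelayeu.kundenserver.de ident= envfrom= intl=0 id= auth= msa=0 ]"
    ),
    # Header from the original post: dial-up host is only the second relay.
    "second relay": (
        "[ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de "
        "by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] "
        "[ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de "
        "by=mrelayeu.kundenserver.de ident= envfrom= intl=0 "
        "id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]"
    ),
    # Hypothetical relay whose helo, not rdns, carries the dial-up name.
    "helo only": (
        "[ ip=192.0.2.1 rdns=mail.example.org helo=x.dip.t-dialin.net "
        "by=mx.example.net ident= envfrom= intl=0 id= auth= msa=0 ]"
    ),
}

results = {label: RULE.search(text) is not None for label, text in SAMPLES.items()}
for label, hit in results.items():
    print(label, "->", "match" if hit else "no match")
```

Only the first sample should hit: the leading part keeps the match in
the first relay, and /[^ ]+/ keeps it inside the rdns value.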




Re: Spamassassin regex oddity

Posted by Niamh Holding <ni...@fullbore.co.uk>.
Hello Karsten,

Friday, April 22, 2011, 4:31:25 PM, you wrote:

KB> What you want instead is to match anything BUT a space /[^ ]+/. That
KB> will prevent this part from matching beyond borders. More specifically,
KB> it prevents matching any other data point and ensures the right hand
KB> part actually is the rDNS as desired.

Well it happily appears to go beyond borders according to the tester at
http://www.spaweditor.com/scripts/regex/index.php

The regex
/[^ ]+ rdns=.*dip\.t-dialin\.net/i

matches with
ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] [ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net

-- 
Best regards,
 Niamh                            mailto:niamh@fullbore.co.uk

Re: Spamassassin regex oddity

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2011-04-22 at 16:07 +0100, Niamh Holding wrote:
> I have a custom rule-
> 
> header   NH_TDIALIN X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=.*dip\.t-dialin\.net/i
                                                               ^^
The rdns= part correctly is in the first block. The above marked pattern
to match any host runs wild, though -- it reads "any char, any number of
times" and thus happily consumes everything, including the square bracket
hop borders.

What you want instead is to match anything BUT a space /[^ ]+/. That
will prevent this part from matching beyond borders. More specifically,
it prevents matching any other data point and ensures the right hand
part actually is the rDNS as desired.
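
To see what /.*/ actually swallows, the two variants can be compared on
the header from the original post (joined onto one line). This is a
small sketch in Python as a stand-in for Perl; the capture groups are
added purely for illustration and are not part of the rule.

```python
import re

# Relay pseudo-header from the original post, joined onto one line.
HEADER = (
    "[ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de "
    "by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] "
    "[ ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de "
    "by=mrelayeu.kundenserver.de ident= envfrom= intl=0 "
    "id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]"
)

# Original rule: ".*" may consume anything, including "] [" relay borders.
original = re.compile(r"^[^\]]+ rdns=(.*)dip\.t-dialin\.net", re.I)
# Suggested change: "[^ ]+" stops at the first space, inside the rdns value.
fixed = re.compile(r"^[^\]]+ rdns=([^ ]+)dip\.t-dialin\.net", re.I)

swallowed = original.search(HEADER).group(1)
crossed_border = "]" in swallowed
fixed_hit = fixed.search(HEADER) is not None

print("text consumed by .* :", swallowed)
print("crossed a relay border:", crossed_border)
print("fixed pattern matches:", fixed_hit)
```

The consumed text runs from the first relay's rdns value across the
"] [" border into the second relay, which is why the rule fired.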


> describe NH_TDIALIN Received directly from dynamic t-dialin.net address
> 
> Now the regex should only match on anything in the first [...] block
> of the X-Spam-Relays-Untrusted header (Note the ^[^\]]+ part of the
> pattern; that skips anything but "]" characters, ensuring that the
> match will only happen within the first [...] block of the
> pseudo-header string. so says the page at
> http://wiki.apache.org/spamassassin/TrustedRelays), however the rule
> is being triggered by this-
> 
> [ ip=212.227.17.8 rdns=moutng.kundenserver.de helo=moutng.kundenserver.de 
> by=mail.redbus.holtain.net ident= envfrom= intl=0 id= auth= msa=0 ] [ 
> ip=84.165.216.65 rdns=p54A5D841.dip.t-dialin.net helo=labsco.de 
> by=mrelayeu.kundenserver.de ident= envfrom= intl=0 
> id=0MLOoU-1QCibS31jm-000eF3 auth= msa=0 ]
> 
> where the t-dialin.net rdns is clearly in the second [...] block and indeed 
> if I put the regex and the header in to my regex tester I get no match as 
> expected.

The regex tester is broken.

> So why does Spamassassin find a match?



Re: Spamassassin regex oddity

Posted by Michael Scheidell <mi...@secnap.com>.
On 4/22/11 11:07 AM, Niamh Holding wrote:
> Hello
>
> I have a custom rule-
>
> header   NH_TDIALIN X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=.*dip\.t-dialin\.net/i
> score    NH_TDIALIN 1.61
> describe NH_TDIALIN Received directly from dynamic t-dialin.net address
>
Postfix? First choice: just reject at the MTA.
If not, have Postfix check the server and insert a header,
then let SA score that custom header.

(friday is NOT a good day to match wits with regex)

