Posted to users@spamassassin.apache.org by Marcio Vogel Merlone dos Santos <ma...@a1.ind.br> on 2019/06/04 19:06:10 UTC

Help matching a spam (regex)

Hi all,

Trying to match a message using uri_detail with no luck. On body I have 
something like this:

<a href="foo.bar">Something &rarr;</a>

That "something" is changed on a daily basis, so I am trying to match 
the &rarr; which is common to all variations, and failing miserably. I 
have tried the obvious and some (desperate) variations:

uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i

uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i

uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i

uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i

What have I missed? Thanks for any enlightenment, RTFM.


Best regards.


-- 
*Marcio Merlone*

Re: Help matching a spam (regex)

Posted by John Hardin <jh...@impsec.org>.

On Tue, 4 Jun 2019, Marcio Vogel Merlone dos Santos wrote:

> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I have 
> something like this:
>
> <a href="foo.bar">Something &rarr;</a>
>
> That "something" is changed on a daily basis, so I am trying to match the 
> &rarr; which is common to all variations, and failing miserably. I have tried 
> the obvious and some (desperate) variations:
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i
>
> uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.

This may help to figure it out in debug mode:

    uri_detail  __ALL_URI_DTL_TXT    text =~ /.*/
    tflags      __ALL_URI_DTL_TXT    multiple

You *should* be able to see exactly what is there - the HTML token or a 
UTF-8 byte sequence.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  2 days until the 75th anniversary of D-Day

Re: Help matching a spam (regex)

Posted by Amir Caspi <ce...@3phase.com>.

On Jun 4, 2019, at 4:05 PM, RW <rw...@googlemail.com> wrote:
> 
> On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote:
> 
>> Trying to match a message using uri_detail with no luck. On body I
>> have something like this:
>> 
>> <a href="foo.bar">Something &rarr;</a>

> &rarr; represents a '→' (right arrow) character. If I were you, I'd try its
> UTF-8 byte sequence:
> 
> \xe2\x86\x92 

Correct me if I'm wrong, but aren't the HTML entities converted to Unicode as part of normalize_charset? In that case, the uri_detail rule would have to match the Unicode character, as RW suggests... matching the literal &rarr; would require a rawbody rule, right?
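
To illustrate the idea (not SpamAssassin itself), here is a minimal Python sketch. It assumes the anchor text is entity-decoded before the rule sees it, and uses Python's html.unescape as a stand-in for that decoding step; the variable names are made up for the example. If that assumption holds, a pattern looking for the literal entity can only hit the raw source, never the decoded text.

```python
import html
import re

# Anchor text as it appears in the raw HTML source of the message.
raw_anchor_text = "Something &rarr;"

# Stand-in for the entity decoding applied when the HTML is rendered.
# Assumption: SpamAssassin's own HTML handling behaves comparably here.
decoded_anchor_text = html.unescape(raw_anchor_text)

# The pattern from the original rule attempts.
entity_pattern = re.compile(r"&rarr;", re.IGNORECASE)

raw_hit = entity_pattern.search(raw_anchor_text) is not None
decoded_hit = entity_pattern.search(decoded_anchor_text) is not None

# After decoding, the arrow is a single character, U+2192.
arrow_hit = "\u2192" in decoded_anchor_text

print(raw_hit, decoded_hit, arrow_hit)
```

If the debug output suggested earlier in the thread shows an arrow rather than the entity, that would be consistent with this picture.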

--- Amir

Re: Help matching a spam (regex)

Posted by RW <rw...@googlemail.com>.

On Tue, 4 Jun 2019 16:06:10 -0300
Marcio Vogel Merlone dos Santos wrote:

> Hi all,
> 
> Trying to match a message using uri_detail with no luck. On body I
> have something like this:
> 
> <a href="foo.bar">Something &rarr;</a>
> 
> That "something" is changed on a daily basis, so I am trying to match 
> the &rarr; which is common to all variations, and failing miserably.
> I have tried the obvious and some (desperate) variations:
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /&rarr;/i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr;/i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /.rarr./i
> 
> uri_detail  A1_URI_FAKE_LINK    text =~ /rarr/i
> 
> What have I missed? Thanks for any enlightenment, RTFM.
> 


&rarr; represents a '→' (right arrow) character. If I were you, I'd try its
UTF-8 byte sequence:

\xe2\x86\x92
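
As a quick sanity check on those bytes, the following Python sketch derives the sequence from the character and confirms that a byte-oriented regex matches UTF-8 encoded link text. It only illustrates the encoding; whether uri_detail sees bytes or decoded characters on a given setup is an assumption to verify in debug mode, and the sample link text is hypothetical.

```python
import re

arrow = "\u2192"  # RIGHTWARDS ARROW, the character behind &rarr;
arrow_bytes = arrow.encode("utf-8")

# Render the bytes in the \xNN form used inside a regex.
escaped = "".join(f"\\x{b:02x}" for b in arrow_bytes)
print(escaped)

# Hypothetical link text, encoded the way it would travel as UTF-8.
link_text = "Something \u2192".encode("utf-8")

# Byte-oriented pattern equivalent to /\xe2\x86\x92/.
byte_pattern = re.compile(rb"\xe2\x86\x92")
matched = byte_pattern.search(link_text) is not None
print(matched)
```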