Posted to users@spamassassin.apache.org by Marcio Vogel Merlone dos Santos <ma...@a1.ind.br> on 2019/06/04 19:06:10 UTC
Help matching a spam (regex)
Hi all,
I'm trying to match a message using uri_detail, with no luck. In the body I have
something like this:
<a href="foo.bar">Something &rarr;</a>
That "something" changes on a daily basis, so I am trying to match
the &rarr; which is common to all variations, and failing miserably. I
have tried the obvious and some (desperate) variations:
uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
uri_detail A1_URI_FAKE_LINK text =~ /rarr/i
What have I missed? Thanks for any enlightenment, RTFM.
Best regards.
--
*Marcio Merlone*
Re: Help matching a spam (regex)
Posted by John Hardin <jh...@impsec.org>.
On Tue, 4 Jun 2019, Marcio Vogel Merlone dos Santos wrote:
> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I have
> something like this:
>
> <a href="foo.bar">Something &rarr;</a>
>
> That "something" is changed on a daily basis, so I am trying to match the
> &rarr; which is common to all variations, and failing miserably. I have tried
> the obvious and some (desperate) variations:
>
> uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
>
> uri_detail A1_URI_FAKE_LINK text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.
This may help to figure it out in debug mode:
uri_detail __ALL_URI_DTL_TXT text =~ /.*/
tflags __ALL_URI_DTL_TXT multiple
You *should* be able to see exactly what is there: either the HTML entity
or a UTF-8 byte sequence.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Help matching a spam (regex)
Posted by Amir Caspi <ce...@3phase.com>.
On Jun 4, 2019, at 4:05 PM, RW <rw...@googlemail.com> wrote:
>
> On Tue, 4 Jun 2019 16:06:10 -0300 Marcio Vogel Merlone dos Santos wrote:
>
>> Trying to match a message using uri_detail with no luck. On body I
>> have something like this:
>>
>> <a href="foo.bar">Something &rarr;</a>
> &rarr; represents a '→' (right arrow) character. If I were you, I'd try its
> UTF-8 byte sequence:
>
> \xe2\x86\x92
Correct me if I'm wrong, but aren't HTML entities converted to Unicode as part of localize_charset? In that case, the uri_detail rule would have to match the character's UTF-8 bytes, as RW suggests, and matching the literal &rarr; would require a rawbody rule, right?
--- Amir
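Following Amir's suggestion, a rawbody rule would look for the literal entity in the HTML source rather than in the decoded link text. A minimal sketch, assuming the spam always places the entity just before the closing </a> tag; the rule name A1_RAW_RARR_LINK is hypothetical:

```
# Hypothetical rule name. rawbody sees the HTML source (after any
# base64/quoted-printable decoding), so the literal entity is still there.
rawbody A1_RAW_RARR_LINK /&rarr;\s*<\/a>/i
describe A1_RAW_RARR_LINK Link text ends with an &rarr; entity
```

Anchoring on the closing tag keeps the rule from firing on arrows elsewhere in the body, at the cost of missing variants that put extra markup between the entity and </a>.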
Re: Help matching a spam (regex)
Posted by RW <rw...@googlemail.com>.
On Tue, 4 Jun 2019 16:06:10 -0300
Marcio Vogel Merlone dos Santos wrote:
> Hi all,
>
> Trying to match a message using uri_detail with no luck. On body I
> have something like this:
>
> <a href="foo.bar">Something &rarr;</a>
>
> That "something" is changed on a daily basis, so I am trying to match
> the &rarr; which is common to all variations, and failing miserably.
> I have tried the obvious and some (desperate) variations:
>
> uri_detail A1_URI_FAKE_LINK text =~ /&rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr;/i
>
> uri_detail A1_URI_FAKE_LINK text =~ /.rarr./i
>
> uri_detail A1_URI_FAKE_LINK text =~ /rarr/i
>
> What have I missed? Thanks for any enlightenment, RTFM.
>
&rarr; represents a '→' (right arrow) character. If I were you, I'd try its
UTF-8 byte sequence:
\xe2\x86\x92
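Put into rule form, RW's suggestion might look like the following. This is a minimal sketch, assuming the URIDetail plugin is loaded (as the original rules already require) and that the anchor text reaches the rule as UTF-8 bytes; the score is a placeholder, not a tested weight:

```
# Match the UTF-8 encoding of U+2192 (RIGHTWARDS ARROW) in the anchor text.
# The /i flag is dropped because case folding does not apply to these bytes.
uri_detail A1_URI_FAKE_LINK text =~ /\xe2\x86\x92/
describe A1_URI_FAKE_LINK Link text contains a right arrow
# Placeholder score; tune it against your own ham and spam.
score A1_URI_FAKE_LINK 1.0
```

If the __ALL_URI_DTL_TXT debug rule shows the arrow in some other form, the regex would need to match that form instead.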