Posted to users@spamassassin.apache.org by Terry Carmen <te...@cnysupport.com> on 2011/03/21 18:07:19 UTC
Regex Rule Help?
I'm trying to match any URL that points to a URL shortener.
They typically consist of http(s) followed by a domain name, a slash
and a small series of alphanumeric characters, *without a trailing "/"
or file extension*.
I seem to be having pretty good luck matching the URL, however I can't
figure out how to make the regex explicitly *not* match anything that
ends in a slash or contains an extension.
For example, I want to match "http://asdf.ghi/j2kj4l23", but not
"http://asdf.ghi/j2kj4l23/abc.html" or "http://asdf.ghi/j2kj4l23/"
I tried using the Perl negative look-ahead as both (?!/) and (?!\/)
without success.
Can anybody toss me a clue?
Thanks!
Terry
Re: Regex Rule Help?
Posted by Adam Katz <an...@khopis.com>.
On 03/21/2011 10:07 AM, Terry Carmen wrote:
> I'm trying to match any URL that points to a URL shortener.
>
> They typically consist of http(s) followed by a domain name,
> a slash and a small series of alphanumeric characters,
> *without a trailing "/" or file extension*.
>
> I seem to be having pretty good luck matching the URL, however I
> can't figure out how to make the regex explicitly *not* match
> anything that ends in a slash or contains an extension.
>
> For example, I want to match "http://asdf.ghi/j2kj4l23", but not
> "http://asdf.ghi/j2kj4l23/abc.html" or "http://asdf.ghi/j2kj4l23/"
In this specific case, I think you want a simple end-of-string anchor:
uri ASDF_GHI_SHORT m'^http://asdf\.ghi/[\w-]{1,12}$'i
In order to match http://asdf.ghi/j2kj4l23#mno you might want:
uri ASDF_GHI_SHORT m'^http://asdf\.ghi/[\w-]{1,12}(?:[^/.\w-]|$)'i
(I used m'' instead of // so I didn't have to escape the slashes. Any
punctuation can be used as the delimiter in that manner, though the
leading "m" is only optional with m//.)
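A quick way to sanity-check the two patterns above against the example URLs from the question. This is only a sketch: Python's re module stands in for Perl (the constructs used here behave the same way in both), and the names END_ANCHORED, WITH_FRAGMENT and matches are made up for illustration, not part of SpamAssassin.

```python
import re

# The two patterns from the rules above, with the /i flag mapped to
# re.IGNORECASE. Python is an assumption here; SpamAssassin uses Perl.
END_ANCHORED = re.compile(r'^http://asdf\.ghi/[\w-]{1,12}$', re.IGNORECASE)
WITH_FRAGMENT = re.compile(
    r'^http://asdf\.ghi/[\w-]{1,12}(?:[^/.\w-]|$)', re.IGNORECASE
)


def matches(pattern, url):
    """Return True if the compiled pattern matches somewhere in url."""
    return pattern.search(url) is not None


# Example URLs from the thread, plus one with a bare extension.
urls = [
    "http://asdf.ghi/j2kj4l23",
    "http://asdf.ghi/j2kj4l23/",
    "http://asdf.ghi/j2kj4l23/abc.html",
    "http://asdf.ghi/j2kj4l23.html",
    "http://asdf.ghi/j2kj4l23#mno",
]

for url in urls:
    print(url, matches(END_ANCHORED, url), matches(WITH_FRAGMENT, url))
```

Only the first URL should satisfy the end-anchored pattern; the second pattern should additionally accept the "#mno" form while still rejecting the trailing slash and the extension.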
> I tried using the Perl negative look-ahead as both (?!/) and
> (?!\/) without success.
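As to using a negative look-ahead operator: (?!/) by itself is probably
not enough here, because the quantifier before it can backtrack and give
up a character until the look-ahead passes. You generally have to
constrain what follows as well, either with an anchor like the rules
above or by listing the other continuing characters in the look-ahead,
like (?![\w/.-]).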
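A short sketch of how a bare (?!/) can still match a URL with a trailing slash. The original rule was never posted, so the NAIVE pattern below is only a guess at its shape. Python's re module stands in for Perl, and NAIVE, STRICT and matched_text are illustrative names.

```python
import re

# Assumed shape of the rule that "did not work": a bare look-ahead
# placed right after the token. Hypothetical, the real rule was not shown.
NAIVE = re.compile(r'^http://asdf\.ghi/[\w-]{1,12}(?!/)', re.IGNORECASE)

# Look-ahead that also rejects characters that would continue the token,
# so the quantifier cannot satisfy it by giving up a character.
STRICT = re.compile(r'^http://asdf\.ghi/[\w-]{1,12}(?![\w/.-])', re.IGNORECASE)


def matched_text(pattern, url):
    """Return the text the pattern matched, or None if it did not match."""
    m = pattern.search(url)
    return m.group(0) if m else None


for url in (
    "http://asdf.ghi/j2kj4l23",
    "http://asdf.ghi/j2kj4l23/",
    "http://asdf.ghi/j2kj4l23/abc.html",
):
    print(url, matched_text(NAIVE, url), matched_text(STRICT, url))
```

With the trailing-slash URL, the naive pattern still matches: the quantifier backs off to "j2kj4l2", the next character is then "3" rather than "/", and the look-ahead passes. The stricter look-ahead leaves no shorter match to fall back on.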
Re: Regex Rule Help?
Posted by Bowie Bailey <Bo...@BUC.com>.
On 3/21/2011 1:07 PM, Terry Carmen wrote:
> I'm trying to match any URL that points to a URL shortener.
>
> They typically consist of http(s) followed by a domain name, a slash
> and a small series of alphanumeric characters, *without a trailing "/"
> or file extension*.
>
> I seem to be having pretty good luck matching the URL, however I can't
> figure out how to make the regex explicitly *not* match anything that
> ends in a slash or contains an extension.
>
> For example, I want to match "http://asdf.ghi/j2kj4l23", but not
> "http://asdf.ghi/j2kj4l23/abc.html" or "http://asdf.ghi/j2kj4l23/"
>
> I tried using the Perl negative look-ahead as both (?!/) and (?!\/)
> without success.
>
> Can anybody toss me a clue?
Show us your current rule and we can tell you what you are doing wrong.
--
Bowie
Re: Regex Rule Help?
Posted by Martin Gregorie <ma...@gregorie.org>.
On Mon, 2011-03-21 at 13:07 -0400, Terry Carmen wrote:
> I'm trying to match any URL that points to a URL shortener.
>
> They typically consist of http(s) followed by a domain name, a slash
> and a small series of alphanumeric characters, *without a trailing "/"
> or file extension*.
>
> I seem to be having pretty good luck matching the URL, however I can't
> figure out how to make the regex explicitly *not* match anything that
> ends in a slash or contains an extension.
>
> For example, I want to match "http://asdf.ghi/j2kj4l23", but not
> "http://asdf.ghi/j2kj4l23/abc.html" or "http://asdf.ghi/j2kj4l23/"
>
> I tried using the Perl negative look-ahead as both (?!/) and (?!\/)
> without success.
>
> Can anybody toss me a clue?
>
Have you looked at the DecodeShortURLs plugin? That would seem to do
what you need *and* check whether the shortened URL points to anything
harmful.
Martin