Posted to users@spamassassin.apache.org by Dennis Hardy <dh...@sogetthis.com> on 2008/12/08 16:54:58 UTC

need help with spamassassin URI rule

Hi, I was hoping someone on this list could help me with a custom rule for
SpamAssassin.  I'm not an expert at Perl regexps at all, and I've spent a lot
of time trying to come up with a working match, all to no avail...

What I would like to match on is URLs that do _not_ start with a third level
domain entry, and end with ".com", ".biz", ".info", etc.  For example,
"http://hello.com/" (followed by more stuff) would match, and
"http://www.hello.com/{...}" would _not_ match.

Actually another way of looking at it is just matching on a single domain,
without any preceding ".", so basically "//domain.ext/" is what I want to
match for, and if there is a preceding "." in front of "domain", that would
cause it to not match.  So "http://foo.bar.net/" would not match, but
"http://bar.net/" would.  Is this possible with perl regexps?

I've spent hours trying variations of different URI rules, but none of them
work (they always match the "www." as well).  Here are some of my feeble
attempts:

    [^w]{3}.*\.com\/
    ^(?:http?:\/\/)?[^\/]+(?<!\/www)\.[^.]{7,}\.com\/
    (?<!www\.)   ...
    [^\/]+(?<!\/www)\.{1,}\.com\/

Some of the "dot only" checks I tried:

    (?<!\.)\w+?\.com
    ([^\.])\w+.*\.com\/

Again none of these work :-(
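
One likely reason the lookbehind variants still fire on "www." URLs: the
regex engine is free to start the match partway through a label, where the
preceding character is a letter rather than a dot. A minimal sketch of that
behaviour, using Python's re module as a stand-in for Perl (both accept this
syntax); the sample URL is only an illustration:

```python
import re

# One of the attempted "dot only" patterns, copied as-is.
attempt = re.compile(r"(?<!\.)\w+?\.com")

url = "http://www.hello.com/"
m = attempt.search(url)

# A start at "hello" is rejected (it is preceded by "."), but a start at
# "ello" is preceded by "h", so the lookbehind passes and the match succeeds.
matched_text = m.group(0) if m else None
print(matched_text)
```

So a negative lookbehind on its own is not enough here; the match also has to
be pinned to the start of the hostname.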

I really appreciate any help you could provide!

.dh


-- 
View this message in context: http://www.nabble.com/need-help-with-spamassassin-URI-rule-tp20897907p20897907.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.


Re: need help with spamassassin URI rule

Posted by Ned Slider <ne...@unixmail.co.uk>.
Ned Slider wrote:
> Henrik K wrote:
>>
>> To be more specific:
>>
>> The hostname may optionally end with a dot, followed by a :port, a /slash,
>> or nothing
>>
>> m{^https?://[^.:/]+\.[^.:/]+\.?(?:$|[:/])}
>>
>>
> 
> Could anyone please provide a reference or explanation of the use of 
> m{blah} in spamassassin uri rules?
> 
> Thanks
> 

Answering my own question courtesy of a previous post (below) by Matt 
Kettler:

[quote]
They are the delimiters. Instead of using a pair of / to delimit the 
regex they used curly braces. It's somewhat rare to see this done, but 
it's sometimes convenient.

When you prefix with the match operator (that m at the beginning) you 
can use almost any character you want as a delimiter instead of forward 
slash. This way you can do http:// without having to escape it as 
http:\/\/ like you would in most normal / delimited rule.
[/quote]

Thanks Matt!




Re: need help with spamassassin URI rule

Posted by Ned Slider <ne...@unixmail.co.uk>.
Henrik K wrote:
> 
> To be more specific:
> 
> The hostname may optionally end with a dot, followed by a :port, a /slash, or nothing
> 
> m{^https?://[^.:/]+\.[^.:/]+\.?(?:$|[:/])}
> 
> 

Could anyone please provide a reference or explanation of the use of 
m{blah} in spamassassin uri rules?

Thanks




Re: need help with spamassassin URI rule

Posted by Henrik K <he...@hege.li>.
On Mon, Dec 08, 2008 at 08:52:46AM -0800, John Hardin wrote:
> On Mon, 8 Dec 2008, Dennis Hardy wrote:
>
>> What I would like to match on is URLs that do _not_ start with a third level
>> domain entry, and end with ".com", ".biz", ".info", etc.  For example,
>> "http://hello.com/" (followed by more stuff) would match, and
>> "http://www.hello.com/{...}" would _not_ match.
>>
>> Some of the "dot only" checks I tried:
>>
>>    (?<!\.)\w+?\.com
>>    ([^\.])\w+.*\.com\/
>>
>> Again none of these work :-(
>
> How about:
>   /:\/\/[^.\/]+\.[^\.\/]+\//

To be more specific:

The hostname may optionally end with a dot, followed by a :port, a /slash, or nothing

m{^https?://[^.:/]+\.[^.:/]+\.?(?:$|[:/])}
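
A rough way to try this pattern outside SpamAssassin is Python's re module,
which accepts the same syntax once the m{...} delimiters are dropped. The
sample URLs below are made up for illustration, and this is only a sketch,
not a test against real mail:

```python
import re

# The pattern above without the m{...} delimiters; Python needs none.
two_label_host = re.compile(r"^https?://[^.:/]+\.[^.:/]+\.?(?:$|[:/])")

samples = [
    "http://hello.com/",        # two labels, then a slash
    "http://bar.net",           # nothing after the host
    "https://bar.net:8080/x",   # port after the host
    "http://bar.net./",         # trailing dot on the host
    "http://www.hello.com/",    # three labels
    "http://foo.bar.net/",      # three labels
]

# Map each sample URL to whether the pattern matches it.
results = {url: bool(two_label_host.search(url)) for url in samples}
for url, hit in results.items():
    print(f"{'match   ' if hit else 'no match'}  {url}")
```

The first four samples match and the two three-label hosts do not, because
the leading ^ ties the first label to the start of the URL.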


Re: need help with spamassassin URI rule

Posted by John Hardin <jh...@impsec.org>.
On Mon, 8 Dec 2008, Dennis Hardy wrote:

>
>> How about:
>>    /:\/\/[^.\/]+\.[^\.\/]+\//
>
> Hi John, sweet, this seems to work!  Could you help me with how to add a
> list of "com|net|info|biz|etc" before the closing "/", so it will match
> against a list of known TLDs?

    /:\/\/[^.\/]+\.(?:com|net|info|biz|etc)\//
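
For anyone wanting to check the alternation before loading it as a rule, here
is a small sketch in Python's re module, which accepts the same syntax. The
TLD list and the helper name are placeholders for illustration only; the
literal "etc" above reads as a stand-in for further TLDs, so it is left out:

```python
import re

# Illustrative TLD list, not taken from any official registry.
tlds = ["com", "net", "info", "biz"]

# Same shape as the rule above, with the second label limited to the list.
tld_rule = re.compile(r"://[^./]+\.(?:" + "|".join(tlds) + r")/")

def is_bare_tld_url(url: str) -> bool:
    """True when the host is domain.tld (tld from the list) followed by a slash."""
    return tld_rule.search(url) is not None

print(is_bare_tld_url("http://hello.com/"))        # listed TLD, no subdomain
print(is_bare_tld_url("http://www.hello.com/"))    # subdomain present
print(is_bare_tld_url("http://hello.org/"))        # TLD not in the list
print(is_bare_tld_url("http://hello.community/"))  # "com" is only a prefix here
```

Only the first URL matches. The closing slash in the pattern is what stops
"com" from matching as a prefix of a longer label.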

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It is not the business of government to make men virtuous or
   religious, or to preserve the fool from the consequences of his own
   folly.                                              -- Henry George
-----------------------------------------------------------------------
  7 days until Bill of Rights day

Re: need help with spamassassin URI rule

Posted by Dennis Hardy <dh...@sogetthis.com>.
> How about:
>    /:\/\/[^.\/]+\.[^\.\/]+\//

Hi John, sweet, this seems to work!  Could you help me with how to add a
list of "com|net|info|biz|etc" before the closing "/", so it will match
against a list of known TLDs?

Many thanks, you are awesome :-)

.dh




Re: need help with spamassassin URI rule

Posted by John Hardin <jh...@impsec.org>.
On Mon, 8 Dec 2008, Dennis Hardy wrote:

> What I would like to match on is URLs that do _not_ start with a third level
> domain entry, and end with ".com", ".biz", ".info", etc.  For example,
> "http://hello.com/" (followed by more stuff) would match, and
> "http://www.hello.com/{...}" would _not_ match.
>
> Some of the "dot only" checks I tried:
>
>    (?<!\.)\w+?\.com
>    ([^\.])\w+.*\.com\/
>
> Again none of these work :-(

How about:
   /:\/\/[^.\/]+\.[^\.\/]+\//
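
A quick sanity check of this pattern, sketched in Python's re module rather
than in SpamAssassin itself (the syntax carries over once the /.../
delimiters and slash escapes are dropped). The helper name and the sample
URLs are made up for illustration:

```python
import re

# The suggested pattern with the /.../ delimiters and slash escapes dropped.
bare_domain = re.compile(r"://[^./]+\.[^./]+/")

def hits(url: str) -> bool:
    """True when the pattern is found anywhere in the URL."""
    return bare_domain.search(url) is not None

print(hits("http://hello.com/page"))   # domain.ext followed by a slash
print(hits("http://www.hello.com/"))   # extra label before the domain
print(hits("http://bar.net"))          # no slash after the host
print(hits("http://www.example.com/r?u=http://hello.com/"))  # second "://" in the query
```

The first and last URLs match; the middle two do not. As written, the pattern
needs a slash after the host, and because it is not anchored it can also fire
on a second "://" that appears later in the URL.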

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Perfect Security and Absolute Safety are unattainable; beware
   those who would try to sell them to you, regardless of the cost,
   for they are trying to sell you your own slavery.
-----------------------------------------------------------------------
  7 days until Bill of Rights day