Posted to users@spamassassin.apache.org by martin smith <ma...@ntlworld.com> on 2005/06/12 13:58:35 UTC

Uri rules

Has the behaviour of the uri rule been changed at some point to match the
whole of the URL? I have just noticed I am getting some FP when one of my
uri rules matches against the URL rather than URI.
Preventing these FPs would be very difficult. I don't think matching the
whole of the URL with uri rules is a good thing: if you want to match
something anywhere in a URL, that is easy enough to do in a body rule, but
matching against just the URI isn't so easy.

Martin

Re[2]: Uri rules

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello Keith,

Thanks!  Did those off the top of my head, just before going to bed,
and thought I had forgotten something.  I believe your additions are
correct, and probably better than what we're actually using.

Bob Menschel

Wednesday, June 15, 2005, 7:51:52 PM, you wrote:

KI> Robert Menschel wrote:

>> SARE has been playing around with URI rules lately, and when we need
>> to test for something in the host/domain area, we use something like:
>> 
>>>uri  rule_name  m'(?:https?://)?[^/]*testgoeshere'

KI> I think you want a ^ at the start of your regex.  As it is, that will
KI> match any string with "testgoeshere" anywhere in it, since the match can
KI> start at the "t" rather than at the beginning of the string.

>> In other words, the test must precede any/all slashes except for those
>> that might be within http://
>> 
>> When we need to test for something after the host/domain area, we
>> reverse that, like:
>> 
>>>uri rule_name m'(?:https?://)?.+/testgoeshere'

KI> For this I think you want something like

KI>     uri rule_name m'^(?:https?://)?[^/]+/.*testgoeshere'

KI> Without the ^ anchor and without changing . to [^/], you'll match things
KI> like 'http://testgoeshere.invalid/path', because the 'http://' part of
KI> the regex is optional.  And without adding .* after the slash, you won't
KI> match things like 'http://example.com/atestgoeshere' -- but maybe you're
KI> not wanting to.

Re: Uri rules

Posted by Keith Ivey <kc...@cpcug.org>.

Robert Menschel wrote:

> SARE has been playing around with URI rules lately, and when we need
> to test for something in the host/domain area, we use something like:
> 
>>uri  rule_name  m'(?:https?://)?[^/]*testgoeshere'

I think you want a ^ at the start of your regex.  As it is, that will 
match any string with "testgoeshere" anywhere in it, since the match can 
start at the "t" rather than at the beginning of the string.

> In other words, the test must precede any/all slashes except for those
> that might be within http://
> 
> When we need to test for something after the host/domain area, we
> reverse that, like:
> 
>>uri rule_name m'(?:https?://)?.+/testgoeshere'

For this I think you want something like

    uri rule_name m'^(?:https?://)?[^/]+/.*testgoeshere'

Without the ^ anchor and without changing . to [^/], you'll match things 
like 'http://testgoeshere.invalid/path', because the 'http://' part of 
the regex is optional.  And without adding .* after the slash, you won't 
match things like 'http://example.com/atestgoeshere' -- but maybe you're 
not wanting to.

-- 
Keith C. Ivey <kc...@cpcug.org>
Washington, DC
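
The anchoring points above can be checked outside SpamAssassin. What
follows is a minimal sketch, assuming Python's re module treats these
particular constructs the same way Perl does. The sample URIs are made
up, and the "path, strict" pattern is a hypothetical variant that does
not appear in the thread.

```python
import re

# Regex bodies copied from the thread, without the Perl m'...' delimiters.
PATTERNS = {
    "host, unanchored": r"(?:https?://)?[^/]*testgoeshere",
    "host, anchored": r"^(?:https?://)?[^/]*testgoeshere",
    "path, unanchored": r"(?:https?://)?.+/testgoeshere",
    "path, anchored": r"^(?:https?://)?[^/]+/.*testgoeshere",
    # Hypothetical variant, not from the thread: the lookahead stops
    # [^/]+ from consuming "http:" when the optional scheme is skipped.
    "path, strict": r"^(?:https?://)?(?!https?:)[^/]+/.*testgoeshere",
}

# Made-up sample URIs (.invalid and example.com are reserved names).
URIS = [
    "http://testgoeshere.invalid/path",   # term in the host only
    "http://example.com/testgoeshere",    # term right after the slash
    "http://example.com/atestgoeshere",   # term later in the path
]


def hits(name, uri):
    """Return True if the named pattern matches anywhere in uri."""
    return re.search(PATTERNS[name], uri) is not None


if __name__ == "__main__":
    for name in PATTERNS:
        for uri in URIS:
            print(f"{name:18} {uri:36} {hits(name, uri)}")
```

In this sketch the unanchored host pattern hits all three URIs, while
the anchored one hits only the first. The anchored path pattern still
hits 'http://testgoeshere.invalid/path': when the optional scheme group
is skipped, [^/]+ can match "http:" and .* then absorbs the second
slash. The "path, strict" variant is one possible way around that, but
it has not been tried in SpamAssassin itself.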

RE: Re[2]: Uri rules

Posted by martin smith <ma...@ntlworld.com>.

M>Hello martin,
M>
M>
M>SARE has been playing around with URI rules lately, and when 
M>we need to test for something in the host/domain area, we use 
M>something like:
M>> uri  rule_name  m'(?:https?://)?[^/]*testgoeshere'
M>In other words, the test must precede any/all slashes except 
M>for those that might be within http://
M>
M>When we need to test for something after the host/domain 
M>area, we reverse that, like:
M>> uri rule_name m'(?:https?://)?.+/testgoeshere'
M>In other words, the test must follow a slash.
M>
M>The method can be improved upon, but it helps avoid what I 
M>think are the false hits you're dealing with.
M>
M>Bob Menschel
M>

Thanks for the tip, Bob, that's just what I needed. I will try it out,
but I'm pretty sure it will stop the FPs I was having.

Regards Martin

Re[2]: Uri rules

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello martin,

Monday, June 13, 2005, 11:37:03 PM, you wrote:

ms> Thanks for the reply, Bob. It's a rule of my own, and yes, I was using
ms> the same definition of URL and URI. I just didn't notice any FP when I
ms> first wrote it, and wasn't sure whether the uri rule behaviour had
ms> changed, since it's a uri rule and not a url rule.

SARE has been playing around with URI rules lately, and when we need
to test for something in the host/domain area, we use something like:
> uri  rule_name  m'(?:https?://)?[^/]*testgoeshere'
In other words, the test must precede any/all slashes except for those
that might be within http://

When we need to test for something after the host/domain area, we
reverse that, like:
> uri rule_name m'(?:https?://)?.+/testgoeshere'
In other words, the test must follow a slash.

The method can be improved upon, but it helps avoid what I think are
the false hits you're dealing with.

Bob Menschel

RE: Uri rules

Posted by martin smith <ma...@ntlworld.com>.

M>Not that I'm aware of.  To my knowledge the URI rule always 
M>matches the full URL.  There are several SA and/or SARE rules 
M>which depend upon this.
M>
M>Or do you mean something different by URI and URL than I do?  
M>I generally use the definitions found at 
M>http://www.adp-gmbh.ch/web/uri_url_urn.html -- including:
M>>  URI = Uniform Resource Identifier
M>> There are two types of URIs: URLs and URNs
M>In other words, a URL /is/ a URI.
M>
M>Section 1.3 of http://www.zvon.org/tmRFC/RFC2396/Output/ 
M>gives as examples of URIs:
M>> http://www.math.uio.no/faq/compression-faq/part1.html
M>> mailto:mduerst@ifi.unizh.ch
M>(those are the two most applicable to SA)
M>> ftp://ftp.is.co.za/rfc/rfc1808.txt
M>etc.
M>
M>
M>Why?  As recommended, if you have an avoidable FP in an SA 
M>distribution rule, post it to bugzilla, and we'll see if we 
M>can get rid of the FP.  (Remember, however, that sometimes 
M>ham-hits on low-scoring rules are intentional -- an FP is 
M>one that flags a non-spam as a spam.)
M>
M>If your ham hit is in a SARE rule rather than an SA rule 
M>(more likely, IMO), then post the specifics either here or on 
M>the SARE forum, and we'll see if it's worth avoiding.
M>
M>Bob Menschel
M>

Thanks for the reply, Bob. It's a rule of my own, and yes, I was using
the same definition of URL and URI. I just didn't notice any FP when I
first wrote it, and wasn't sure whether the uri rule behaviour had
changed, since it's a uri rule and not a url rule.


Martin

Re: Uri rules

Posted by Robert Menschel <Ro...@Menschel.net>.

Hello martin,

Sunday, June 12, 2005, 4:58:35 AM, you wrote:

ms> Has the behaviour of the uri rule been changed at some point to match the
ms> whole of the URL? I have just noticed I am getting some FP when one of my
ms> uri rules matches against the URL rather than URI.

Not that I'm aware of.  To my knowledge the URI rule always matches
the full URL.  There are several SA and/or SARE rules which depend upon this.

Or do you mean something different by URI and URL than I do?  I
generally use the definitions found at
http://www.adp-gmbh.ch/web/uri_url_urn.html -- including:
>  URI = Uniform Resource Identifier
> There are two types of URIs: URLs and URNs
In other words, a URL /is/ a URI.

Section 1.3 of http://www.zvon.org/tmRFC/RFC2396/Output/ gives as
examples of URIs:
> http://www.math.uio.no/faq/compression-faq/part1.html
> mailto:mduerst@ifi.unizh.ch
(those are the two most applicable to SA)
> ftp://ftp.is.co.za/rfc/rfc1808.txt
etc.

ms> To prevent FP would be very difficult, I think to match the whole of the URL
ms> with uri rules is not such a good thing, if you wanted to match something in
ms> a URL it would be quite easy to do so in a body rule but to match just
ms> against URI isn't so easy.

Why?  As recommended, if you have an avoidable FP in an SA
distribution rule, post it to bugzilla, and we'll see if we can get
rid of the FP.  (Remember, however, that sometimes ham-hits on
low-scoring rules are intentional -- an FP is one that flags a
non-spam as a spam.)

If your ham hit is in a SARE rule rather than an SA rule (more likely,
IMO), then post the specifics either here or on the SARE forum, and
we'll see if it's worth avoiding.

Bob Menschel
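
To make the URI/URL terminology above concrete, here is a minimal
sketch that splits the three RFC 2396 examples quoted in this message
into scheme, host and path. It uses Python's standard urllib.parse
purely for illustration; the helper name parts is an assumption, and
nothing here reflects how SpamAssassin itself parses URIs.

```python
from urllib.parse import urlsplit

# Example URIs quoted in the thread from RFC 2396, section 1.3.
EXAMPLES = [
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "mailto:mduerst@ifi.unizh.ch",
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
]


def parts(uri):
    """Split a URI into (scheme, host, path) using the standard library."""
    split = urlsplit(uri)
    return split.scheme, split.netloc, split.path


if __name__ == "__main__":
    for uri in EXAMPLES:
        scheme, host, path = parts(uri)
        print(f"scheme={scheme!r} host={host!r} path={path!r}")
```

A uri rule is matched against the whole string, not against one of
these parts, which is why a pattern meant for the host can also hit
text in the path unless the regex itself rules that out.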

RE: Uri rules

Posted by Bret Miller <br...@wcg.org>.

> Has the behaviour of the uri rule been changed at some point 
> to match the
> whole of the URL? I have just noticed I am getting some FP 
> when one of my
> uri rules matches against the URL rather than URI.
> To prevent FP would be very difficult, I think to match the 
> whole of the URL
> with uri rules is not such a good thing, if you wanted to 
> match something in
> a URL it would be quite easy to do so in a body rule but to match just
> against URI isn't so easy.

This would probably be best handled by filing a bug report at
http://bugzilla.spamassassin.org/. You should be specific about your
rule and what was hit that shouldn't be, probably attaching a sample of
the message if possible.

Bret