Posted to dev@spamassassin.apache.org by Gary Funck <ga...@intrepid.com> on 2004/01/29 08:14:08 UTC

SA lint: some things to check in current rule set?

I added a -lint option to expand_regex.pl (the tool that expands regexes).
It checks the following sorts of things:

        .       warn about unescaped '.'s
        ?       warn about '?'s following alphanumeric characters
        (       warn about '('s not written in preferred '(?:' style
        *       warn about '*'s used instead of bounded .{0,n} notation
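
As a rough illustration of what checks like these could look like, here is
a minimal Python sketch. It is not the code in expand_regex.pl (which is
Perl); the lint() function and the patterns in CHECKS are assumptions made
up for this example, including the choice to skip a '.' that is directly
followed by a quantifier.

```python
import re

# Illustrative checks only: these names and patterns are assumptions,
# not the ones used by expand_regex.pl.
CHECKS = [
    # '.' not preceded by a backslash and not followed by a quantifier
    (re.compile(r"(?<!\\)\.(?![*+?{])"), "unescaped '.'"),
    # '?' directly after a letter or digit that is not itself escaped
    (re.compile(r"(?<!\\)[A-Za-z0-9]\?"), "'?' after an alphanumeric"),
    # '(' that does not open a '(?...)' construct such as '(?:'
    (re.compile(r"(?<!\\)\((?!\?)"), "'(' not in '(?:' style"),
    # any unescaped '*'
    (re.compile(r"(?<!\\)\*"), "'*' instead of bounded {0,n}"),
]


def lint(pattern):
    """Return sorted (offset, message) warnings for one rule pattern."""
    warnings = []
    for regex, message in CHECKS:
        for match in regex.finditer(pattern):
            warnings.append((match.start(), message))
    return sorted(warnings)


if __name__ == "__main__":
    for rule in [r"192.68.2.1", r"192\.68\.2\.1", r"(foo|bar).*"]:
        print(rule, lint(rule))
```

Each check is a single regex applied to the raw pattern text, so it is
just as simple-minded as described here: it has no idea whether a
character sits inside a bracketed set or follows an escaped backslash.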

-----

There are quite a few false positives, because the simple-minded regexes
used to scan the rule patterns can't see things like a '*' sitting in the
middle of a bracketed character set (e.g., [+-*/] looks like a bare '*').
Also, the use of '*' is arguably OK in URIs and headers, because those
strings are typically bounded.
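
One possible way to cut down on the bracketed-set false positives is to
blank out each set before scanning. The sketch below is in Python and is
only an assumption about how that could be done; mask_char_classes() and
has_bare_star() are hypothetical helpers, not part of expand_regex.pl.

```python
import re

# Matches a bracketed set such as [+-*/] or [^\]x].
# Still simple-minded: it does not handle POSIX classes like [[:alpha:]]
# or a literal ']' placed first in the set.
CHAR_CLASS = re.compile(r"(?<!\\)\[(?:\\.|[^\]\\])*\]")


def mask_char_classes(pattern):
    """Replace each bracketed set with same-length filler so offsets are kept."""
    return CHAR_CLASS.sub(lambda m: "_" * len(m.group(0)), pattern)


def has_bare_star(pattern):
    """True if an unescaped '*' appears outside any bracketed set."""
    return re.search(r"(?<!\\)\*", mask_char_classes(pattern)) is not None


if __name__ == "__main__":
    for rule in ["[+-*/]", "a.*b", r"a\*b", "x[+-*/]y*"]:
        print(rule, has_bare_star(rule))
```

Keeping the filler the same length as the set it replaces means any
offsets reported by later checks still line up with the original rule.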

See attached.

RE: SA lint: some things to check in current rule set?

Posted by Gary Funck <ga...@intrepid.com>.


> From: Daniel Quinlan [mailto:quinlan@pathname.com]
> Sent: Thursday, January 29, 2004 11:01 AM
> 
> Awesome.  Would it be possible to get past all of the false positives?
> We already have too many tests that output oodles of false positives...

Hmmm, I think we might be talking about two different meanings of
"false positive". What I was trying to say is that the lint script will
point to places in the rules that might have problems, but sometimes
either the regex is written as intended, or the script isn't smart enough
to disambiguate certain regex usages.

For example, this might be an intended use of a "bare" '.':
   /V.i.a.g.r.a/i
but this would likely not be:
   /192.68.2.1/
and is better written as /192\.68\.2\.1/
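
To make the difference concrete, here is a small Python sketch (Python's
re module treats '.' and '\.' the same way as Perl for these patterns).
The sample strings and variable names are made up for illustration.

```python
import re

loose = re.compile(r"192.68.2.1")            # '.' matches any character
strict = re.compile(r"192\.68\.2\.1")        # '\.' matches only a literal dot
obfuscated = re.compile(r"V.i.a.g.r.a", re.I)  # bare '.' used on purpose

if __name__ == "__main__":
    for text in ["192.68.2.1", "192x68y2z1", "1920680201"]:
        print(text, bool(loose.search(text)), bool(strict.search(text)))
    print(bool(obfuscated.search("buy v*i*a*g*r*a now")))
```

The unescaped address pattern also matches strings like 1920680201, which
is almost certainly not what the rule author wanted, while the bare dots
in the Viagra pattern are doing real work by matching the filler
characters between letters.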




Re: SA lint: some things to check in current rule set?

Posted by Daniel Quinlan <qu...@pathname.com>.
"Gary Funck" <ga...@intrepid.com> writes:

> I added a -lint option to expand_regex.pl (the tool that expands
> regex's).  It checks the following sorts of things:
> 
>         .       warn about unescaped '.'s
>         ?       warn about '?'s following alphanumeric characters
>         (       warn about '('s not written in preferred '(:?' style
>         *       warn about '*'s used instead of bounded .{0,n} notation

Awesome.  Would it be possible to get past all of the false positives?
We already have too many tests that output oodles of false positives...

Daniel