Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/01/30 06:53:49 UTC

Re: svn commit: rev 6356 - incubator/spamassassin/trunk/rules

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Theo Van Dinter writes:
>On Fri, Jan 30, 2004 at 03:04:54AM -0000, jm@apache.org wrote:
>> +# "www" hidden as "%77%77%77", "ww%77", etc.
>> +rawbody	 T_HTTP_77	/http:\/\/.{0,2}[\%77]/
>> +describe T_HTTP_77	Contains a URL-encoded hostname (HTTP77)
>
>Why rawbody as opposed to uri?

I'm assuming we're going to fix the URI code to decode them
correctly eventually ;)  We're missing hits otherwise.

> Also, [\%77] is character driven which
>seems like the wrong thing for this rule...

Oops, good point. Fixed.
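
To illustrate the point about [\%77]: inside square brackets the pattern is a
character class, so it matches one '%' or one '7' rather than the literal
sequence "%77". The sketch below uses Python's re module, not the Perl regex
engine SpamAssassin runs on, and the "literal" pattern is only an assumption
about what a fix could look like; the committed fix is not shown in this
thread. The example.com URLs are made up for the demonstration.

```python
import re

# Pattern as quoted in the commit: [\%77] is a character class, so it
# matches a single '%' OR a single '7'.
char_class = re.compile(r"http:\/\/.{0,2}[\%77]")

# Hypothetical corrected pattern (an assumption, not the committed fix):
# match the literal three-character sequence "%77".
literal = re.compile(r"http:\/\/.{0,2}\%77")

samples = [
    "http://ww%77.example.com/",      # obfuscated "www"
    "http://%77%77%77.example.com/",  # fully encoded "www"
    "http://777-film.example.com/",   # innocent host starting with '7'
    "http://www.example.com/",        # plain "www"
]

for url in samples:
    hit_class = bool(char_class.search(url))
    hit_literal = bool(literal.search(url))
    print(f"{url:32} class={hit_class} literal={hit_literal}")
```

The character-class version also hits the innocent "777-film" host, which is
the sort of false positive a literal match avoids.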

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAGfFtQTcbUG5Y7woRAo0bAJ0eM802sFRjt0qWW+8RJjsn6ZfgcACfUXJF
71ElpXXvafWCItHgTijctFE=
=O27s
-----END PGP SIGNATURE-----


Re: svn commit: rev 6356 - incubator/spamassassin/trunk/rules

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Jan 29, 2004 at 09:53:49PM -0800, Justin Mason wrote:
> I'm assuming we're going to fix the URI code to decode them
> correctly eventually ;)  We're missing hits otherwise.

We already do. ;)  I put in some code the other day -- it keeps the
original URI in the list, then it properly escapes the URI and removes
entities for standard ASCII printable chars (33-126).  If the new URI
is different from the old one, the new one is added to the list as well.

e.g.: 'http://ww%77.kluge.net/' is in the URI list.  The code now goes
through and rewrites it properly as 'http://www.kluge.net/'.  The two
are now different, so the "proper" one gets added as well.  That way we
can match "raw" or "decoded". ;)
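
A rough sketch of that behaviour, in Python rather than the Perl that
SpamAssassin is written in. This is not the actual SpamAssassin code: the
function names decode_printable and expand_uri_list are made up for
illustration, and only the %XX decoding for printable ASCII (33-126) is
modelled.

```python
import re

# Matches one URL-encoded byte such as "%77".
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_printable(uri):
    """Decode %XX escapes, but only for printable ASCII codes 33-126."""
    def repl(match):
        code = int(match.group(1), 16)
        if 33 <= code <= 126:
            return chr(code)
        # Leave anything outside the printable range encoded.
        return match.group(0)
    return _ESCAPE.sub(repl, uri)


def expand_uri_list(uris):
    """Keep each original URI; add the decoded form when it differs."""
    expanded = []
    for uri in uris:
        expanded.append(uri)
        decoded = decode_printable(uri)
        if decoded != uri:
            expanded.append(decoded)
    return expanded


print(expand_uri_list(["http://ww%77.kluge.net/"]))
# ['http://ww%77.kluge.net/', 'http://www.kluge.net/']
```

Keeping both forms means a uri rule can be written against either the
obfuscated text or the real hostname.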

-- 
Randomly Generated Tagline:
"EE good."                  - Prof. Vaz