Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@kluge.net> on 2004/01/30 04:39:25 UTC

Re: svn commit: rev 6356 - incubator/spamassassin/trunk/rules

On Fri, Jan 30, 2004 at 03:04:54AM -0000, jm@apache.org wrote:
> +# "www" hidden as "%77%77%77", "ww%77", etc.
> +rawbody	 T_HTTP_77	/http:\/\/.{0,2}[\%77]/
> +describe T_HTTP_77	Contains a URL-encoded hostname (HTTP77)

Why rawbody as opposed to uri?  Also, [\%77] is a character class, so it
matches a single "%" or "7" rather than the sequence "%77", which seems
like the wrong thing for this rule...
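
To illustrate the character-class point, here is a minimal sketch using
Python's re module rather than SpamAssassin itself. COMMITTED mirrors the
regex from the commit; LITERAL is a hypothetical alternative that requires
the literal sequence "%77" and is not a proposed replacement rule. The
example.com hostname is made up for the demonstration.

```python
import re

# The regex from the commit: [\%77] is a character class, so it accepts
# a single "%" or a single "7" after up to two arbitrary characters.
COMMITTED = re.compile(r"http://.{0,2}[\%77]")

# Hypothetical alternative (an assumption, not from the commit):
# require the literal three-character sequence "%77".
LITERAL = re.compile(r"http://.{0,2}%77")

def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

obfuscated = "http://ww%77.kluge.net/"
innocent = "http://a7.example.com/"   # plain "7" in the hostname, no encoding

print(hits(COMMITTED, obfuscated), hits(COMMITTED, innocent))
print(hits(LITERAL, obfuscated), hits(LITERAL, innocent))
```

The committed pattern fires on both URLs because a bare "7" satisfies the
class, while the literal form only fires on the encoded one.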

-- 
Randomly Generated Tagline:
"Low probability events do happen, which is why people still play the lottery."
                                         - Elizabeth Zwicky at LISA '99

Re: svn commit: rev 6356 - incubator/spamassassin/trunk/rules

Posted by Theo Van Dinter <fe...@kluge.net>.
On Thu, Jan 29, 2004 at 09:53:49PM -0800, Justin Mason wrote:
> I'm assuming we're going to fix the URI code to decode them
> correctly eventually ;)  We're missing hits otherwise.

We already do. ;)  I put in some code the other day -- it keeps the
original URI in the list, then it properly unescapes the URI and removes
entities for standard ASCII printable chars (33-126).  If the new URI
is different from the old one, the new one is added to the list as well.

e.g.: 'http://ww%77.kluge.net/' is in the URI list.  The code now goes
through and rewrites it properly as 'http://www.kluge.net/'.  The two
are now different, so the "proper" one gets added.  That way we can match
"raw" or "decoded". ;)
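
The behaviour described above could be sketched roughly as follows. This
is an illustration in Python, not the actual SpamAssassin code; the
function names decode_printable and expand_uri_list are hypothetical, and
the sketch only handles %XX escapes (entity handling is left out).

```python
import re

# Matches one %XX escape with two hex digits.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

def decode_printable(uri):
    """Decode %XX escapes only for printable ASCII (33-126).

    Escapes outside that range (e.g. %20, %00) are left untouched.
    """
    def repl(match):
        code = int(match.group(1), 16)
        return chr(code) if 33 <= code <= 126 else match.group(0)
    return _PCT.sub(repl, uri)

def expand_uri_list(uris):
    """Keep every original URI; add the decoded form when it differs."""
    result = []
    for uri in uris:
        result.append(uri)
        decoded = decode_printable(uri)
        if decoded != uri:
            result.append(decoded)
    return result

for uri in expand_uri_list(["http://ww%77.kluge.net/"]):
    print(uri)
```

Keeping both forms in the list means a uri rule can be written against
either the raw or the decoded version.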

-- 
Randomly Generated Tagline:
"EE good."                  - Prof. Vaz