Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/01/21 03:45:24 UTC

[Bug 5794] URIDetail uses %2E for ','

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5794


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME




------- Additional Comments From sidney@sidney.com  2008-01-20 18:45 -------
%2E is the URL-encoded form of the character '.'. URL encoding is described in
RFC 2396, which permits encoding characters such as '.' even though they are not
required to be encoded. Thus, a URI can contain the sequence '%2E' where a '.'
would normally appear, as in the examples in the POD.

It is true that if you paste the string "http://www.example%2Ecom" into the
address bar of Firefox, it will not decode it, as URL encoding applies only to
the portion of the URL after the host name. However, SpamAssassin has to be
concerned with the behaviour of MUAs and with what spammers do based on that,
not just with RFCs. If a plain text email message contains the string
http://www.example%2Ecom, then both Thunderbird and Outlook Express will treat
it as a URI and linkify it; if the link is clicked, they will decode it to
http://www.example.com/ and send that to the browser.

Because of that, URIDetail provides the ability to write a rule that has access
to both the encoded and decoded versions of the URI, allowing one to catch
spammers' attempts at obfuscating URIs.
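
To make the encoded/decoded distinction concrete, here is a minimal sketch in
Python (not SpamAssassin's Perl, and not the plugin's actual code). The helper
names uri_variants and has_encoded_dot_in_host are hypothetical and exist only
for illustration; the sketch assumes plain percent-decoding is enough to show
the idea.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: a percent-encoded '.' (%2E or %2e) appearing in the
# host part of the URI, i.e. before the first '/', '?' or '#' after "scheme://".
ENCODED_DOT_IN_HOST = re.compile(r"^[a-z][a-z0-9+.-]*://[^/?#]*%2e", re.IGNORECASE)


def uri_variants(raw_uri):
    """Return the raw URI, plus its percent-decoded form if that differs."""
    decoded = unquote(raw_uri)
    if decoded == raw_uri:
        return [raw_uri]
    return [raw_uri, decoded]


def has_encoded_dot_in_host(raw_uri):
    """Return True if the host part of the raw URI contains an encoded '.'."""
    return bool(ENCODED_DOT_IN_HOST.search(raw_uri))


if __name__ == "__main__":
    uri = "http://www.example%2Ecom"
    print(uri_variants(uri))
    print(has_encoded_dot_in_host(uri))
```

A check that needs to spot the obfuscation looks at the raw form, while a check
on the real destination looks at the decoded form; having both available is
what makes this kind of rule possible.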

I'm closing this bug report because I don't see a bug being reported here. If
you have more questions about implementation, please ask on the SpamAssassin
developers mailing list.



