Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/02/03 05:20:13 UTC

[Bug 5320] New: Unobfuscate URI hosts in Util::uri_list_canonify()

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5320

           Summary: Unobfuscate URI hosts in Util::uri_list_canonify()
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: felicity@apache.org


I'm really tired of this stuff, so we should do a better job of cleaning up
things like the host part of a URI so that more domains can be checked.  Things
we don't currently deal with:

exam*ple.com
example*com
example.*com

and multiple versions thereof.  I was originally going to put this near the end:

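      # tr///cd deletes everything outside the allowed set and returns the
      # number of characters deleted, so only push if something changed.
      if ($host =~ tr/0-9A-Za-z._-//cd) {
        push(@nuris, join ('', $proto, $host, $rest));
      }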

which handles some of this stuff, but RFC 3986 section 3.2.2, which covers the
host component of a URI, shows it's not that easy because of things like IPv6
literals.  We'd also need some heuristic for cases where the obfuscating
characters replace the "." itself (example*com), rather than just being
inserted next to it, since plain character removal would lose the dot.
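
To make the two problems above concrete, here is a rough sketch of what such a
heuristic might look like.  It is not what is in Util.pm: the function name
unobfuscate_host_candidates() is made up for illustration, and it assumes $host
holds only the host component (no userinfo or port).  It skips bracketed
IP-literals and tries both "delete the junk" and "junk stands in for a dot":

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::SpamAssassin::Util.
# Returns candidate cleaned-up hosts, or an empty list if the host
# should be left alone.
sub unobfuscate_host_candidates {
  my ($host) = @_;

  # RFC 3986 IP-literals such as "[2001:db8::1]" legitimately contain
  # ':' and brackets, so leave them untouched.
  return () if $host =~ /^\[.*\]$/;

  # Candidate 1: delete every non-allowed character
  # (exam*ple.com -> example.com).
  (my $deleted = $host) =~ tr/0-9A-Za-z._-//cd;

  # Candidate 2: treat each run of non-allowed characters as a
  # stand-in for "." (example*com -> example.com), then collapse the
  # doubled dots this creates for inputs like example.*com.
  (my $dotted = $host) =~ s/[^0-9A-Za-z._-]+/./g;
  $dotted =~ s/\.{2,}/./g;

  my (%seen, @candidates);
  foreach my $cand ($deleted, $dotted) {
    next if $cand eq $host;         # nothing was obfuscated
    next if $cand !~ /[^.]\.[^.]/;  # require a dot between two labels
    next if $seen{$cand}++;         # skip duplicates
    push @candidates, $cand;
  }
  return @candidates;
}

print join(' ', unobfuscate_host_candidates($_)), "\n"
  for ('exam*ple.com', 'example*com', 'example.*com');
```

With the three examples above this yields "example.com exam.ple.com",
"example.com" and "example.com" respectively.  Note the first case produces a
bogus second candidate, which is the cost of guessing; whether extra lookups on
such hosts are acceptable is an open question.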



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

------- Additional Comments From maddoc@maddoc.net  2007-02-03 04:19 -------
I'm all for this, but be aware that there are now some using:

domain%.com

You know this will mutate to every possible non-alphanumeric character.

Still, I think it would be worthwhile.






------- Additional Comments From felicity@apache.org  2007-02-03 07:38 -------
(In reply to comment #1)
> You know this would mutate to every possible non-alphanumeric.

Right, I was using '*' as an example meaning "some set of one or more
non-allowed characters".  :)



------- Additional Comments From felicity@apache.org  2007-02-04 18:22 -------
Created an attachment (id=3854)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3854&action=view)
futzing around



