Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/10/09 19:14:08 UTC
[Bug 6219] New: URI detection issues since the great URI detection reform of Jan 2008
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6219
Summary: URI detection issues since the great URI detection reform of Jan 2008
Product: Spamassassin
Version: 3.3.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P5
Component: Libraries
AssignedTo: dev@spamassassin.apache.org
ReportedBy: julian@mehnle.net
Keywords: URI URL detection parse parsing
The great URL detection reform by Sidney Markowitz in January 2008 (r616097)
vastly improved things, but it also broke detection of a few URL forms that
were previously recognized, and it now detects other forms that aren't legal
URLs.
For example, scheme-less URLs enclosed in parentheses are no longer detected:
(example.com)
(example.com/foo)
URLs enclosed in square brackets, angle brackets, or curly braces are still
detected correctly.
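A minimal sketch, in Python rather than SpamAssassin's Perl, of how a scheme-less matcher could keep working inside parentheses. The pattern and the name find_schemeless_urls are hypothetical illustrations, not the regexps actually used in PerMsgStatus.pm; the idea is only that "(" may precede the host and that parentheses end the path, just as brackets and braces do.

```python
import re

# Hypothetical pattern (not SpamAssassin's actual regexp): a bare host name,
# optionally followed by a path. The lookbehind allows "(" (or any other
# non-word character) right before the host, and the path stops at
# parentheses, brackets and braces so an enclosing ")" is not swallowed.
SCHEMELESS_URL = re.compile(
    r"(?<![\w.-])"                   # host must not continue a longer token
    r"((?:[a-z0-9-]+\.)+[a-z]{2,}"   # host: dot-separated labels + alphabetic TLD
    r"(?:/[^\s()<>\[\]{}]*)?)",      # optional path, ending at any bracket
    re.IGNORECASE,
)


def find_schemeless_urls(text):
    """Return every scheme-less URL candidate found in text."""
    return SCHEMELESS_URL.findall(text)


for sample in ["(example.com)", "(example.com/foo)", "[example.com]", "{example.com}"]:
    print(sample, "->", find_schemeless_urls(sample))
```

Excluding the bracket characters from the path is a simplification: it also cuts off legitimate URLs whose paths contain parentheses, which a real implementation would need to handle separately.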
Next, URLs whose "http", "https", or "ftp" scheme is not written in all
lowercase, such as the following, are not detected:
Http://www.example.com
HTTP://www.example.com
ftP://ftp.example.com
(FWIW, this had already been fixed once, before the great reform, under bug
4111, but apparently no test case was added for it.)
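URI schemes are case-insensitive (RFC 3986, section 3.1), so a detector can match them that way and lowercase them afterwards. The following Python sketch uses a hypothetical, simplified pattern; KNOWN_SCHEME_URL, find_known_scheme_urls and normalize_scheme are illustrative names, not SpamAssassin code.

```python
import re

# Hypothetical known-scheme pattern. re.IGNORECASE lets the scheme match in any
# case, so "Http:", "HTTP:" and "ftP:" are recognized alongside "http:".
KNOWN_SCHEME_URL = re.compile(r"\b(?:https?|ftp)://[^\s<>()]+", re.IGNORECASE)


def find_known_scheme_urls(text):
    """Return every URL with an explicit http, https or ftp scheme."""
    return KNOWN_SCHEME_URL.findall(text)


def normalize_scheme(url):
    """Lowercase only the scheme part; the rest of the URL is left untouched."""
    scheme, sep, rest = url.partition("://")
    return scheme.lower() + sep + rest


for sample in ["Http://www.example.com", "HTTP://www.example.com", "ftP://ftp.example.com"]:
    for url in find_known_scheme_urls(sample):
        print(url, "->", normalize_scheme(url))
```

A regression test built from the three sample lines above would catch this if it broke again.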
Then, URLs in the "known-scheme" category (cf. the regexps used in
PerMsgStatus.pm), i.e., ones starting with, e.g., "http:" or "www.", are
detected even if their domain name(!) (not the URL path or query string)
contains characters such as "(" or ")", which are not allowed in host names:
www.example(.com) --> www.example(.com
http://example(.com) --> http://example(.com
Obviously those aren't linkified by most MUAs, and even if they were, they
wouldn't lead the user anywhere.
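One way to reject such candidates, sketched in Python under the assumption that a host name consists only of letters, digits, hyphens and dots: the host must be followed by a port, path, query, fragment, whitespace or end of text, so a "(" inside the domain name makes the whole match fail instead of producing a truncated URL. HOST_CHECKED_URL and find_host_checked_urls are hypothetical names.

```python
import re

# Hypothetical pattern: an explicit scheme or a bare "www." prefix, then a host
# made only of letters, digits, hyphens and dots (at least one dot). The
# lookahead requires the host to end at a port, path, query, fragment,
# whitespace or end of text, so "(" or ")" inside the domain name rejects the
# candidate instead of yielding something like "www.example(.com".
HOST_CHECKED_URL = re.compile(
    r"(?:(?:https?|ftp)://|(?=www\.))"  # explicit scheme, or assumed "www."
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)+"      # dot-separated host labels
    r"(?=[:/?#\s]|$)"                   # host must end cleanly
    r"[^\s<>]*",                        # remainder, up to whitespace or < >
    re.IGNORECASE,
)


def find_host_checked_urls(text):
    """Return URL candidates whose host part contains only legal characters."""
    return HOST_CHECKED_URL.findall(text)


for sample in ["www.example(.com)", "http://example(.com)", "www.example.com"]:
    print(sample, "->", find_host_checked_urls(sample))
```

The restriction applies only to the host; characters such as "(" remain allowed in the path and query string, which matches the distinction drawn above.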
Finally, bare e-mail addresses starting with "www."(!) are misdetected as
"http:" URLs and *not* "mailto:" ones:
www.x@example.com --> http://www.x@example.com --> http://example.com
I think the $uriknownscheme regexp should be split into
$uri(really)knownscheme and $uriassumedscheme, and the latter should be
deprioritized below (i.e., tried after) $urimailscheme in the definition of
$tbirdurire.
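A rough Python sketch of the proposed split, relying on ordered alternation (at a given position the first alternative that matches wins, in both Perl and Python regexps). The three sub-patterns are simplified, hypothetical stand-ins for $uri(really)knownscheme, $urimailscheme and $uriassumedscheme, and detect_uris is an illustrative name. With the mail alternative placed before the assumed-scheme one, "www.x@example.com" becomes a mailto: URI.

```python
import re

# Simplified, hypothetical stand-ins for the regexps named in the report; the
# real ones in PerMsgStatus.pm are considerably more elaborate.
URI_REALLY_KNOWN_SCHEME = r"(?:https?|ftp)://[^\s<>()]+"    # explicit scheme
URI_MAIL_SCHEME = r"[\w.+-]+@[a-z0-9-]+(?:\.[a-z0-9-]+)+"  # bare e-mail address
URI_ASSUMED_SCHEME = r"www\.[^\s<>()]+"                     # "www." with no scheme

# Alternation is ordered: at each position the first alternative that matches
# wins, so the mail pattern is tried before the assumed-scheme pattern.
TBIRD_URI_RE = re.compile(
    rf"(?P<known>{URI_REALLY_KNOWN_SCHEME})"
    rf"|(?P<mail>{URI_MAIL_SCHEME})"
    rf"|(?P<assumed>{URI_ASSUMED_SCHEME})",
    re.IGNORECASE,
)


def detect_uris(text):
    """Return detected URIs, adding the scheme implied by the matching alternative."""
    uris = []
    for m in TBIRD_URI_RE.finditer(text):
        if m.lastgroup == "mail":
            uris.append("mailto:" + m.group())
        elif m.lastgroup == "assumed":
            uris.append("http://" + m.group())
        else:
            uris.append(m.group())
    return uris


print(detect_uris("www.x@example.com"))  # mail alternative wins
print(detect_uris("www.example.com"))    # falls through to the assumed scheme
```

Swapping the last two alternatives reproduces the behaviour described in the report: the assumed-scheme pattern swallows the whole address, and it is treated as an http: URL.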