Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/01/10 20:52:49 UTC

[Bug 5292] New: URIDNSBL erroneously matches substrings of words with accents

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292

           Summary: URIDNSBL erroneously matches substrings of words with
                    accents
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Plugins
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: mlathoud@b2b2c.ca


Using URIDNSBL, the string Cinéma.ca matches the domain ma.ca.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From mlathoud@b2b2c.ca  2007-01-12 16:05 -------
(In reply to comment #5)
> What this comes down to is:
> 
> - do we limit ourselves to www.* strings?
> - do we limit ourselves to old-school TLDs? (.com, .net, etc.)
> - do we limit ourselves to obvious URLs only (https?://)
> - do we not bother and have people complain that we don't catch this type of
> thing?

A balance of each? Catching http://anything or www.<old-school TLD>, and
checking that the match is not the continuation of a URL that ends the previous
line (in case of an improper line break), would be nice.
I hate spam as much as you guys, but false positives are also very annoying.




[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From felicity@apache.org  2007-01-10 12:13 -------
fwiw, this doesn't really have anything to do with uridnsbl.  it's all about the
uri text parser looking for raw domains.  uridnsbl would take that information
and query for it, but it's not involved in actual parsing.
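
To illustrate the parsing problem described here, the following is a minimal sketch and not SpamAssassin's actual parser (which is written in Perl). The patterns NAIVE and GUARDED, the short TLD list, and the helper find_domains are all hypothetical names chosen for this example. It shows how an ASCII-only raw-domain pattern can start matching right after an accented letter, and how refusing to start in the middle of a word avoids that.

```python
import re

# Hypothetical ASCII-only pattern for a bare "name.tld" domain. This is an
# illustration of the failure mode, not the regex SpamAssassin uses.
NAIVE = re.compile(
    r"[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|ca|in)\b",
    re.IGNORECASE,
)

# Same pattern, but a match may not begin directly after a word character,
# a dot, or a hyphen. For str patterns in Python 3, \w is Unicode-aware,
# so an accented letter such as "é" counts as a word character.
GUARDED = re.compile(r"(?<![\w.-])" + NAIVE.pattern, re.IGNORECASE)


def find_domains(pattern, text):
    """Return every match of the pattern in the text, lowercased."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


if __name__ == "__main__":
    for sample in ("Cinéma.ca", "QuébecRencontres.com", "type gotofoo.com in your browser"):
        print(sample, find_domains(NAIVE, sample), find_domains(GUARDED, sample))
```

With the guard in place, the accented examples from this bug no longer produce a candidate, while a standalone domain such as gotofoo.com is still found. Whether an internationalized name like Cinéma.ca should itself be treated as a domain is a separate question that this sketch does not address.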




[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From jm@jmason.org  2007-01-10 12:05 -------
does 'Cinéma.ca' work as a link in any mail user-agent?




[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From felicity@apache.org  2007-01-12 14:59 -------
(In reply to comment #4)
> a weird one:
> "He told me.well.in fact, he didn't say much" matches well.in (why is the html
> code containing me...well... but the dots vanished from the txt version? a bug
> in hotmail composer I suppose).
> For the latest case what about validating www. when there's no http:// ? I don't
> think MUAs make links of non http nor www. uris.

The issue is spam that says "type gotofoo.com in your browser" ... we're trying
to catch that.

What this comes down to is:

- do we limit ourselves to www.* strings?
- do we limit ourselves to old-school TLDs? (.com, .net, etc.)
- do we limit ourselves to obvious URLs only (https?://)
- do we not bother and have people complain that we don't catch this type of thing?
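
The first three options above can be compared side by side with a rough sketch like the one below. It is only an illustration under stated assumptions: the pattern names, the POLICIES table, the matches helper, and in particular the list of "old-school" TLDs are hypothetical and are not taken from SpamAssassin.

```python
import re

# Assumed set of "old-school" TLDs; the real list is a policy decision.
OLD_TLDS = r"(?:com|net|org|edu|gov|mil|int)"
# One DNS label: starts and ends with a letter or digit, hyphens allowed inside.
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
# A host name with at least two labels.
HOST = rf"{LABEL}(?:\.{LABEL})+"
# Do not start a match in the middle of a word, host name or mail address.
START = r"(?<![\w.@-])"

POLICIES = {
    # Option 1: only strings beginning with "www."
    "www_only": re.compile(rf"{START}www\.{HOST}", re.IGNORECASE),
    # Option 2: bare domains, but only under old-school TLDs
    "old_tlds": re.compile(rf"{START}{LABEL}(?:\.{LABEL})*\.{OLD_TLDS}\b", re.IGNORECASE),
    # Option 3: only obvious URLs with an explicit scheme
    "scheme_only": re.compile(rf"{START}https?://{HOST}", re.IGNORECASE),
}


def matches(policy, text):
    """Return all substrings of the text accepted by the named policy."""
    return [m.group(0) for m in POLICIES[policy].finditer(text)]


if __name__ == "__main__":
    samples = [
        "type gotofoo.com in your browser",
        "He told me.well.in fact, he didn't say much",
        "go to www.gotofoo.com today",
        "see http://example.org/x",
    ]
    for text in samples:
        print(text, {name: matches(name, text) for name in POLICIES})
```

In this sketch only the "old_tlds" policy still catches the "type gotofoo.com in your browser" case, and none of the three policies matches the "me.well.in" text, which is the trade-off under discussion.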




[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From mlathoud@b2b2c.ca  2007-01-12 09:04 -------
A few other FPs:
"QuébecRencontres.com" matches becrencontres.com.
A weird one:
"He told me.well.in fact, he didn't say much" matches well.in (why does the HTML
part contain "me...well..." while the extra dots vanished from the text
version? A bug in the Hotmail composer, I suppose).
For the last case, what about requiring www. when there's no http://? I don't
think MUAs make links out of URIs that have neither http:// nor www.




[Bug 5292] URIDNSBL erroneously matches substrings of words with accents

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5292





------- Additional Comments From mlathoud@b2b2c.ca  2007-01-10 12:25 -------
I'll let you guys specify the appropriate component to report the bug on
(spamassassin?). As for Cinéma.ca being a link in MUAs: not in Thunderbird.
Outlook makes a link if it is prefixed by http[s]. While I'm at it (could be
another bug or feature request), URLs of the form http://www.startof
adomain.com
shouldn't match adomain.com as they do now. It's causing quite a few FPs with
multi.uribl.com.
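
One possible way to handle the wrapped-URL case, sketched under assumptions and not taken from SpamAssassin's code: when a bare-domain candidate sits at the very start of a line, look at the previous line and skip the candidate if that line ends in an unterminated URL. The names BARE, DANGLING_URL and bare_domains are hypothetical.

```python
import re

# Hypothetical bare-domain pattern: at least two dot-separated labels, not
# starting inside a word, host name, mail address or URL path.
BARE = re.compile(r"(?<![\w.@/-])[a-z0-9-]+(?:\.[a-z0-9-]+)+", re.IGNORECASE)

# A line "ends in a URL" if an http(s):// or www. prefix is followed only by
# non-space characters up to the end of the line.
DANGLING_URL = re.compile(r"(?:https?://|www\.)\S*$", re.IGNORECASE)


def bare_domains(text):
    """Collect bare domains, skipping ones that look like the wrapped tail
    of a URL that ended the previous line."""
    found = []
    prev = ""
    for line in text.splitlines():
        for m in BARE.finditer(line):
            continues_url = m.start() == 0 and DANGLING_URL.search(prev.rstrip())
            if not continues_url:
                found.append(m.group(0))
        prev = line
    return found


if __name__ == "__main__":
    print(bare_domains("http://www.startof\nadomain.com"))
    print(bare_domains("hello\ngotofoo.com"))
```

The cost of this heuristic is a possible false negative: a genuine bare domain that happens to start the line right after a line ending in a complete URL would also be skipped.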


