Posted to dev@spamassassin.apache.org by bu...@spamassassin.apache.org on 2024/01/03 11:00:48 UTC

[Bug 8206] New: uri_list_canonicalize adds more domains then it should

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206

            Bug ID: 8206
           Summary: uri_list_canonicalize adds more domains then it should
           Product: Spamassassin
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: giovanni@paclan.it
  Target Milestone: Undefined

Created attachment 5930
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5930&action=edit
Sample email

In the attached sample, the tag <img src="undefined/favicon.ico"> is wrongly
translated into an http://undefined.com URI.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 8206] uri_list_canonicalize adds more domains then it should

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206

Giovanni Bechis <gi...@paclan.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |giovanni@paclan.it

--- Comment #1 from Giovanni Bechis <gi...@paclan.it> ---
Created attachment 5931
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5931&action=edit
fix for the issue


[Bug 8206] uri_list_canonicalize adds more domains then it should

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8206

Kris Deugau <kd...@vianet.ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kdeugau@vianet.ca

--- Comment #2 from Kris Deugau <kd...@vianet.ca> ---
I don't have any specific examples right at hand, but I posted on the users
list about essentially this issue in April last year, with another specific
case.  See https://lists.apache.org/thread/gf3kyq2y3j1v1lj37g5tpngmk82wgmcz.  I
don't recall if any patches were committed as a result of that thread.

Looking at your patch, I think this is too narrow (if only because it omits
.png, .webp, and who knows what other image types some sender might use) and
far too late in the process to fix the root cause.  There is a long list of
other HTML elements that get filed in the "URI" bin and can trigger this
problem.  I think to properly solve it, potential URIs from HTML elements need
to be more tightly preprocessed (and discarded) ahead of the rest of the
canonicalization process.
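
To make the preprocessing idea concrete, here is a minimal sketch in Python
(not SpamAssassin's Perl, and not a proposed patch).  The function names are
hypothetical.  It assumes the message has no <base> element, so a reference
without an explicit host cannot be resolved to any real domain and is dropped
before any host guessing can happen.

```python
from urllib.parse import urlsplit


def has_explicit_host(candidate: str) -> bool:
    """Return True if an HTML attribute value names a host itself.

    Hypothetical helper: absolute URIs (http://host/...) and
    protocol-relative ones (//host/...) have a network location;
    relative references such as "assets/css/styles.css" do not.
    """
    try:
        parts = urlsplit(candidate.strip())
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal): treat as unusable.
        return False
    return bool(parts.netloc)


def filter_html_uris(candidates):
    """Keep only candidates that carry their own host, preserving order."""
    return [c for c in candidates if has_explicit_host(c)]


if __name__ == "__main__":
    seen_in_html = [
        "undefined/favicon.ico",
        "none",
        "assets/css/styles.css",
        "http://example.com/logo.png",
        "//cdn.example.com/site.css",
    ]
    for uri in filter_html_uris(seen_in_html):
        print(uri)
```

With a filter of this kind in front of canonicalization, values like "none"
or "assets/css/styles.css" would never reach the step that turns them into
www.none.com or www.assets.com, regardless of file extension or element type.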

I have docucomments in my local configuration with this:

dbg: uri: canonicalizing html uri: none
dbg: uri: cleaned uri: http://www.none.com
dbg: uri: added host: www.none.com domain: none.com
dbg: uri: cleaned uri: none
dbg: uri: cleaned uri: http://none

(likely from that particular case I posted about)

and:

dbg: uri: canonicalizing html uri: assets/css/styles.css
dbg: uri: cleaned uri: http://www.assets.com/css/styles.css

along with matching uridnsbl_skip_domain entries for

none.com
assets.com
(I also have "none" listed, but that doesn't seem to work to suppress the
entry)

and

background.com
www.com

I don't have debug detail recorded for the latter two, but both originated
from essentially the same kind of source: HTML/CSS elements (not
content/text!) that specify a relative URI in some context or form.  None of
these were in text that a mail program *would* often turn into a clickable
link.
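
For reference, the skip entries listed above might look like this in a local
configuration file.  This is only a sketch: it assumes the URIDNSBL plugin is
loaded, and it leaves out the bare "none" entry since that one does not seem
to have any effect.

```
uridnsbl_skip_domain none.com assets.com
uridnsbl_skip_domain background.com www.com
```

These entries only suppress the DNS blocklist lookups for those domains; they
do nothing about the bogus URIs being generated in the first place.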
