Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/12/21 18:03:06 UTC

[Bug 4046] New: Parsing of undecoded UTF-8 will give garbage when decoding entities at [...]/Mail/SpamAssassin/HTML.pm line 182.

http://bugzilla.spamassassin.org/show_bug.cgi?id=4046

           Summary: Parsing of undecoded UTF-8 will give garbage when
                    decoding entities at [...]/Mail/SpamAssassin/HTML.pm
                    line 182.
           Product: Spamassassin
           Version: unspecified
          Platform: All
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Learner
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: sebastian@jaenicke.org


The warning above is sometimes issued when sa-learn uses the HTML::Parser
module to parse HTML messages. I have seen it on all 3.0.x versions.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

------- Additional Comments From bas@debian.org  2004-12-27 06:18 -------
Oops, I was going to attach the same patch Sebastian already had attached ;)



------- Additional Comments From sebastian@jaenicke.org  2004-12-21 09:04 -------
Created an attachment (id=2582)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2582&action=view)
proposed patch




------- Additional Comments From bas@debian.org  2004-12-27 06:14 -------
Yes, I get lots of these warnings, too, in my nightly mass checks.

It looks like this can be solved by enabling utf8_mode for HTML::Parser in
parse() in Mail/SpamAssassin/HTML.pm. Unfortunately, this option is only
available with Perl 5.8.

I'll attach a patch.
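
For illustration only, here is a minimal sketch of what enabling utf8_mode
on an HTML::Parser object might look like. It is not the attached patch and
not the actual code in Mail/SpamAssassin/HTML.pm; the handler, the $text
variable, and the sample input are assumptions made up for this example.
The version and can() guard reflects the Perl 5.8 restriction mentioned
above.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical accumulator for decoded text; not a SpamAssassin variable.
my $text = '';

my $p = HTML::Parser->new(
    api_version => 3,
    # 'dtext' asks the parser to hand over text with entities decoded,
    # which is the step that triggers the warning on undecoded UTF-8.
    text_h => [ sub { $text .= shift }, 'dtext' ],
);

# Only switch utf8_mode on where it is supported: it needs Perl 5.8 and
# an HTML::Parser release that provides the method.
$p->utf8_mode(1) if $] >= 5.008 && $p->can('utf8_mode');

# Sample input (an assumption): raw UTF-8 bytes next to an HTML entity.
$p->parse("<p>caf\xc3\xa9 &eacute;</p>");
$p->eof;
```

With the guard in place, older Perl installations simply skip the call and
keep the previous behaviour, warning included.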


