Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2004/12/21 18:03:06 UTC
[Bug 4046] New: Parsing of undecoded UTF-8 will give garbage when decoding entities at [...]/Mail/SpamAssassin/HTML.pm line 182.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
Summary: Parsing of undecoded UTF-8 will give garbage when
decoding entities at [...]/Mail/SpamAssassin/HTML.pm
line 182.
Product: Spamassassin
Version: unspecified
Platform: All
OS/Version: other
Status: NEW
Severity: normal
Priority: P5
Component: Learner
AssignedTo: dev@spamassassin.apache.org
ReportedBy: sebastian@jaenicke.org
The warning above is sometimes issued when sa-learn uses the HTML::Parser
module to parse HTML messages. I have seen it with every 3.0.x version.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4046] Parsing of undecoded UTF-8 will give garbage when decoding entities at [...]/Mail/SpamAssassin/HTML.pm line 182.
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
------- Additional Comments From bas@debian.org 2004-12-27 06:18 -------
Oops, I was going to attach the same patch Sebastian had already attached ;)
[Bug 4046] Parsing of undecoded UTF-8 will give garbage when decoding entities at [...]/Mail/SpamAssassin/HTML.pm line 182.
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
------- Additional Comments From sebastian@jaenicke.org 2004-12-21 09:04 -------
Created an attachment (id=2582)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=2582&action=view)
proposed patch
[Bug 4046] Parsing of undecoded UTF-8 will give garbage when decoding entities at [...]/Mail/SpamAssassin/HTML.pm line 182.
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
------- Additional Comments From bas@debian.org 2004-12-27 06:14 -------
Yes, I get lots of these warnings, too, in my nightly mass checks.
It looks like this can be solved by enabling utf8_mode for HTML::Parser in
parse() in Mail/SpamAssassin/HTML.pm. Unfortunately, utf8_mode is only available with Perl 5.8 or later.
I'll attach a patch.
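
For illustration only, here is a minimal sketch of what enabling utf8_mode on an HTML::Parser object can look like. It is not the attached patch and not the actual code in Mail/SpamAssassin/HTML.pm. The helper name extract_text and the handler setup are assumptions made for this example; only HTML::Parser's new(), utf8_mode(), parse() and eof() calls are taken from the module itself.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not SpamAssassin code): collect the decoded text
# of an HTML document that is passed in as raw, undecoded UTF-8 octets.
sub extract_text {
    my ($raw_html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' asks for text with entities already decoded
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # utf8_mode needs Perl 5.8 or later and a recent enough HTML::Parser,
    # so only enable it when both conditions hold.
    if ($] >= 5.008 && $parser->can('utf8_mode')) {
        $parser->utf8_mode(1);
    }

    $parser->parse($raw_html);
    $parser->eof;
    return $text;
}

# "\xE2\x99\xA5" is the UTF-8 encoding of U+2665; &hearts; names the same
# character. Per the HTML::Parser documentation, with utf8_mode enabled
# the entity is expanded to the same UTF-8 octets as the surrounding text.
my $decoded = extract_text("\xE2\x99\xA5&hearts;");
```

Without utf8_mode, the entity would be expanded to a Perl character string and mixed with the raw octets, which is the situation the warning in this bug describes.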