Posted to users@spamassassin.apache.org by Alex S Moore <as...@edge.net> on 2005/07/25 14:18:23 UTC
Parsing of uncoded UTF-8 message
I have messages that give this error from sa-learn. SA is 3.0.4. Perl
is 5.8.7. HTML-Parser is 3.45
Parsing of undecoded UTF-8 will give garbage when decoding entities at
/opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.
Attached is an example, which is ham. The HTML::Parser man page says
something about passing utf8 to p->parse, or some such, but I do not
understand what this means. Is there a patch to SA to fix this?
If it matters, here are my locale settings. I tried with LC_ALL=C and
that did not help.
[root@mcsrv5 tmp]# locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
Thanks, Alex
Re: Parsing of uncoded UTF-8 message
Posted by Alex S Moore <as...@edge.net>.
Matt Kettler wrote:
> Alex S Moore wrote:
>
>>I have messages that give this error from sa-learn. SA is 3.0.4. Perl
>>is 5.8.7. HTML-Parser is 3.45
>>
>>Parsing of undecoded UTF-8 will give garbage when decoding entities at
>>/opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.
>>
>>Attached is an example, which is ham. The HTML::Parser man page says
>>something about passing utf8 to p->parse, or some such, but I do not
>>understand what this means. Is there a patch to SA to fix this?
>
>
> Yes, there is a patch, but all it does is swallow the message.
>
> The developers tested and proved this message is not harmful, so you can safely
> ignore it. It's a byproduct of the HTML::Parser code warning about a problem
> that doesn't appear to matter with SA the way SA uses it.
>
Thanks for the reply, Matt. I just learned another 108 spam messages and
did not get the 'Parsing of undecoded UTF-8...' message. So, I will
just ignore the message the next time that I see it.
Alex
Re: Parsing of uncoded UTF-8 message
Posted by Matt Kettler <mk...@evi-inc.com>.
Alex S Moore wrote:
> I have messages that give this error from sa-learn. SA is 3.0.4. Perl
> is 5.8.7. HTML-Parser is 3.45
>
> Parsing of undecoded UTF-8 will give garbage when decoding entities at
> /opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.
>
> Attached is an example, which is ham. The HTML::Parser man page says
> something about passing utf8 to p->parse, or some such, but I do not
> understand what this means. Is there a patch to SA to fix this?
Yes, there is a patch, but all it does is swallow the message.
The developers tested and proved this message is not harmful, so you can safely
ignore it. It's a byproduct of the HTML::Parser code warning about a problem
that doesn't appear to matter with SA the way SA uses it.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
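
For anyone wondering what the warning itself is about, here is a minimal sketch of the general mechanism, written in Python rather than Perl. It is not SpamAssassin or HTML::Parser code. The function names and the sample string are made up for illustration, and treating the raw bytes as Latin-1 is an assumption standing in for "undecoded UTF-8". It only shows why decoding entities in text whose UTF-8 bytes were never decoded yields a mix of garbled and correct characters.

```python
import html


def entities_on_undecoded(raw: bytes) -> str:
    # Assumption for illustration: each raw byte is treated as one
    # character (Latin-1), i.e. the UTF-8 sequences are never decoded.
    as_chars = raw.decode("latin-1")
    # Entities expand to real Unicode characters, so the result mixes
    # still-encoded bytes with properly decoded characters.
    return html.unescape(as_chars)


def entities_on_decoded(raw: bytes) -> str:
    # Decode the UTF-8 first, then expand the entities.
    return html.unescape(raw.decode("utf-8"))


# Hypothetical sample: one literal non-ASCII character plus one entity.
sample = "caf\u00e9 &euro;5".encode("utf-8")

garbled = entities_on_undecoded(sample)
clean = entities_on_decoded(sample)

print(repr(garbled))
print(repr(clean))
```

In the first result the literal accented character comes out as two junk characters while the entity comes out correctly, which is the "garbage" the warning refers to. Decoding before parsing avoids the mix. As noted above, this does not appear to matter for the way SA uses the parser.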