Posted to users@spamassassin.apache.org by Alex S Moore <as...@edge.net> on 2005/07/25 14:18:23 UTC

Parsing of uncoded UTF-8 message

I have messages that give this error from sa-learn.  SA is 3.0.4.  Perl 
is 5.8.7.  HTML-Parser is 3.45.

Parsing of undecoded UTF-8 will give garbage when decoding entities at 
/opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.

Attached is an example, which is ham.  The HTML::Parser man page says 
something about passing utf8 to p->parse, or some such, but I do not 
understand what this means.  Is there a patch to SA to fix this?

If it matters, here are my locale settings.  I tried with LC_ALL=C and 
that did not help.

[root@mcsrv5 tmp]# locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

Thanks, Alex

Re: Parsing of uncoded UTF-8 message

Posted by Alex S Moore <as...@edge.net>.
Matt Kettler wrote:
> Alex S Moore wrote:
> 
>>I have messages that give this error from sa-learn.  SA is 3.0.4.  Perl
>>is 5.8.7.  HTML-Parser is 3.45.
>>
>>Parsing of undecoded UTF-8 will give garbage when decoding entities at
>>/opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.
>>
>>Attached is an example, which is ham.  The HTML::Parser man page says
>>something about passing utf8 to p->parse, or some such, but I do not
>>understand what this means.  Is there a patch to SA to fix this?
> 
> 
> Yes, there is a patch, but all it does is swallow the message.
> 
> The developers tested and proved this message is not harmful, so you can safely
> ignore it. It's a byproduct of the HTML::Parser code warning about a problem
> that doesn't appear to matter with SA the way SA uses it.
> 

Thanks for the reply Matt.  I just learned another 108 spam messages and 
did not get the 'Parsing of undecoded UTF-8...' message.  So, I will 
just ignore the message the next time that I see it.

Alex

Re: Parsing of uncoded UTF-8 message

Posted by Matt Kettler <mk...@evi-inc.com>.
Alex S Moore wrote:
> I have messages that give this error from sa-learn.  SA is 3.0.4.  Perl
> is 5.8.7.  HTML-Parser is 3.45.
> 
> Parsing of undecoded UTF-8 will give garbage when decoding entities at
> /opt/csw/share/perl/csw/Mail/SpamAssassin/HTML.pm line 182.
> 
> Attached is an example, which is ham.  The HTML::Parser man page says
> something about passing utf8 to p->parse, or some such, but I do not
> understand what this means.  Is there a patch to SA to fix this?

Yes, there is a patch, but all it does is swallow the message.

The developers tested and proved this message is not harmful, so you can safely
ignore it. It's a byproduct of the HTML::Parser code warning about a problem
that doesn't appear to matter with SA the way SA uses it.


http://bugzilla.spamassassin.org/show_bug.cgi?id=4046
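
For anyone wondering what the warning is describing: HTML::Parser is
saying that it was handed raw UTF-8 bytes and is also expanding HTML
entities, so the output can end up as a mix of undecoded bytes and
decoded characters.  Below is a minimal sketch of that effect in Python,
offered only as an analogy.  It is not SpamAssassin or HTML::Parser
code, and the function names parse_undecoded and parse_decoded are
hypothetical, made up for this illustration.

```python
import html


def parse_undecoded(raw: bytes) -> str:
    """Treat every byte as one character, then expand entities.

    Analogy (assumption) for feeding undecoded UTF-8 to a parser
    that also decodes entities.
    """
    return html.unescape(raw.decode("latin-1"))


def parse_decoded(raw: bytes) -> str:
    """Decode the UTF-8 bytes to characters first, then expand entities."""
    return html.unescape(raw.decode("utf-8"))


# The same letter twice: once as literal UTF-8, once as an HTML entity.
raw = "caf\u00e9 &eacute;".encode("utf-8")

print(repr(parse_undecoded(raw)))  # literal e-acute shows up as two stray characters
print(repr(parse_decoded(raw)))    # both forms come out as the same character
```

In the first call the literal character stays as two separate byte-sized
characters while the entity becomes a single real character, which is
the "garbage" the warning refers to.  Decoding before parsing, as in the
second call, avoids the mix.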