Posted to users@spamassassin.apache.org by Chris Conn <cc...@abacom.com> on 2010/06/01 17:56:02 UTC
Malformed UTF-8 character
Hello,
I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I
occasionally get this error:
Malformed UTF-8 character (unexpected non-continuation byte 0x00,
immediately after start byte 0xc3) in pattern match (m//) at
/var/lib/spamassassin/3.003001/updates_spamassassin_org/72_active.cf,
rule __HUSH_HUSH, line 1, <GEN272> line 528.
Malformed UTF-8 character (unexpected non-continuation byte 0x00,
immediately after start byte 0xe9) in pattern match (m//) at
/var/lib/spamassassin/3.003001/updates_spamassassin_org/72_active.cf,
rule __HUSH_HUSH, line 1, <GEN24> line 462.
I built an RPM using rpmbuild on the system in question. Is my
installation broken? I have found reports of similar errors with previous
versions:
http://wiki.apache.org/spamassassin/RedHatMalformedUtf8
http://www.gossamer-threads.com/lists/spamassassin/users/100450
but they are mostly old.
What can I check to correct this?
Thanks in advance,
C.
Re: Malformed UTF-8 character
Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2010-06-01 at 12:31 -0400, Michael Scheidell wrote:
> I believe the minimum recommended is 5.8.8 with 5.10.1 STRONGLY
> recommended.
>
> there was even talk on this list of disabling anything under 5.8.8, or
> strongly warning against it.
>
> (and I think there was some talk about requiring 5.10.1.)
No, this definitely has never been considered.
Since 3.3.0 the SA team has dropped /official/ support for Perl 5.6. That
means we do not guarantee it will continue to work with 5.6, which it
currently does. We are even open to accepting Perl 5.6-specific patches,
if provided by the community. However, we are unlikely to fix any issues
with 5.6 ourselves.
The discussion and decision to drop official Perl 5.6 support was hard
enough already.
--
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Malformed UTF-8 character
Posted by Michael Scheidell <sc...@secnap.net>.
On 6/1/10 11:56 AM, Chris Conn wrote:
> Hello,
>
> I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I
> occasionally get this error;
>
I believe the minimum recommended is 5.8.8, with 5.10.1 STRONGLY
recommended.
There was even talk on this list of disabling anything under 5.8.8, or
strongly warning against it.
(And I think there was some talk about requiring 5.10.1.)
--
Michael Scheidell, CTO
Phone: 561-999-5000, x 1259
Re: Malformed UTF-8 character
Posted by John Hardin <jh...@impsec.org>.
On Tue, 1 Jun 2010, Chris Conn wrote:
> John Hardin wrote:
>> On Tue, 1 Jun 2010, Chris Conn wrote:
>>
>> > I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I
>> > occasionally get this error;
>> >
>> > Malformed UTF-8 character (unexpected non-continuation byte 0x00,
>> > immediately after start byte 0xc3) in pattern match (m//) at
>> > /var/lib/spamassassin/3.003001/updates_spamassassin_org/72_active.cf,
>> > rule __HUSH_HUSH, line 1, <GEN272> line 528.
>> >
>> > What can I check to correct this?
>>
>> I'll fix that, thanks for mentioning it.
>>
>> SA is somewhat inconsistent about whether or not it complains about
>> malformed UTF-8 characters, as illustrated by your only occasionally
>> getting that error. I get no complaints about that rule here when testing
>> my sandbox...
>
> Hopefully it's the regexp that can be modified and not that it will
> consistently error out on the few RH4/CentOS4 boxes I run ;) RH
> maintains the same version of dependencies for the entire life of the
> distro, so upgrading outside of RedHat is most often painful.
Yes, it's a fairly simple modification to the regex that contains the
UTF-8 multibyte character sequence. Perl sometimes fails to handle the
sequence properly when the bytes are written bare (e.g. \xc3\xa9), so
turning each byte into a one-character character class ([\xc3][\xa9])
avoids the problem without materially altering the RE.
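As a rough illustration of why the rewrite does not change what the rule
matches, here is a minimal sketch using Python's re module on raw bytes as
a stand-in for Perl. It does not reproduce the Perl 5.8.5 warning, and the
names BARE, WRAPPED and same_matches are made up for this example; they are
not taken from the rule file.

```python
import re

# Bare byte sequence (the UTF-8 encoding of "é"), written the way the
# thread describes the original form.
BARE = re.compile(rb"\xc3\xa9")

# The same two bytes, each wrapped in a one-member character class.
WRAPPED = re.compile(rb"[\xc3][\xa9]")


def same_matches(data: bytes) -> bool:
    """Return True if both patterns find identical match spans in data."""
    bare = [m.span() for m in BARE.finditer(data)]
    wrapped = [m.span() for m in WRAPPED.finditer(data)]
    return bare == wrapped


# Illustrative inputs only; none of these come from real mail.
samples = [
    "café".encode("utf-8"),  # contains \xc3\xa9
    b"caf\xc3\x00",          # start byte followed by NUL, as in the warning
    b"caf\xe9",              # Latin-1 "é", so no \xc3\xa9 present
    b"",
]

for sample in samples:
    print(sample, same_matches(sample))
```

Both forms describe "byte 0xc3 followed by byte 0xa9", so they should agree
on every input; only the way the pattern is spelled differs.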
I had to fix this for _some_ of the UTF-8 sequences here, but others were
being handled properly so I was lazy and didn't change them all. For that
I apologize.
I've committed the fix; it will go out with the next sa-update.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
North Korea: the only country in the world where people would risk
execution to flee to communist China. -- Ride Fast
-----------------------------------------------------------------------
5 days until the 66th anniversary of D-Day
Re: Malformed UTF-8 character
Posted by Chris Conn <cc...@abacom.com>.
John Hardin wrote:
> On Tue, 1 Jun 2010, Chris Conn wrote:
>
>> I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I
>> occasionally get this error;
>>
>> Malformed UTF-8 character (unexpected non-continuation byte 0x00,
>> immediately after start byte 0xc3) in pattern match (m//) at
>> /var/lib/spamassassin/3.003001/updates_spamassassin_org/72_active.cf,
>> rule __HUSH_HUSH, line 1, <GEN272> line 528.
>>
>> What can I check to correct this?
>
> I'll fix that, thanks for mentioning it.
>
> SA is somewhat inconsistent about whether or not it complains about
> malformed UTF-8 characters, as illustrated by your only occasionally
> getting that error. I get no complaints about that rule here when
> testing my sandbox...
>
Hello,
Hopefully it's the regexp that can be modified and not that it will
consistently error out on the few RH4/CentOS4 boxes I run ;) RH
maintains the same version of dependencies for the entire life of the
distro, so upgrading outside of RedHat is most often painful.
Thanks again,
C.
Re: Malformed UTF-8 character
Posted by John Hardin <jh...@impsec.org>.
On Tue, 1 Jun 2010, Chris Conn wrote:
> I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I occasionally
> get this error;
>
> Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately
> after start byte 0xc3) in pattern match (m//) at
> /var/lib/spamassassin/3.003001/updates_spamassassin_org/72_active.cf, rule
> __HUSH_HUSH, line 1, <GEN272> line 528.
>
> What can I check to correct this?
I'll fix that, thanks for mentioning it.
SA is somewhat inconsistent about whether or not it complains about
malformed UTF-8 characters, as illustrated by your only occasionally
getting that error. I get no complaints about that rule here when
testing my sandbox...
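For anyone puzzled by the wording of the warning itself, here is a minimal
sketch in Python (not Perl, and not SpamAssassin code) of the UTF-8
byte-layout rule it refers to. The helper names are hypothetical; the byte
values 0xc3, 0xe9 and 0x00 are the ones from the reported messages.

```python
def sequence_length(start: int) -> int:
    """Length of the UTF-8 sequence a start byte announces (0 if invalid)."""
    if start < 0x80:
        return 1          # plain ASCII, stands alone
    if 0xC2 <= start <= 0xDF:
        return 2          # one continuation byte must follow
    if 0xE0 <= start <= 0xEF:
        return 3          # two continuation bytes must follow
    if 0xF0 <= start <= 0xF4:
        return 4          # three continuation bytes must follow
    return 0              # continuation byte or never-valid start byte


def is_continuation(byte: int) -> bool:
    """Continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def is_well_formed(data: bytes) -> bool:
    """Let Python's strict UTF-8 decoder decide whether data is valid."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(sequence_length(0xC3), sequence_length(0xE9), is_continuation(0x00))
print(is_well_formed(b"\xc3\xa9"), is_well_formed(b"\xc3\x00"))
```

In other words, 0xc3 and 0xe9 each announce a multibyte character, and 0x00
cannot be the byte that follows, which is what "unexpected non-continuation
byte" is saying.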
Re: Malformed UTF-8 character
Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Tue, 2010-06-01 at 11:56 -0400, Chris Conn wrote:
> I upgraded to SA 3.3.1 on a CentOS system using Perl 5.8.5 and I
> occasionally get this error;
> I built a rpm using rpmbuild on the system in question, is my
> installation broken? I have found similar instances in previous versions
>
> http://wiki.apache.org/spamassassin/RedHatMalformedUtf8
> http://www.gossamer-threads.com/lists/spamassassin/users/100450
>
> mostly old stuff.
Without thoroughly checking the details...
Yes, mostly old stuff. Just like your Perl version. ;) Both references
point at issues with Perl handling UTF-8 in 5.8.x versions. Your 5.8.5 is
quite old, and there have even been a couple of later 5.8.x releases.
Any chance that's it?