Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/01/09 21:40:22 UTC
[Bug 6042] New: Malformed UTF-8 character
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
Summary: Malformed UTF-8 character
Product: Spamassassin
Version: 3.2.5
Platform: Other
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: eddy.beliveau@hec.ca
Hi!
We are using Postfix 2.5.5 on our RHEL AS release 4 (Nahant Update 6)
academic server.
amavisd-new-2.6.2 (20081215), Unicode aware, LANG="fr"
with spamassassin 3.2.5, perl 5.8.5
Our log file contains many warnings like this:
amavis[19738]: (19738-05) _WARN: Malformed UTF-8 character (unexpected
continuation byte 0x8e, with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU2, line 1, <GEN16> line 3620.
That rule is version 01.03.13
A Perl bug report was filed yesterday by Mark Martinec:
perlbug: [perl #62048]
Unwarranted "Malformed UTF-8 character" on tainted variable
See the attached sample; it gives me:
[root@smtpext2 log]# spamassassin </tmp/9854.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x81,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU4, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x81,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU4, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU6, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU6, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU5, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU5, line 1.
--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
[Bug 6042] Malformed UTF-8 character
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
jidanni@jidanni.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jidanni@jidanni.org
--- Comment #2 from jidanni@jidanni.org 2010-09-23 08:12:49 UTC ---
Help, I get warn: Malformed UTF-8 character (unexpected continuation byte 0xac,
with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
And that rule is not meant to be UTF-8 at all. That rule is
body J_BODY_US_BIG5
/\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)|\xAC\xD5\xA5\xC9|\xA4G\xAB\xD7\xA4\xC0\xB1a|\xAA\xEA\xA4l\xA4s|\xBD\xBA\xB6\xE9/
SpamAssassin 3.4.0-r905379-1797 (2010-02-02)
Suddenly, after years, I get this error here on Debian Sid.
See also Bug #4791
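To illustrate why these bytes trip the warning when a string is treated as UTF-8, here is a minimal sketch in Python. It is not taken from SpamAssassin or Perl; the helper name is_continuation_byte is hypothetical. It only checks the UTF-8 bit layout: bytes of the form 10xxxxxx (0x80 to 0xBF) are continuation bytes and cannot begin a character.

```python
def is_continuation_byte(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF).
    return (b & 0xC0) == 0x80

# Byte values named in the warnings quoted in this thread.
reported = [0x8E, 0x81, 0x82, 0xAC]
print([is_continuation_byte(b) for b in reported])

# One alternative from the J_BODY_US_BIG5 rule, written out as raw bytes.
big5_bytes = bytes([0xAC, 0xD5, 0xA5, 0xC9])
try:
    big5_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xAC is a continuation byte with no start byte in front of it.
    valid_utf8 = False
print(valid_utf8)
```

Every byte value reported in the warnings falls in the continuation range, which is consistent with the rule's bytes being Big5 data that is inspected as if it were UTF-8.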
[Bug 6042] Malformed UTF-8 character
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
--- Comment #6 from jidanni@jidanni.org 2010-11-18 20:24:50 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> OK, thanks. I'll be back if it screws up.
Which it now does.
With
body J_BODY_US_BIG5
/[\xBF]n[\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|[\xA4]G[\xAB][\xD7][\xA4][\xC0][\xB1]a|[\xAA][\xEA][\xA4]l[\xA4]s|[\xBD][\xBA][\xB6][\xE9]/
I get
Nov 19 09:23:06.671 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xbd, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xaa, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xac, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xbd, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
etc.
[Bug 6042] Malformed UTF-8 character
Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
--- Comment #1 from Eddy Beliveau <ed...@hec.ca> 2009-01-09 12:44:03 PST ---
Created an attachment (id=4414)
--> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4414)
spamassassin sample vs "Malformed UTF-8 character"
[Bug 6042] Malformed UTF-8 character
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
John Hardin <jh...@impsec.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jhardin@impsec.org
--- Comment #3 from John Hardin <jh...@impsec.org> 2010-09-25 14:15:30 UTC ---
(In reply to comment #2)
> Help, I get warn: Malformed UTF-8 character (unexpected continuation byte 0xac,
> with no preceding start byte) in pattern match (m//) at
> /home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
> And that rule is not meant to be UTF-8 at all. That rule is
> body J_BODY_US_BIG5
> /\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)|\xAC\xD5\xA5\xC9|\xA4G\xAB\xD7\xA4\xC0\xB1a|\xAA\xEA\xA4l\xA4s|\xBD\xBA\xB6\xE9/
Perl can sometimes get confused by REs like that, and it's not consistent
either.
The safest thing to do when coding strings of 8-bit characters like that is to
enclose each character in a run in square brackets to make it a character
class. This prevents Perl from trying to interpret pairs as a UTF-8 character.
For example:
body J_BODY_US_BIG5
/[\xBFn][\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|\xA4G[\xAB][\xD7][\xA4][\xC0][\xB1a]|[\xAA][\xEA][\xA4l][\xA4s]|[\xBD][\xBA][\xB6][\xE9]/
[Bug 6042] Malformed UTF-8 character
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
--- Comment #4 from John Hardin <jh...@impsec.org> 2010-09-25 14:55:19 UTC ---
(In reply to comment #3)
Oops, misplaced the brackets on a few of those...
body J_BODY_US_BIG5
/\xBFn[\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|\xA4G[\xAB][\xD7][\xA4][\xC0][\xB1]a|[\xAA][\xEA][\xA4]l[\xA4]s|[\xBD][\xBA][\xB6][\xE9]/
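The bracketing suggested in comment #3 can be applied mechanically. The following is a minimal sketch in Python, not a SpamAssassin tool. The names wrap_high_bytes and _HEX_ESCAPE are hypothetical. It assumes the pattern contains no existing multi-element character classes and no escaped backslash directly before an "x". Unlike the hand-edited rule above, it wraps every \xNN escape with NN >= 0x80, including ones followed by an ASCII letter.

```python
import re

# A \xNN escape that is not already the sole member of a character class.
_HEX_ESCAPE = re.compile(r'(?<!\[)\\x([0-9A-Fa-f]{2})(?!\])')

def wrap_high_bytes(pattern: str) -> str:
    # Put each bare \xNN escape with NN >= 0x80 into its own character class.
    def repl(m):
        if int(m.group(1), 16) >= 0x80:
            return "[" + m.group(0) + "]"
        return m.group(0)  # leave 7-bit escapes such as \x41 untouched
    return _HEX_ESCAPE.sub(repl, pattern)

# A fragment of the J_BODY_US_BIG5 pattern from this thread.
fragment = r'\xAC\xD5\xA5\xC9|\xA4G'
print(wrap_high_bytes(fragment))  # [\xAC][\xD5][\xA5][\xC9]|[\xA4]G
```

Running it on an already bracketed pattern leaves the pattern unchanged. Note that comment #6 in this thread reports the warning still appearing with a fully bracketed rule, so this is a sketch of the transformation only, not a confirmed fix.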
[Bug 6042] Malformed UTF-8 character
Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042
--- Comment #5 from jidanni@jidanni.org 2010-09-26 02:09:31 UTC ---
(In reply to comment #4)
OK, thanks. I'll be back if it screws up.