Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2009/01/09 21:40:22 UTC

[Bug 6042] New: Malformed UTF-8 character

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

           Summary: Malformed UTF-8 character
           Product: Spamassassin
           Version: 3.2.5
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Rules
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: eddy.beliveau@hec.ca


Hi!

We are using Postfix 2.5.5 on our RHEL AS release 4 (Nahant Update 6)
academic server.

amavisd-new-2.6.2 (20081215), Unicode aware, LANG="fr"
with spamassassin 3.2.5, perl 5.8.5

Our log file contains many entries like:
amavis[19738]: (19738-05) _WARN: Malformed UTF-8 character (unexpected
continuation byte 0x8e, with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU2, line 1, <GEN16> line 3620.

That rule is version 01.03.13 

A Perl bug report was filed yesterday by Mark Martinec:
  perlbug: [perl #62048]
  Unwarranted "Malformed UTF-8 character" on tainted variable

See the attached sample; it gives me:
[root@smtpext2 log]# spamassassin </tmp/9854.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x81,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU4, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x81,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU4, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU6, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU6, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU5, line 1.
[11014] warn: Malformed UTF-8 character (unexpected continuation byte 0x82,
with no preceding start byte) in pattern match (m//) at
/var/lib/spamassassin/3.002005/70_sare_specific_cf_sare_sa-update_dostech_net/200605280300.cf,
rule SARE_SPEC_REPL_OBFU5, line 1.
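
For context on the wording of these warnings: in UTF-8, bytes 0x80 through 0xBF are
continuation bytes and are only valid after a start byte, so a lone 0x8e, 0x81 or 0x82
is what a UTF-8 decoder calls "unexpected continuation byte". The Python sketch below
only illustrates that classification. The function names are made up for this example,
and it does not reproduce or explain the Perl behaviour reported in perl #62048.

```python
def utf8_byte_role(byte: int) -> str:
    """Classify a single byte value (0-255) by its role in UTF-8."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in range 0..255")
    if byte <= 0x7F:
        return "ascii"          # 0xxxxxxx: a complete one-byte character
    if byte <= 0xBF:
        return "continuation"   # 10xxxxxx: valid only after a start byte
    if byte in (0xC0, 0xC1) or byte >= 0xF5:
        return "invalid"        # never appears in well-formed UTF-8
    return "start"              # 0xC2..0xF4: begins a multi-byte sequence


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # The three byte values named in the warnings above.
    for value in (0x8E, 0x81, 0x82):
        print(hex(value), utf8_byte_role(value), decodes_as_utf8(bytes([value])))
```

Each of the three bytes is classified as a continuation byte and fails to decode on its
own, which matches the text of the warning but says nothing about whether the data was
meant to be UTF-8 in the first place.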


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6042] Malformed UTF-8 character

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

jidanni@jidanni.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jidanni@jidanni.org

--- Comment #2 from jidanni@jidanni.org 2010-09-23 08:12:49 UTC ---
Help, I get warn: Malformed UTF-8 character (unexpected continuation byte 0xac,
with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
And that rule is not meant to be UTF-8 at all. That rule is
body J_BODY_US_BIG5
/\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)|\xAC\xD5\xA5\xC9|\xA4G\xAB\xD7\xA4\xC0\xB1a|\xAA\xEA\xA4l\xA4s|\xBD\xBA\xB6\xE9/
SpamAssassin 3.4.0-r905379-1797 (2010-02-02)
Suddenly, after years, I get this error here on Debian Sid.
See also Bug #4791

[Bug 6042] Malformed UTF-8 character

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

--- Comment #6 from jidanni@jidanni.org 2010-11-18 20:24:50 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> OK, thanks. I'll be back if it screws up.

Which it now does.

With
body J_BODY_US_BIG5
/[\xBF]n[\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|[\xA4]G[\xAB][\xD7][\xA4][\xC0][\xB1]a|[\xAA][\xEA][\xA4]l[\xA4]s|[\xBD][\xBA][\xB6][\xE9]/

I get
Nov 19 09:23:06.671 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xbd, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xaa, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xac, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
Nov 19 09:23:06.672 [21608] warn: Malformed UTF-8 character (unexpected
continuation byte 0xbd, with no preceding start byte) in pattern match (m//) at
/home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.

etc.

[Bug 6042] Malformed UTF-8 character

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042


--- Comment #1 from Eddy Beliveau <ed...@hec.ca>  2009-01-09 12:44:03 PST ---
Created an attachment (id=4414)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4414)
spamassassin sample vs "Malformed UTF-8 character"


[Bug 6042] Malformed UTF-8 character

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

John Hardin <jh...@impsec.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhardin@impsec.org

--- Comment #3 from John Hardin <jh...@impsec.org> 2010-09-25 14:15:30 UTC ---
(In reply to comment #2)
> Help, I get warn: Malformed UTF-8 character (unexpected continuation byte 0xac,
> with no preceding start byte) in pattern match (m//) at
> /home/jidanni/.spamassassin/user_prefs, rule J_BODY_US_BIG5, line 1.
> And that rule is not meant to be UTF-8 at all. That rule is
> body J_BODY_US_BIG5
> /\xBFn\xA5\xFD\xA5\xCD|\xA4\xA6(\xA5\xA7|\xA5\xFD\xA5\xCD)|\xAC\xD5\xA5\xC9|\xA4G\xAB\xD7\xA4\xC0\xB1a|\xAA\xEA\xA4l\xA4s|\xBD\xBA\xB6\xE9/

Perl can sometimes get confused by REs like that, and it's not consistent
either.

The safest thing to do when coding strings of 8-bit characters like that is to
enclose each character in a run in square brackets to make it a character
class. This prevents Perl from trying to interpret pairs as a UTF-8 character.
For example:


body J_BODY_US_BIG5
/[\xBFn][\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|\xA4G[\xAB][\xD7][\xA4][\xC0][\xB1a]|[\xAA][\xEA][\xA4l][\xA4s]|[\xBD][\xBA][\xB6][\xE9]/

[Bug 6042] Malformed UTF-8 character

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

--- Comment #4 from John Hardin <jh...@impsec.org> 2010-09-25 14:55:19 UTC ---
(In reply to comment #3)

Oops, misplaced the brackets on a few of those... 

body J_BODY_US_BIG5
/\xBFn[\xA5][\xFD][\xA5][\xCD]|[\xA4][\xA6](?:[\xA5][\xA7]|[\xA5][\xFD][\xA5][\xCD])|[\xAC][\xD5][\xA5][\xC9]|\xA4G[\xAB][\xD7][\xA4][\xC0][\xB1]a|[\xAA][\xEA][\xA4]l[\xA4]s|[\xBD][\xBA][\xB6][\xE9]/
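
The bracketing suggested in comment #3 is mechanical, so it can be scripted rather than
typed by hand. The Python sketch below is one way to do that. The helper name
bracket_high_bytes is hypothetical and is not part of SpamAssassin. It assumes the
pattern contains only plain \xHH escapes outside character classes and no escaped
backslashes, and it is not idempotent, so run it once on an unbracketed pattern.
Comment #6 in this thread reports that the warnings returned even with a fully
bracketed rule, so treat this as a convenience for applying the suggestion, not as a
confirmed fix.

```python
import re

# Matches a \xHH escape; the two hex digits are captured.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def bracket_high_bytes(pattern: str) -> str:
    """Wrap every \\xHH escape with HH >= 0x80 in its own character class.

    Assumption: escapes are not already inside [...] and the pattern has no
    escaped backslashes such as \\\\x41; neither case is handled here.
    """
    def wrap(match: re.Match) -> str:
        if int(match.group(1), 16) >= 0x80:
            return "[" + match.group(0) + "]"
        return match.group(0)  # leave 7-bit escapes untouched

    return HEX_ESCAPE.sub(wrap, pattern)


if __name__ == "__main__":
    # First alternative of the J_BODY_US_BIG5 rule quoted in this thread.
    print(bracket_high_bytes(r"\xBFn\xA5\xFD\xA5\xCD"))
```

Literal ASCII characters such as the "n" stay outside the brackets, which is the
correction made in comment #4. The helper leaves grouping alone, so changing "(" to
"(?:" as in the hand-written rule remains a manual step.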

[Bug 6042] Malformed UTF-8 character

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6042

--- Comment #5 from jidanni@jidanni.org 2010-09-26 02:09:31 UTC ---
(In reply to comment #4)
OK, thanks. I'll be back if it screws up.
