Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2018/11/20 20:05:56 UTC

[Bug 7657] New: Certain messages get mangled by double UTF-8 encoding

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

            Bug ID: 7657
           Summary: Certain messages get mangled by double UTF-8 encoding
           Product: Spamassassin
           Version: 3.4.2
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: spamassassin
          Assignee: dev@spamassassin.apache.org
          Reporter: ondrej@caletka.cz
  Target Milestone: Undefined

Created attachment 5628
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5628&action=edit
Minimum working example

I use Debian 9 (Stretch) with the spamassassin script run by procmail for the
user's mailbox. Last week, spamassassin was updated from 3.4.1 to 3.4.2, and
since then some messages are mangled: a message body that was received UTF-8
encoded and transferred in 8bit mode is treated as ISO-8859-1 and re-encoded to
UTF-8 again, resulting in totally garbled accented characters.
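
The symptom described above can be reproduced on a sample string. This is a minimal sketch in Python, not SpamAssassin code; the sample text "ódy" and all variable names are chosen purely for illustration.

```python
# Illustrative sketch only: reproduce the "double UTF-8 encoding" symptom.
original = "ódy"                                # sample accented text (assumption)

utf8_bytes = original.encode("utf-8")           # body as transferred in 8bit mode
misread = utf8_bytes.decode("iso-8859-1")       # each octet wrongly taken as one Latin-1 character
double_encoded = misread.encode("utf-8")        # encoded to UTF-8 a second time

# What a UTF-8 aware mail client would then display:
print(double_encoded.decode("utf-8"))           # prints "Ã³dy"

# The damage is reversible as long as nothing else altered the octets.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
assert repaired == original
```

Each non-ASCII character occupies two or more octets in UTF-8, so every accented letter turns into two or more unrelated characters, which matches the "totally garbled accented characters" in the report.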

I'm attaching a minimum working example. To reproduce the issue, one has to put 

add_header all Report _REPORT_

into ~/.spamassassin/user_prefs and call

spamassassin < mwe.eml > mwe-mangled.eml

In the output file, the e-mail body will get double encoded, showing garbage
instead of accented characters.

Disabling Report header insertion works around the issue.
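
To check whether an output file such as mwe-mangled.eml shows this symptom, a heuristic along the following lines may help. It is a sketch in Python; the function name `looks_double_encoded` is hypothetical and not part of SpamAssassin. It can give false positives on text that legitimately consists of Latin-1 range characters forming valid UTF-8 sequences.

```python
def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: True if `data` is valid UTF-8 that is still valid,
    non-ASCII UTF-8 after undoing one Latin-1 -> UTF-8 encoding step."""
    try:
        once = data.decode("utf-8")
        undone = once.encode("iso-8859-1")   # fails if any character is above U+00FF
        undone.decode("utf-8")               # fails if the result is not UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return undone != data                    # pure ASCII round-trips unchanged


# Sample octets for illustration (assumption: the text "ódy").
clean = "ódy".encode("utf-8")
mangled = clean.decode("iso-8859-1").encode("utf-8")

print(looks_double_encoded(clean))     # False
print(looks_double_encoded(mangled))   # True
```

Applied line by line to the body of the output file, a `True` result on lines that contained accented characters in mwe.eml would indicate the double encoding described in this report.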

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Ondřej Caletka <on...@caletka.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ondrej@caletka.cz

--- Comment #1 from Ondřej Caletka <on...@caletka.cz> ---
Created attachment 5629
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5629&action=edit
Mangled minimum working example


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Henrik Krohns <ap...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ncaq@ncaq.net

--- Comment #11 from Henrik Krohns <ap...@hege.li> ---
*** Bug 7664 has been marked as a duplicate of this bug. ***


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |blocker
   Target Milestone|Undefined                   |3.4.3


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #8 from Henrik Krohns <he...@hege.li> ---
I commented out the utf8::encode and everything works fine. I get no Perl
warnings etc., so it seems all the utf8::encode calls should be reverted from
that patch?


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |giovanni@paclan.it


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #5 from Henrik Krohns <he...@hege.li> ---
(In reply to Ondřej Caletka from comment #4)
>
> Sorry, it seems that I uploaded the wrong file. The problem is still
> reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9
> stretch.

I have now tried on the exact same fresh Debian 9, 3.4.2-1~deb9u1 installation,
but no luck. The only change is the added headers.

$ diff mwe.eml mwe2.eml
1a2,8
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level: **
> X-Spam-Status: No, score=2.0 required=5.0 tests=FOO_1,FOO_2 autolearn=no
>       autolearn_force=no version=3.4.2
> X-Spam-Report:
>       *  1.0 FOO_1 BODY: No description available.
>       *  1.0 FOO_2 No description available.


Could you please list all settings you have changed from default?


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Giovanni Bechis <gi...@paclan.it> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Giovanni Bechis <gi...@paclan.it> ---
That was it, fixed in r1861317.
 Thanks


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Ondřej Caletka <on...@caletka.cz> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #5629|0                           |1
        is obsolete|                            |

--- Comment #3 from Ondřej Caletka <on...@caletka.cz> ---
Created attachment 5660
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5660&action=edit
mwe.eml after passing through spamassassin


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #4 from Ondřej Caletka <on...@caletka.cz> ---
(In reply to Henrik Krohns from comment #2)
> Can't reproduce here. Even your "mangled" example attachment body is
> identical to your first attachment!?

Sorry, it seems that I uploaded the wrong file. The problem is still
reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9 stretch.


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #10 from Ondřej Caletka <on...@caletka.cz> ---
(In reply to Giovanni Bechis from comment #9)
> That was it, fixed in r1861317.
>  Thanks

Thank you very much!


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hege@hege.li

--- Comment #2 from Henrik Krohns <he...@hege.li> ---
Can't reproduce here. Even your "mangled" example attachment body is identical
to your first attachment!?


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #6 from Henrik Krohns <he...@hege.li> ---
Ah OK, I did manage to reproduce it, but I had to run network tests. Apparently
the extra content in this report makes it change the body encoding.
Investigating...

$ diff mwe.eml mwe6.eml
1a2,18
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level:
> X-Spam-Status: No, score=0.6 required=5.0 tests=FOO_1,FOO_2,RCVD_IN_DNSWL_MED,
>       SPF_FAIL,SPF_HELO_NONE,TO_EQ_FM_DOM_SPF_FAIL autolearn=no
>       autolearn_force=no version=3.4.2
> X-Spam-Report:
>       * -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/,
>       *       medium trust
>       *      [2001:718:1:1:0:0:144:199 listed in]
>       [list.dnswl.org]
>       *  0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
>       *  0.9 SPF_FAIL SPF: sender does not match SPF record (fail)
>       *      [SPF failed: Please see http://www.openspf.org/Why?s=mfrom;id=usera%40example.com;ip=2001%3A718%3A1%3A1%3A%3A144%3A199;r=super.palvel.in]
>       *  1.0 FOO_1 BODY: No description available.
>       *  1.0 FOO_2 No description available.
>       *  0.0 TO_EQ_FM_DOM_SPF_FAIL To domain == From domain and external SPF
>       *       failed
16c33                kÃœ kůŠúpÄ Äábelské ódy.
---ÅíšernÄÅŸluÅ¥ouÄ


[Bug 7657] Certain messages get mangled by double UTF-8 encoding

Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657

--- Comment #7 from Henrik Krohns <he...@hege.li> ---
Bug 7305 is the culprit, specifically this part in Revision 1831073
spamassassin.raw:

   # OK, do checks and put out the message.
   my $status = $spamtest->check($mail);
-  print $status->rewrite_mail()  or die "error writing: $!";
+  { my $report = $status->rewrite_mail();
+    # encode Unicode characters to UTF-8 octets
+    utf8::encode($report) if utf8::is_utf8($report);
+    print $report  or die "error writing: $!";
+  }

Maybe Giovanni can chime in on what this bit is intended to do. The original bug
only talked about forcing the C locale, but these utf8 encodings are a
different thing?
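
One plausible reading of why the `utf8::is_utf8($report)` guard does not protect the body is sketched below, in Python rather than Perl. In Perl, concatenating a byte string with a string that carries the internal UTF-8 flag upgrades the bytes as if they were Latin-1 characters. A single flagged fragment would then flag the whole message, and `utf8::encode` would encode the already UTF-8 body a second time. Whether this is exactly what happens inside `rewrite_mail()` is an assumption. The `PerlString` class and `emit` function are a hypothetical toy model, not SpamAssassin or Perl APIs.

```python
from dataclasses import dataclass


@dataclass
class PerlString:
    """Toy model of a Perl scalar: characters plus the internal UTF-8 flag."""
    chars: str       # one Python character per Perl character
    flagged: bool    # stands in for the result of utf8::is_utf8()

    @classmethod
    def from_octets(cls, data: bytes) -> "PerlString":
        # A byte string: every octet is one character in the range 0..255.
        return cls(data.decode("iso-8859-1"), False)

    def __add__(self, other: "PerlString") -> "PerlString":
        # Concatenation: the result is flagged if either operand is flagged,
        # and unflagged octets keep their values as Latin-1 code points.
        return PerlString(self.chars + other.chars, self.flagged or other.flagged)


def emit(report: PerlString) -> bytes:
    """Mirror the patched code: utf8::encode($report) if utf8::is_utf8($report)."""
    if report.flagged:
        return report.chars.encode("utf-8")
    return report.chars.encode("iso-8859-1")   # octets are written out as-is


body = PerlString.from_octets("ódy\n".encode("utf-8"))         # UTF-8 body kept as raw octets
plain_header = PerlString.from_octets(b"X-Spam-Report: a\n")   # unflagged header text
flagged_header = PerlString("X-Spam-Report: b\n", True)        # hypothetical flagged fragment

print(emit(plain_header + body))     # body octets pass through unchanged
print(emit(flagged_header + body))   # body octets are encoded a second time
```

Under this model, the output depends only on whether any part of the assembled message is flagged. That would be consistent with the bug appearing only when extra report content is present.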
