Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2018/11/20 20:05:56 UTC
[Bug 7657] New: Certain messages get mangled by double UTF-8 encoding
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Bug ID: 7657
Summary: Certain messages get mangled by double UTF-8 encoding
Product: Spamassassin
Version: 3.4.2
Hardware: PC
OS: Linux
Status: NEW
Severity: major
Priority: P2
Component: spamassassin
Assignee: dev@spamassassin.apache.org
Reporter: ondrej@caletka.cz
Target Milestone: Undefined
Created attachment 5628
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5628&action=edit
Minimum working example
I use Debian 9 (Stretch) with the spamassassin script run by procmail on the
user's mailbox. Last week, spamassassin was updated from 3.4.1 to 3.4.2, and
since then some messages are mangled: a message body that was received UTF-8
encoded and transferred in 8bit mode is treated as ISO-8859-1 and re-encoded to
UTF-8, resulting in totally garbled accented characters.
I'm attaching a minimum working example. To reproduce the issue, put
    add_header all Report _REPORT_
into ~/.spamassassin/user_prefs and run
    spamassassin < mwe.eml > mwe-mangled.eml
In the output file, the e-mail body will get double encoded, showing garbage
instead of accented characters.
Disabling Report header insertion works around the issue.
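The effect described above can be illustrated outside SpamAssassin. The following is a minimal Python sketch, not SpamAssassin code: it takes UTF-8 octets, misreads them as ISO-8859-1, and encodes the result to UTF-8 a second time. The sample text is a hypothetical stand-in and is not taken from the attachment.

```python
# Hypothetical sample text; the attachment's real content is not reproduced here.
original = "Příliš žluťoučký kůň"

# The message body as it arrives: UTF-8 octets, transferred in 8bit mode.
utf8_octets = original.encode("utf-8")

# Step 1 of the mangling: the octets are treated as ISO-8859-1 characters.
misread = utf8_octets.decode("iso-8859-1")

# Step 2: those characters are encoded to UTF-8 a second time.
double_encoded = misread.encode("utf-8")

# Every non-ASCII character now occupies more octets than before.
print(len(utf8_octets), len(double_encoded))

# Reversing both steps recovers the original text, which is a quick way
# to confirm that a file was double encoded rather than corrupted otherwise.
recovered = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(recovered == original)
```

ASCII-only parts of a message pass through both steps unchanged, which is why only the accented characters turn into garbage.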
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Ondřej Caletka <on...@caletka.cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ondrej@caletka.cz
--- Comment #1 from Ondřej Caletka <on...@caletka.cz> ---
Created attachment 5629
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5629&action=edit
Mangled minimum working example
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Henrik Krohns <ap...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ncaq@ncaq.net
--- Comment #11 from Henrik Krohns <ap...@hege.li> ---
*** Bug 7664 has been marked as a duplicate of this bug. ***
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Henrik Krohns <he...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|major |blocker
Target Milestone|Undefined |3.4.3
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #8 from Henrik Krohns <he...@hege.li> ---
I commented out the utf8::encode and everything works fine. I get no Perl
warnings etc., so it seems all the utf8::encodes should be reverted from that
patch?
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Henrik Krohns <he...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |giovanni@paclan.it
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #5 from Henrik Krohns <he...@hege.li> ---
(In reply to Ondřej Caletka from comment #4)
>
> Sorry, it seems that I uploaded the wrong file. The problem is still
> reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9
> stretch.
I have now tried on the exact same fresh Debian 9, 3.4.2-1~deb9u1 installation,
but no luck. The only change is the added headers.
$ diff mwe.eml mwe2.eml
1a2,8
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level: **
> X-Spam-Status: No, score=2.0 required=5.0 tests=FOO_1,FOO_2 autolearn=no
> autolearn_force=no version=3.4.2
> X-Spam-Report:
> * 1.0 FOO_1 BODY: No description available.
> * 1.0 FOO_2 No description available.
Could you please list all settings you have changed from default?
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Giovanni Bechis <gi...@paclan.it> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #9 from Giovanni Bechis <gi...@paclan.it> ---
That was it, fixed in r1861317.
Thanks
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Ondřej Caletka <on...@caletka.cz> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #5629|0 |1
is obsolete| |
--- Comment #3 from Ondřej Caletka <on...@caletka.cz> ---
Created attachment 5660
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5660&action=edit
mwe.eml after passing through spamassassin
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #4 from Ondřej Caletka <on...@caletka.cz> ---
(In reply to Henrik Krohns from comment #2)
> Can't reproduce here. Even your "mangled" example attachment body is
> identical to your first attachment!?
Sorry, it seems that I uploaded the wrong file. The problem is still
reproducible for me. I'm using spamassassin 3.4.2-1~deb9u1 on Debian 9 stretch.
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #10 from Ondřej Caletka <on...@caletka.cz> ---
(In reply to Giovanni Bechis from comment #9)
> That was it, fixed in r1861317.
> Thanks
Thank you very much!
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
Henrik Krohns <he...@hege.li> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hege@hege.li
--- Comment #2 from Henrik Krohns <he...@hege.li> ---
Can't reproduce here. Even your "mangled" example attachment body is identical
to your first attachment!?
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #6 from Henrik Krohns <he...@hege.li> ---
Ah ok, I did manage to reproduce it. But I had to run network tests. Apparently
the extra stuff in this report makes it change the body encoding,
investigating..
$ diff mwe.eml mwe6.eml
1a2,18
> X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on xxx
> X-Spam-Level:
> X-Spam-Status: No, score=0.6 required=5.0 tests=FOO_1,FOO_2,RCVD_IN_DNSWL_MED,
> SPF_FAIL,SPF_HELO_NONE,TO_EQ_FM_DOM_SPF_FAIL autolearn=no
> autolearn_force=no version=3.4.2
> X-Spam-Report:
> * -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/,
> * medium trust
> * [2001:718:1:1:0:0:144:199 listed in]
> [list.dnswl.org]
> * 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record
> * 0.9 SPF_FAIL SPF: sender does not match SPF record (fail)
> * [SPF failed: Please see http://www.openspf.org/Why?s=mfrom;id=usera%40example.com;ip=2001%3A718%3A1%3A1%3A%3A144%3A199;r=super.palvel.in]
> * 1.0 FOO_1 BODY: No description available.
> * 1.0 FOO_2 No description available.
> * 0.0 TO_EQ_FM_DOM_SPF_FAIL To domain == From domain and external SPF
> * failed
16c33
kÃœ kůŠúpÄ Äábelské ódy.
---
ÅÃÅ¡ernÄÅŸluÅ¥ouÄ
[Bug 7657] Certain messages get mangled by double UTF-8 encoding
Posted by bu...@spamassassin.apache.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7657
--- Comment #7 from Henrik Krohns <he...@hege.li> ---
Bug 7305 is the culprit, specifically this part in Revision 1831073
spamassassin.raw:
# OK, do checks and put out the message.
my $status = $spamtest->check($mail);
- print $status->rewrite_mail() or die "error writing: $!";
+ { my $report = $status->rewrite_mail();
+ # encode Unicode characters to UTF-8 octets
+ utf8::encode($report) if utf8::is_utf8($report);
+ print $report or die "error writing: $!";
+ }
Maybe Giovanni can chime in on what this bit is intended to do. The original bug
only talked about forcing the C locale, but these utf8 encodings are a
different thing?
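One possible reading of why the added utf8::encode mangles the body, offered as an assumption rather than something confirmed in this thread: if the string returned by rewrite_mail() carries the UTF-8 flag (for example because report text was joined with the raw body octets), Perl will already have upgraded the body octets by reading each one as a Latin-1 character, and utf8::encode then encodes them a second time. The Python sketch below only models those Perl semantics. The names `concat` and `output_octets` are hypothetical helpers, `bytes` stands for a Perl string with the UTF-8 flag off, and `str` for one with the flag on.

```python
from typing import Union

# Model of a Perl scalar: bytes = UTF-8 flag off, str = UTF-8 flag on.
PerlString = Union[bytes, str]


def upgrade(s: PerlString) -> str:
    """Model Perl's implicit upgrade: each octet becomes one Latin-1 character."""
    return s.decode("latin-1") if isinstance(s, bytes) else s


def concat(a: PerlString, b: PerlString) -> PerlString:
    """Model Perl's '.' operator: mixing flagged and unflagged strings upgrades the octets."""
    if isinstance(a, bytes) and isinstance(b, bytes):
        return a + b
    return upgrade(a) + upgrade(b)


def output_octets(message: PerlString) -> bytes:
    """Model of the patched print: utf8::encode($report) if utf8::is_utf8($report)."""
    if isinstance(message, str):
        return message.encode("utf-8")
    return message


# Body received as UTF-8 octets (hypothetical sample text).
body = "kůň úpěl\n".encode("utf-8")

# The same header text, once as octets and once as a flagged character string.
headers_octets = b"X-Spam-Report: FOO_1\n"
headers_flagged = "X-Spam-Report: FOO_1\n"

intact = output_octets(concat(headers_octets, body))
mangled = output_octets(concat(headers_flagged, body))

print(intact == headers_octets + body)   # body passes through untouched
print(mangled == headers_octets + body)  # body was double encoded
```

In this model the body survives only while every part of the message stays an octet string, which would be consistent with the observation in comment #6 that extra report content changes the outcome.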