Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/10/14 15:59:07 UTC
[Bug 7253] New: X-Spam-Report incorrectly mime-encodes multiline
report in header, violating RFC 2047
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
Bug ID: 7253
Summary: X-Spam-Report incorrectly mime-encodes multiline
report in header, violating RFC 2047
Product: Spamassassin
Version: 3.4.1
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P2
Component: Libraries
Assignee: dev@spamassassin.apache.org
Reporter: Mark.Martinec@ijs.si
With
report_safe 0
and a rule with non-ASCII description, e.g.:
header L_TEST_REPORT_ENCODING From =~ /./
score L_TEST_REPORT_ENCODING 0.01
describe L_TEST_REPORT_ENCODING En-tête contient caractères
the resulting X-Spam-Report multiline header field as inserted
into spam messages is incorrectly encoded into encoded-words:
the whole multiline header field is encoded into a single
encoded-word, whitespace is not encoded, the result contains
whitespace within the encoded-word, and the encoded-word spans
multiple lines:
X-Spam-Report: =?UTF-8?Q?
* 100 USER_IN_BLACKLIST From: address is in the user's black-list
* 0.0 L_TEST_REPORT_ENCODING En-t=c3=aate contient caract=c3=a8res
* -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5%
* [score: 0.0137]
* 0.3 TXREP TXREP: Score normalizing based on sender's reputation?=
This is wrong on multiple counts. RFC 2047 is explicit:
An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters.
[...]
IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
by an RFC 822 parser. As a consequence, unencoded white space
characters (such as SPACE and HTAB) are FORBIDDEN within an
'encoded-word'. For example, the character sequence
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by
an RFC 822 parser) or 'encoded-word' (by a parser which understands
'encoded-words'). The correct way to encode the string "this is some
text" is to encode the SPACE characters as well, e.g.
=?iso-8859-1?q?this=20is=20some=20text?=
[...]
Only a subset of the printable ASCII characters may be used in
'encoded-text'. Space and tab characters are not allowed, so that
the beginning and end of an 'encoded-word' are obvious.
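The constraints quoted above can be checked mechanically. The following is a minimal Python sketch, not part of SpamAssassin; the names ENCODED_WORD_RE and is_valid_encoded_word are hypothetical, and the charset token is matched more loosely than the RFC 2047 grammar strictly allows.

```python
import re

# "=?" charset "?" encoding "?" encoded-text "?="
# encoded-text: printable ASCII other than "?" and SPACE (ranges "!".."&gt;" and "@".."~").
# Assumption: the charset token is simplified to "anything but '?' and whitespace".
ENCODED_WORD_RE = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([!->@-~]+)\?=".replace("&gt;", ">"))


def is_valid_encoded_word(token: str) -> bool:
    """True if token is one well-formed encoded-word of at most 75 characters."""
    return len(token) <= 75 and ENCODED_WORD_RE.fullmatch(token) is not None
```

With this check, the RFC's own example `=?iso-8859-1?q?this=20is=20some=20text?=` is accepted, while the variant containing unencoded spaces is rejected.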
The culprit is MS::PerMsgStatus::qp_encode_header().
It should encode (when necessary) each line individually,
and should encode whitespace within encoded-word(s).
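To illustrate the per-line approach proposed here, the sketch below is written in Python rather than taken from PerMsgStatus.pm. The function names are hypothetical, and two simplifications are assumed: a line containing any non-ASCII character is encoded as a whole, and the 75-character limit is not enforced.

```python
import string

# Bytes emitted literally in this sketch; everything else becomes =XX.
SAFE = set((string.ascii_letters + string.digits).encode("ascii"))


def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Q-encode text as one encoded-word, escaping whitespace as well."""
    body = "".join(chr(b) if b in SAFE else f"={b:02X}" for b in text.encode(charset))
    return f"=?{charset}?Q?{body}?="


def encode_report_lines(report: str, charset: str = "UTF-8") -> str:
    """Encode each line on its own so that no encoded-word spans lines."""
    out = []
    for line in report.split("\n"):
        if line.isascii():
            out.append(line)  # pure ASCII lines need no encoding
            continue
        body = line.lstrip(" \t")
        # Keep the leading whitespace outside the encoded-word so the
        # line remains a valid folded continuation line.
        indent = line[: len(line) - len(body)]
        out.append(indent + q_encode_word(body, charset))
    return "\n".join(out)
```

Escaping every non-alphanumeric byte is more aggressive than RFC 2047 requires, but it keeps the sketch short and guarantees that no SPACE or HTAB ends up inside an encoded-word.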
--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
--- Comment #5 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5332
--> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5332&action=edit
proposed patch
- encoded-words must not cross line boundaries
- encoded-words must not contain non-encoded whitespace
- choose between Q and B encoding based on shorter encoded length
(and rename sub qp_encode_header to mime_encode_header)
- change a call to MIME::Base64::encode_base64 in order to
prevent it from inserting a newline in the middle of a
longish encoded-word, add a test for this
- adjust t/prefs_include.t and t/reportheader_8bit.t to
cope with this fix
- enhanced the test t/header_utf8.t
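Two of the items above, choosing between Q and B by encoded length and keeping the Base64 output free of newlines, can be sketched as follows. This is a Python illustration under stated assumptions, not the committed Perl patch; shorter_encoded_word and its helpers are hypothetical names. For comparison, Perl's MIME::Base64::encode_base64 takes an optional second argument giving the line terminator, and an empty string suppresses the inserted newlines; the thread does not show whether the patch uses exactly that.

```python
import base64
import string

SAFE = set((string.ascii_letters + string.digits).encode("ascii"))


def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Q encoding: literal alphanumerics, =XX for every other byte."""
    body = "".join(chr(b) if b in SAFE else f"={b:02X}" for b in text.encode(charset))
    return f"=?{charset}?Q?{body}?="


def b_encode_word(text: str, charset: str = "UTF-8") -> str:
    """B encoding on a single line.

    base64.b64encode never inserts line breaks, whereas
    base64.encodebytes adds a newline after every 76 output characters.
    """
    body = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{body}?="


def shorter_encoded_word(text: str, charset: str = "UTF-8") -> str:
    """Return whichever of the Q and B forms is shorter (Q wins ties)."""
    q = q_encode_word(text, charset)
    b = b_encode_word(text, charset)
    return q if len(q) <= len(b) else b
```

Mostly-ASCII text such as "caractères" comes out shorter in Q, while text made up largely of multi-byte characters comes out shorter in B.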
trunk:
Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
Sending lib/Mail/SpamAssassin/Util.pm
Sending t/header_utf8.t
Sending t/prefs_include.t
Sending t/reportheader_8bit.t
Committed revision 1708863.
Btw, the mime_encode_header() is still not perfect, as it is incapable
of breaking long encoded-words if line length exceeds 75 characters.
Still, it is a major improvement over what we had and over what
Encode::MIME::Header would offer.
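For reference, the missing piece described here (splitting long text into several encoded-words of at most 75 characters each) could look roughly like the Python sketch below. It is not what mime_encode_header() does; split_b_encoded_words is a hypothetical name, and the character-by-character encoding assumes a stateless charset such as UTF-8.

```python
import base64


def split_b_encoded_words(text: str, charset: str = "UTF-8", max_len: int = 75) -> list:
    """Split text into B-encoded words no longer than max_len characters.

    Chunks are cut only between characters, so a multi-byte character is
    never divided across two encoded-words.
    """
    overhead = len(f"=?{charset}?B??=")
    # Base64 produces 4 output characters for every 3 input bytes.
    max_bytes = (max_len - overhead) // 4 * 3
    chunks, current = [], b""
    for ch in text:
        encoded = ch.encode(charset)
        if current and len(current) + len(encoded) > max_bytes:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return [f"=?{charset}?B?{base64.b64encode(c).decode('ascii')}?=" for c in chunks]
```

The resulting words would then be joined with a line break followed by a space or tab, since whitespace between adjacent encoded-words is ignored when the header is decoded.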
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
Mark Martinec <Ma...@ijs.si> changed:
              What|Removed                     |Added
----------------------------------------------------------------------------
  Target Milestone|Undefined                   |4.0.0
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
> The culprit is MS::PerMsgStatus::qp_encode_header().
> It should encode (when necessary) each line individually,
> and should encode whitespace within encoded-word(s).
Tried to use Encode (latest version: 2.72) in qp_encode_header()
to do the encoding:
$text = Encode::encode('MIME-Header', $text);
Although the Encode::MIME::Header is much more careful than
our code, it is still wrong, resulting in an empty line
inserted in the header!!!
X-Spam-Report:
* 100 USER_IN_BLACKLIST From: address is in the user's black-list
=?UTF-8?B?CSogIDAuMCBMX1RFU1RfUkVQT1JUX0VOQ09ESU5HIEVuLXTDg8KqdGUgY29udGllbg==?=
=?UTF-8?B?dCBjYXJhY3TDg8KocmVz?=
* -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5%
* [score: 0.0137]
* 0.2 TXREP TXREP: Score normalizing based on sender's reputation
Bad, bad, bad, (and unsightly).
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
For reference:
> - Encode::MIME::Header inserts an empty line in a header on encoding
reported: https://rt.cpan.org/Ticket/Display.html?id=107775
> - produces a continuation line with no leading whitespace on encoding
reported: https://rt.cpan.org/Ticket/Display.html?id=107776
> - loses whitespace on folding when decoding
> https://rt.cpan.org/Public/Bug/Display.html?id=40027
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
> Tried to use Encode (latest version: 2.72)
[...]
> Although the Encode::MIME::Header is much more careful than
> our code, it is still wrong, resulting in an empty line
> inserted in the header!!!
Tried Encode 2.78 (the latest on CPAN); it is still broken
on multiple counts:
- still inserts an empty line in a header on encoding:
$ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-B", "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))'
<quote>
Subject: report
=?UTF-8?B?CUxfVEVTVF9SRVBPUlRfRU5DT0RJTkcgRW4tdMOqdGUgY29udGllbnQgY2FyYWM=?=
=?UTF-8?B?dMOocmVz?=
end-report
</quote>
- produces a continuation line with no leading whitespace on encoding:
$ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-Q", "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))'
<quote>
Subject: report
=?UTF-8?Q?=09L=5FTEST=5FREPORT=5FENCODI?=
=?UTF-8?Q?NG=20En=2Dt=C3=AAte=20contient=20?= =?UTF-8?Q?caract=C3=A8res?=
end-report
</quote>
- loses whitespace on folding when decoding:
https://rt.cpan.org/Public/Bug/Display.html?id=40027
$ perl -le 'use Encode; print decode("MIME-Header", "a: b\r\n c")'
a: bc
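The expected result for that input is "a: b c": unfolding removes the line break that precedes the whitespace but keeps the whitespace itself. A minimal Python sketch of that rule (unfold_header is a hypothetical name; it covers plain unfolding only, not RFC 2047 decoding):

```python
import re


def unfold_header(value: str) -> str:
    """Remove each line break that is immediately followed by SPACE or HTAB.

    The whitespace character itself is preserved, as header unfolding requires.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", value)


print(unfold_header("a: b\r\n c"))  # a: b c
```

The one case where such whitespace is legitimately dropped is between two adjacent encoded-words, which does not apply to the plain ASCII input shown above.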
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
It's funny (if not disheartening) that we have two tests which are there
specifically to test encoding of an X-Spam-Report in a header, and they
both expect an incorrectly encoded result. When the bug is fixed these
tests fail :)
t/reportheader_8bit.t
t/prefs_include.t
[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report
in header, violating RFC 2047
Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253
Mark Martinec <Ma...@ijs.si> changed:
              What|Removed                     |Added
----------------------------------------------------------------------------
            Status|NEW                         |RESOLVED
        Resolution|---                         |FIXED
--- Comment #6 from Mark Martinec <Ma...@ijs.si> ---
Should do for now, it's a significant improvement anyway.
Closing.