Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2015/10/14 15:59:07 UTC

[Bug 7253] New: X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

            Bug ID: 7253
           Summary: X-Spam-Report incorrectly mime-encodes multiline
                    report in header, violating RFC 2047
           Product: Spamassassin
           Version: 3.4.1
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: Mark.Martinec@ijs.si

With
  report_safe 0

and a rule with non-ASCII description, e.g.:

  header L_TEST_REPORT_ENCODING From =~ /./
  score  L_TEST_REPORT_ENCODING 0.01
  describe L_TEST_REPORT_ENCODING  En-tête contient caractères

the resulting multiline X-Spam-Report header field, as inserted
into spam messages, is incorrectly encoded into encoded-words:
the whole multiline header field is encoded into a single
encoded-word, whitespace is not encoded (so the encoded-word
contains raw whitespace), and the encoded-word spans multiple
lines:

X-Spam-Report: =?UTF-8?Q?
  *  100 USER_IN_BLACKLIST From: address is in the user's black-list
  *  0.0 L_TEST_REPORT_ENCODING En-t=c3=aate contient caract=c3=a8res
  * -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5%
  *      [score: 0.0137]
  *  0.3 TXREP TXREP: Score normalizing based on sender's reputation?=

This is wrong on several counts. RFC 2047 is explicit:


   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters.

[...]

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence
      =?iso-8859-1?q?this is some text?=
   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.
      =?iso-8859-1?q?this=20is=20some=20text?=

[...]

   Only a subset of the printable ASCII characters may be used in
   'encoded-text'.  Space and tab characters are not allowed, so that
   the beginning and end of an 'encoded-word' are obvious.
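For illustration, the constraints quoted above (at most 75 characters, no unencoded whitespace, a restricted character set in 'encoded-text') can be checked mechanically. The sketch below is a hypothetical Perl helper, is_valid_encoded_word(), which is not part of SpamAssassin. It checks a single encoded-word only, and assumes the general Q-encoding rules of RFC 2047 section 4.2 (printable ASCII other than "=", "?" and "_" may appear literally, "_" stands for a space).

```perl
use strict;
use warnings;

# Hypothetical helper (not SpamAssassin code): check one RFC 2047
# encoded-word against the length, whitespace and charset rules.
sub is_valid_encoded_word {
    my ($word) = @_;

    # At most 75 characters, delimiters included.
    return 0 if length($word) > 75;

    # Shape =?charset?encoding?encoded-text?= with no whitespace
    # anywhere ([^?\s] rejects SPACE, HTAB and line breaks).
    my ($charset, $enc, $text) =
        $word =~ /\A=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=\z/
        or return 0;

    if (uc $enc eq 'B') {
        # Base64 alphabet only, padded to a multiple of 4 characters.
        return ($text =~ m{\A[A-Za-z0-9+/]*={0,2}\z}
                && length($text) % 4 == 0) ? 1 : 0;
    }

    # Q: printable ASCII except '=' (0x3D) and '?' (0x3F) literally
    # ('_' is allowed and means a space), or =XX hex escapes.
    return $text =~ /\A(?:[\x21-\x3C\x3E\x40-\x7E]|=[0-9A-Fa-f]{2})*\z/
        ? 1 : 0;
}
```

With this check, the RFC's own example "=?iso-8859-1?q?this is some text?=" is rejected, while "=?iso-8859-1?q?this=20is=20some=20text?=" passes, and so would the multiline encoded-word shown in the bug report fail.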


The culprit is  MS::PerMsgStatus::qp_encode_header().
It should encode (when necessary) each line individually,
and should encode whitespace within encoded-word(s).
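A minimal sketch of the suggested approach, assuming the report is a Perl character string whose continuation lines begin with folding whitespace. The helpers q_encode_line() and encode_header_lines() are hypothetical and are not SpamAssassin's code: they keep each line's leading whitespace outside the encoded-word, encode only lines containing non-ASCII characters, and turn SPACE and HTAB into =20 and =09 so no raw whitespace ends up inside an encoded-word.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: Q-encode one line as UTF-8. Only letters,
# digits and "!*+-/" stay literal (safe in any header context);
# everything else, including SPACE and HTAB, becomes =XX.
sub q_encode_line {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    $bytes =~ s/([^A-Za-z0-9!*+\-\/])/sprintf('=%02X', ord $1)/ge;
    return "=?UTF-8?Q?$bytes?=";
}

# Hypothetical helper: encode a folded header body line by line, so
# no encoded-word crosses a line boundary. Leading folding whitespace
# is kept as-is; pure-ASCII lines are left untouched.
sub encode_header_lines {
    my ($text) = @_;
    my @out;
    for my $line (split /\n/, $text, -1) {
        my ($ws, $rest) = $line =~ /\A([ \t]*)(.*)\z/s;
        $rest = q_encode_line($rest) if $rest =~ /[^\x00-\x7F]/;
        push @out, $ws . $rest;
    }
    return join "\n", @out;
}
```

Like the fix discussed later in this thread, this sketch does not split an encoded-word that would exceed 75 characters; a long report line would still need to be broken into several adjacent encoded-words.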

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

--- Comment #5 from Mark Martinec <Ma...@ijs.si> ---
Created attachment 5332
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5332&action=edit
proposed patch

- encoded-words must not cross line boundaries
- encoded-words must not contain non-encoded whitespace
- choose between Q and B encoding based on shorter encoded length
  (and rename sub qp_encode_header to mime_encode_header)
- change a call to MIME::Base64::encode_base64 in order to
  prevent it from inserting a newline in the middle of a
  longish encoded-word; add a test for this
- adjust t/prefs_include.t and t/reportheader_8bit.t to
  cope with this fix
- enhanced the test t/header_utf8.t

trunk:
  Sending lib/Mail/SpamAssassin/PerMsgStatus.pm
  Sending lib/Mail/SpamAssassin/Util.pm
  Sending t/header_utf8.t
  Sending t/prefs_include.t
  Sending t/reportheader_8bit.t
Committed revision 1708863.


Btw, the mime_encode_header() is still not perfect, as it is incapable
of breaking long encoded-words if line length exceeds 75 characters.
Still, it is a major improvement over what we had and over what
Encode::MIME::Header would offer.
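The patch notes in comment #5 mention choosing between Q and B by encoded length and changing how MIME::Base64::encode_base64 is called. A sketch of that idea, not the code from attachment 5332: the helper name encode_word_shortest() is hypothetical, while encode_base64($bytes, $eol) is the documented MIME::Base64 interface. Passing an empty $eol is what stops encode_base64 from inserting a newline every 76 characters, which would otherwise split an encoded-word across lines.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: encode a character string as one UTF-8
# encoded-word, picking whichever of Q or B is shorter.
sub encode_word_shortest {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);

    # Q: only letters, digits and "!*+-/" stay literal; everything
    # else, whitespace included, becomes =XX.
    (my $qp = $bytes) =~ s/([^A-Za-z0-9!*+\-\/])/sprintf('=%02X', ord $1)/ge;

    # B: empty end-of-line string, so no "\n" is inserted.
    my $b64 = encode_base64($bytes, '');

    # Prefer Q on a tie, since it stays partly human-readable.
    return length($qp) <= length($b64)
        ? "=?UTF-8?Q?$qp?="
        : "=?UTF-8?B?$b64?=";
}
```

Mostly-ASCII text tends to come out as Q, while text dominated by non-ASCII characters comes out as B. As noted above, nothing here limits the result to 75 characters.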

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|Undefined                   |4.0.0

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

--- Comment #1 from Mark Martinec <Ma...@ijs.si> ---
> The culprit is  MS::PerMsgStatus::qp_encode_header().
> It should encode (when necessary) each line individually,
> and should encode whitespace within encoded-word(s).

Tried to use Encode (latest version: 2.72) in qp_encode_header()
to do the encoding:
  $text = Encode::encode('MIME-Header', $text);

Although the Encode::MIME::Header is much more careful than
our code, it is still wrong, resulting in an empty line
inserted in the header!!!

X-Spam-Report:
  *  100 USER_IN_BLACKLIST From: address is in the user's black-list


=?UTF-8?B?CSogIDAuMCBMX1RFU1RfUkVQT1JUX0VOQ09ESU5HIEVuLXTDg8KqdGUgY29udGllbg==?=
 =?UTF-8?B?dCBjYXJhY3TDg8KocmVz?=
  * -0.3 BAYES_05 BODY: Bayes spam probability is 1 to 5%
  *      [score: 0.0137]
  *  0.2 TXREP TXREP: Score normalizing based on sender's reputation


Bad, bad, bad (and unsightly).

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

--- Comment #3 from Mark Martinec <Ma...@ijs.si> ---
For reference:

> - Encode::MIME::Header inserts an empty line in a header on encoding
reported: https://rt.cpan.org/Ticket/Display.html?id=107775

> - produces a continuation line with no leading whitespace on encoding
reported: https://rt.cpan.org/Ticket/Display.html?id=107776

> - loses whitespace on folding when decoding
> https://rt.cpan.org/Public/Bug/Display.html?id=40027

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

--- Comment #2 from Mark Martinec <Ma...@ijs.si> ---
> Tried to use Encode (latest version: 2.72)
[...]
> Although the Encode::MIME::Header is much more careful than
> our code, it is still wrong, resulting in an empty line
> inserted in the header!!!

Tried Encode 2.78 (the latest on CPAN); it is still broken
on several counts:


- still inserts an empty line in a header on encoding:

$ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-B",
  "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))'

<quote>
Subject: report

 =?UTF-8?B?CUxfVEVTVF9SRVBPUlRfRU5DT0RJTkcgRW4tdMOqdGUgY29udGllbnQgY2FyYWM=?=
 =?UTF-8?B?dMOocmVz?=
        end-report
</quote>


- produces a continuation line with no leading whitespace on encoding:

$ perl -e 'use Encode; use utf8; printf("Subject: %s\n", encode("MIME-Q",
  "report\n\tL_TEST_REPORT_ENCODING En-tête contient caractères\n\tend-report"))'

<quote>
Subject: report
=?UTF-8?Q?=09L=5FTEST=5FREPORT=5FENCODI?=
 =?UTF-8?Q?NG=20En=2Dt=C3=AAte=20contient=20?= =?UTF-8?Q?caract=C3=A8res?=
        end-report
</quote>


- loses whitespace on folding when decoding:
    https://rt.cpan.org/Public/Bug/Display.html?id=40027

$ perl -le 'use Encode; print decode("MIME-Header", "a: b\r\n c")'
a: bc
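For comparison, RFC 5322 (section 2.2.3) unfolds a header by removing only a CRLF that is immediately followed by whitespace, so the whitespace itself survives and the expected result of the example above is "a: b c". A minimal Perl sketch of plain unfolding; unfold_header() is a hypothetical name, and it deliberately ignores the separate RFC 2047 rule that whitespace between two adjacent encoded-words is dropped when they are decoded.

```perl
use strict;
use warnings;

# Hypothetical helper: unfold a header per RFC 5322 section 2.2.3 by
# deleting each line break that precedes SPACE or HTAB, keeping the
# whitespace character itself.
sub unfold_header {
    my ($h) = @_;
    $h =~ s/\r?\n(?=[ \t])//g;
    return $h;
}

print unfold_header("a: b\r\n c"), "\n";   # prints "a: b c"
```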

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

--- Comment #4 from Mark Martinec <Ma...@ijs.si> ---
It's funny (if not disheartening) that we have two tests which are there
specifically to test encoding of an X-Spam-Report in a header, and they
both expect an incorrectly encoded result. When the bug is fixed these
tests fail :)
  t/reportheader_8bit.t
  t/prefs_include.t

[Bug 7253] X-Spam-Report incorrectly mime-encodes multiline report in header, violating RFC 2047

Posted by bu...@bugzilla.spamassassin.org.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7253

Mark Martinec <Ma...@ijs.si> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Mark Martinec <Ma...@ijs.si> ---
Should do for now, it's a significant improvement anyway.
Closing.
